Top Remote Reliability Engineer Jobs in Chicago, IL
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills:
AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
Big Data • Healthtech • HR Tech • Machine Learning • Software • Telehealth • Big Data Analytics
The Staff Site Reliability Engineer will architect, operate, and improve the platform while ensuring security compliance and enhancing development processes.
Top Skills:
AWS, Elasticsearch, Istio, Kubernetes, NATS, Node.js, Postgres, Python, React, Terraform, TypeScript
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Senior Site Reliability Engineer will build and scale identity management tools, automate operations, ensure security, and support AWS, GCP, and Azure environments.
Top Skills:
Ansible, AWS, Azure, C#, Cloud Identity Providers, Docker, GCP, Go, Infrastructure as Code, Java, Kubernetes, Python, Ruby, Terraform
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills:
AI/LLM, ArgoCD, AWS, CI/CD, Datadog, GitHub Actions, GitOps, Go, Kubernetes, Python
Marketing Tech
The Cloud Reliability Engineer develops, configures, and deploys cloud tools, enhances applications, ensures observability, and participates in on-call rotations.
Top Skills:
AWS, CI/CD, Docker, GitHub Actions, Go, Google BigQuery, GCP, Kubernetes, Linux, Python, SQL, Terraform
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves improving software reliability, automating processes, collaborating with teams on system optimization, and mentoring engineers to establish reliability as a core value.
Top Skills:
AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kibana, Kubernetes, Ruby, Terraform
Logistics • Software • Transportation
Design and maintain infrastructure and software architecture, focusing on automation, observability, security, and developer productivity. Troubleshoot issues and optimize databases.
Top Skills:
Go, Infrastructure as Code, JavaScript, Kubernetes, Linux, Python, Shell Script
Database • Analytics
Drive reliability, availability, scalability, and performance of ClickHouse Core. Build alerts, run incident response and blameless postmortems, debug production issues, submit fixes, lead chaos engineering and on-call/escalation processes.
Top Skills:
ClickHouse, ClickHouse Cloud, SQL, Shell, Python, C++, AWS, Azure, Google Cloud Platform
Database
Manage and optimize Postgres databases at scale on AWS RDS, own reliability/monitoring, execute low-downtime upgrades and migrations, troubleshoot production issues, participate in on-call rotation, and collaborate with platform and product teams.
Top Skills:
AWS RDS, Barman, Go, pgBackRest, Postgres, TypeScript, WAL-G
Legal Tech • Software
Lead automation and optimization of Filevine's data platform: performance-tune MSSQL/Postgres, optimize Snowflake, provision infrastructure with Terraform/AWS, run stateful containers on Kubernetes, integrate AI/LLM and MCP for operational automation, manage CI/CD, capacity planning, and documentation, and serve in a 24/7 on-call rotation.
Top Skills:
Microsoft SQL Server (MSSQL), PostgreSQL, Snowflake, Terraform, AWS, Docker, Kubernetes, GitLab, Octopus Deploy, Python, PowerShell, C#, Entity Framework, Dapper, DynamoDB, OpenSearch, Redis, MCP (Model Context Protocol), LLMs
Insurance
The role entails managing and operating the Cyber Recovery Environment, ensuring resilience against cyber attacks through system design, implementation, and maintenance of storage infrastructure.
Top Skills:
Amazon S3, Automation, AWS, Azure, Cyber Recovery Infrastructure, Fibre Channel, GCP, Hitachi SAN, IaaS, iSCSI, MongoDB, NAS, NetApp NAS, NFS, S3, SAN, SMB/CIFS, Storage Solutions
Logistics • Software • Transportation
Lead and mentor teams in DevOps and SRE, architect scalable Azure Cloud infrastructure, implement CI/CD and IaC, ensure database reliability, and drive cross-functional collaboration.
Top Skills:
Azure Cloud, Azure DevOps, CI/CD, Cosmos DB, Docker, ELK, Grafana, Kubernetes, MySQL, Postgres, Prometheus, Redis, SQL Server, Terraform
Healthtech • Software
Maintain reliability, performance, and scalability of cloud-hosted services and databases. Implement SRE best practices, define SLIs/SLOs, respond to incidents, build monitoring and automation, perform DBA tasks (backups, restores, tuning), support CI/CD and DB migrations, and document runbooks and procedures.
Top Skills:
Amazon RDS, Azure SQL Database, Bash, ECS Fargate, Flyway, GitLab, Jenkins, Kubernetes, Liquibase, Octopus Deploy, Oracle, Postgres, PowerShell, Python, Redis, SolarWinds DPA, SQL Server
Insurance
Lead reliability strategy and architecture for critical systems, drive incident management and root-cause analysis, build automation and SRE tooling, influence release/change practices and compliance, and mentor junior engineers to improve operational reliability.
Top Skills:
Angular, AWS, CI/CD, CloudFormation, Containerization, Java, JavaScript, Netty, Next.js, Node.js, Non-Relational Databases, Observability (Metrics, Logs, Tracing), Orchestration, ORM, React, Relational Databases, ServiceNow, Spring, Spring Boot, Tomcat
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills:
Blockchain, GitOps, Infrastructure as Code, Kubernetes, Programming Languages
Big Data • Cloud • Information Technology
The Site Reliability Engineer at Iron Mountain will troubleshoot escalated tickets, manage Windows Server builds, perform security patching, and collaborate with customers and vendors to resolve issues and maintain systems.
Top Skills:
Cloud, Compute, Hyper-Converged Infrastructure, Linux, Microsoft Endpoint Configuration Manager, Network, Nutanix, PowerShell, Rubrik, Storage, Virtualization, Windows Server
Artificial Intelligence • eCommerce • Retail
Lead the SRE and DevOps team, ensure infrastructure reliability, oversee cloud operations, drive automation, and collaborate cross-functionally.
Top Skills:
Azure, Bash, CI/CD, Datadog, Docker, ELK Stack, Go, Grafana, Kubernetes, PowerShell, Prometheus, Python, Terraform
Aerospace • Big Data • Greentech • Hardware • Social Impact
Design, deploy, and operate compute services for on-premises and cloud satellite imaging platforms. Build reproducible, scalable, highly available deployments, troubleshoot distributed systems, optimize constrained environments, document and automate operations, and participate in on-call rotations to ensure reliability for customer-facing and air-gapped deployments.
Top Skills:
Alloy, Ansible, Bash, CUDA, GitOps, Grafana, Helm, Jira, K3s, Kubernetes, Kustomize, OpenTelemetry, Prometheus, Proxmox, Python, RKE2, Talos, Terraform
Software
Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.
Top Skills:
Ansible, ArgoCD, AWS, AWS Lambda, Azure, Bash, Bitbucket, C++, Chef, Coralogix, Datadog, Docker, Gerrit, Git, GitLab, GCP, Helm, JavaScript, Kubernetes, Linux, MySQL, New Relic, Nginx, Node.js, Phabricator, Prometheus, Puppet, Python, Shell, Splunk
Real Estate • Financial Services • PropTech
As a Site Reliability Engineer, you will support AWS Cloud products, optimize processes, enhance automation, and ensure system reliability and performance.
Top Skills:
ArgoCD, AWS, Azure DevOps, Bash, CI/CD, CloudWatch, Docker, EKS, FluxCD, Git, Kubernetes, PowerShell, Python, SQL, Terraform
Cloud • Software
In this role, you'll support large-scale applications, improve observability, mentor team members, and ensure reliability by collaborating on deployments and writing automation scripts while providing 24/7 support.
Top Skills:
Ansible, AWS, Bash, Confluence, Docker, ELK Stack, GCP, GitLab CI/CD, Grafana, Jenkins, Jira, Kubernetes, Linux, MongoDB, MySQL, Nagios, OCI, Perl, Postgres, Prometheus, Puppet, Python, Terraform
Cloud • Security • Software
Design, build, and maintain cloud-hosted infrastructure and CI/CD pipelines for a large identity platform. Improve deployment automation, reliability, observability, and cost optimization. Collaborate across teams, evaluate technologies, participate in planning, and join an on-call rotation to support production services.
Top Skills:
CI/CD, Docker, GCP, Git, Go, Kubernetes
Software
Lead the SRE function: define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor the SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills:
AWS, AWS Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform
Software
Own reliability, performance, and scalability of PostgreSQL infrastructure. Implement HA, replication, observability, capacity planning, automation, and DR. Support engineering teams with migrations, query optimization, on-call incident response, runbooks, and tooling to enable safe DB operations.
Top Skills:
Ansible, Aurora, AWS RDS, Chef, Datadog, DynamoDB, ElastiCache, Go, Grafana, Indexing, MVCC, Patroni, PgBouncer, Postgres, Prometheus, Python, Query Planner, Replication, Ruby, SQL, Terraform, Vacuum Tuning, WAL
Travel
The Senior Site Reliability Engineer will automate and optimize infrastructure on Google Cloud, improve cost efficiency, and support on-call incidents, working closely with the engineering teams.
Top Skills:
Bash, Containers, Datadog, GCP, Helm, Istio, Kubernetes, Kustomize, Python, SQL