Top Reliability Engineer Jobs in Chicago, IL
Artificial Intelligence • Cloud • Information Technology • Legal Tech • Productivity • Software
Lead incident response and reliability improvements for iManage Cloud. Triage large-scale production issues, build observability and automation, run postmortems, partner with product and engineering, and proactively detect and eliminate systemic problems to improve uptime and customer experience.
Top Skills:
Azure, Azure Kubernetes Service (AKS), Bash, Grafana, Kibana, PowerShell, Prometheus, Python, REST APIs, Shell, Splunk, SQL
Big Data • Fintech • Information Technology • Business Intelligence • Financial Services • Cybersecurity • Big Data Analytics
The Staff Site Reliability Engineer will lead reliability strategies, manage high-risk initiatives, and enhance engineering standards while ensuring system reliability and operational excellence within a hybrid work environment.
Top Skills:
Bash, CI/CD, Database Architecture, Go, Google Cloud Platform, Infrastructure-as-Code, Kubernetes, Monitoring Platforms, Pulumi, Python, Terraform
Reposted 12 Hours Ago
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Artificial Intelligence • Big Data • Healthtech • Machine Learning • Analytics • Biotech • Generative AI
Join the SRE team to design, deploy, and operate resilient cloud infrastructure. Recommend solutions, automate workflows, configure Terraform and CI, implement monitoring and alerts, and support developers and users.
Top Skills:
Ansible, Aurora MySQL, AWS, Azure, Bash, Chef, CloudFormation, Composer, Concourse, Dataproc, Docker, GCP, Go, HIPAA, HITRUST, ISO, Kubernetes, Packer, Postgres, Puppet, Python, Ruby, Salt, Slack, Terraform
Reposted 4 Days Ago
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Digital Media • Information Technology • News + Entertainment
Responsible for ensuring reliability, scalability, and performance of data platforms. Design monitoring and alerting, automate deployments and recovery, optimize storage and query performance, troubleshoot incidents, plan capacity and scaling, document operations, enforce security/compliance, and collaborate with data engineering, product, and data science teams to maintain high availability of large-scale data systems.
Top Skills:
Ansible, AWS, Azure, CI/CD, Docker, ELK Stack, GCP, Go, Grafana, Java, Kubernetes, MySQL, NoSQL, Postgres, Prometheus, Python, Scala, Terraform
Reposted 7 Days Ago
AdTech
As a Site Reliability Engineer, you'll maintain the infrastructure for systems, ensure efficiency, automate processes, monitor databases, and participate in architecture discussions.
Top Skills:
Amazon Kinesis, AWS Lambda, AWS SNS, BigQuery, Docker, GCP (Google Cloud Platform), GitLab, Google Cloud Functions, Google Cloud Run, Google Pub/Sub, Grafana, Istio, Kafka, Kubernetes, MySQL, Prometheus, Spanner, SQL, Terraform
Appliances • Manufacturing
The Reliability Engineer identifies and resolves product failures, analyzes data, collaborates with teams to enhance product reliability, and supports new product development for markets in the Americas.
Top Skills:
Excel
Reposted 12 Hours Ago
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Conversational AI
The engineer will build and operate AI/ML infrastructure, managing services on AWS and bare metal, using tools like Kubernetes and Terraform.
Top Skills:
AWS, Bash, Go, Kubernetes, Python, Slurm, Terraform
Artificial Intelligence • Cloud • Information Technology • Legal Tech • Productivity • Software
The Senior Site Reliability Engineer will focus on automating infrastructure, enhancing cloud resilience, supporting deployments, and mentoring teams in reliability best practices, while participating in on-call rotations.
Top Skills:
Azure, Bash, CI/CD, Docker, Go, Grafana, Java, Kubernetes, PowerShell, Prometheus, Python, Ruby, Terraform
Fintech • Payments • Financial Services
The Staff Reliability Engineer will enhance data platform reliability through automation, incident management, and observability in a hybrid work setting.
Top Skills:
AIOps, Ansible, AWS, CI/CD, CloudFormation, Datadog, Dynatrace, EKS, EMR, GCP, Grafana, Hadoop, OpenSearch, Prometheus, Python, Snowflake, Splunk, Terraform
Artificial Intelligence • Machine Learning
Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.
Top Skills:
Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.
Top Skills:
AI Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, SLAs, SLOs
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills:
AI/LLM, ArgoCD, AWS, CI/CD, Datadog, GitHub Actions, GitOps, Go, Kubernetes, Python
Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics
Lead long-term strategy and architecture for cloud and on‑prem platform infrastructure, driving Kubernetes and multi‑cloud reliability, IaC/GitOps automation, observability, SLO/SLI/error‑budget practices, incident leadership, AI‑augmented tooling adoption, and mentorship of senior engineers to improve platform resilience and developer experience.
Top Skills:
Amazon Elastic Kubernetes Service (EKS), Autoscaling, AWS, Capacity Planning, CI/CD, GitOps, Go, Google Cloud Platform, Google Kubernetes Engine (GKE), Identity and Access Management, Infrastructure as Code, Kubernetes, Linux, Networking, Observability, Operators, Pulumi, Python, RKE2, Storage, Terraform
Cloud
The role involves designing, optimizing, and maintaining PostgreSQL and MySQL databases, ensuring high availability, reliability, and performance for mission-critical systems, while automating operational tasks and responding to incidents.
Top Skills:
Ansible, AWS, Datadog, GCP, Go, Grafana, Kubernetes, MySQL, Postgres, Prometheus, Python, Terraform
Aerospace • Logistics • Security • Software • Cybersecurity
Support IRCM product line by establishing, monitoring, and verifying subsystem reliability and maintainability requirements. Perform reliability predictions, FMECA, MTTR analysis, and FRACAS root cause/corrective action investigations. Ensure R&M program requirements are achieved and drive related projects and processes. On-site in Rolling Meadows, IL.
Top Skills:
FMECA, FRACAS, IRCM, MTTR, Reliability and Maintainability (R&M), Systems Engineering
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Design, build, and launch reliability engineering projects to scale production services. Improve scalability, observability, secrets/configuration management, canary-based deployments, and deployment safety. Partner with teams, support critical services, participate in on-call rotations, and promote reliability best practices.
Top Skills:
AWS, Azure, Datadog, GCP, Generative AI, Go, Kibana, Ruby, Terraform
Renewable Energy
Own reliability, performance, and scalability of Postgres and ClickHouse databases. Build scalable data pipelines, design analytical schemas and DBT models, migrate data to ClickHouse, implement data quality checks, eliminate duplicates, and manage database infrastructure via IaC.
Top Skills:
AWS CDK, AWS Step Functions, CI/CD, ClickHouse, Dagster, dbt, Postgres, Pulumi, Python, SQL
Software
As a Senior DevOps / Platform Reliability Engineer, you will manage CI/CD pipelines, automate infrastructure, operate Kubernetes, and enhance observability while ensuring security and compliance for enterprise systems.
Top Skills:
Argo CD, Aurora MySQL, AWS, Bash, CloudFormation, EKS, ElastiCache, GitHub Actions, Grafana, Kubernetes, Linux, MSK, OpenTelemetry, Prometheus, Python, S3, Terraform
Artificial Intelligence • Insurance • Software • Automation
Lead design, automation, and optimization of database infrastructure (PostgreSQL/Aurora). Build monitoring, tuning, and scaling strategies, create automation tooling, drive performance and reliability initiatives, and expand into broader SRE responsibilities to improve availability and system health for a growing SaaS platform.
Top Skills:
Amazon Aurora, CI/CD, Docker, JavaScript, Kubernetes, Node.js, Postgres, Prisma, Redshift, Terraform, Terragrunt, TypeScript
Automotive • Information Technology • Other • Transportation • Energy
Perform RAM and FMECA/FMEA analyses, develop fault trees and reliability predictions, support maintainability and logistics analyses, produce reliability growth test plans, contribute to systems engineering documentation, advise design engineers on R&M shortfalls, and present results to management and clients.
Top Skills:
Fault Tree Analysis, FMEA, FMECA, Integrated Logistics Support (ILS/ILSA), ISO-9000, MIL-HDBK-217F, RAM Modelling, RAM Software, Statistical Methods
Reposted 9 Days Ago
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.
Top Skills:
AWS, Azure, DNS, Go, Google Cloud Platform, Kubernetes, Linux, Python, TCP/IP, TLS
Big Data • Cloud • Software • Database
Seeking a Site Reliability Engineer with expertise in networking and distributed systems to build secure multi-cloud infrastructure. Responsibilities include maintaining network architecture and ensuring reliable service-to-service communication. The role includes a 24/7 on-call rotation.
Top Skills:
AWS, Azure, BGP, DNS, GCP, IPv6, Kubernetes, Load Balancing, mTLS, Service Mesh, TCP/IP, TLS, VPCs, VPNs
Big Data • Fintech • Mobile • Payments • Financial Services
Design and build a centralized reliability platform and developer-facing APIs. Implement AI agents for incident triage, log/trace summarization, and recommended actions. Own projects end-to-end and collaborate with product, infra, data, and SRE teams.
Top Skills:
AI Frameworks, APIs, Claude, Copilot, Cursor, LLMs, Python