Top Remote Senior Site Reliability Engineer Jobs in Chicago, IL

Cooley

Senior Technology Site Reliability Engineer

Reposted 14 Days Ago

In-Office or Remote

2 Locations

140K-205K Annually

Senior level

Information Technology • Legal Tech

The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.

Top Skills: AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform

Nebius

Site Reliability Engineer

Reposted 5 Days Ago

Remote

United States

100K-140K Annually

Mid level

Artificial Intelligence • Information Technology • Consulting

The Linux Systems Administrator will maintain and troubleshoot Linux systems, support network services, and work on systems integration while collaborating with infrastructure teams.

Top Skills: Dhcp, Dns, Linux, Ntp, Python

Strike (simplistic.com)

Site Reliability Engineer

Reposted 5 Days Ago

Remote

USA

Senior level

Information Technology • Cryptocurrency

The Site Reliability Engineer will lead technical initiatives, architect solutions, troubleshoot issues, mentor team members, and improve observability practices.

Top Skills: Argocd, Bash, Elk Stack, GCP, Go, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform

Kraken Digital Asset Exchange

Site Reliability Engineer - AI Agents

6 Days Ago

Remote

United States

96K-192K Annually

Senior level

Blockchain • Financial Services • Cryptocurrency • Web3

Design, build, and operate scalable, observable infrastructure for AI agent workflows. Build platform services, APIs, and SDKs; manage cloud, Kubernetes, and model-serving compute; implement IaC, CI/CD, monitoring, incident response, security controls, and runbooks; collaborate with AI and data teams to productionize agent prototypes.

Top Skills: AWS, Bash, Ci/Cd, Docker, Kubernetes, Python, Terraform

PTC

Principal Software Engineer-SRE

Reposted 6 Days Ago

Remote

USA

113K-175K Annually

Senior level

Information Technology • Internet of Things • Software • Virtual Reality

Lead reliability, availability, and resiliency strategies for large-scale systems, drive operational excellence, and provide technical mentorship across engineering teams.

Top Skills: AWS, Ci/Cd, Java, MongoDB, RabbitMQ, Zookeeper

Kong

Staff Site Reliability Engineer — Project Volcano

Reposted 7 Days Ago

Remote

United States

140K-197K Annually

Expert/Leader

Artificial Intelligence • Cloud • Information Technology • Software • Big Data Analytics

As Staff SRE for Project Volcano, you'll own reliability, architect infrastructure, scale data services, and set SRE practices while mentoring teams.

Top Skills: Argocd, Datadog, Grafana, Helm, Kubernetes, Postgres, Prometheus, Redis, Terraform, Terragrunt

Veeam

GOV Site Reliability Engineer

7 Days Ago

Remote

United States

152K-253K Annually

Mid level

Cloud • Security • Software • Cybersecurity

Join the GOV/Sovereign Cloud SRE team to maintain and improve reliability for the Veeam Data Cloud. Responsibilities include incident response, SLIs/SLOs, observability (monitoring, alerting, dashboards), runbooks and documentation, IaC and CI/CD work in compliance-restricted environments, and participation in on-call rotations. Collaborate with engineering, security, and compliance teams to implement high availability and automation.

Top Skills: Argocd, Azure, Azure Devops, Azure Government, C#, Elk Stack, Github Actions, Gitlab Ci, Go, Grafana, Java, JavaScript, Kubernetes, Opentelemetry, Prometheus, Pulumi, Terraform, Terragrunt, Typescript

HiBob

Senior Site Reliability Engineer - Remote EST

Reposted 12 Days Ago

Remote or Hybrid

United States

190K-235K Annually

Senior level

HR Tech • Information Technology • Professional Services • Sales • Software

Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.

Top Skills: Ai/Llm, Argocd, AWS, Ci/Cd, Datadog, Github Actions, Gitops, Go, Kubernetes, Python

Cority

Site Reliability Engineer II

Reposted 7 Days Ago

Remote

United States

Mid level

Healthtech • Software

Maintain reliability, performance, and scalability of cloud-hosted services and databases. Implement SRE best practices, define SLIs/SLOs, respond to incidents, build monitoring and automation, perform DBA tasks (backups, restores, tuning), support CI/CD and DB migrations, and document runbooks and procedures.

Top Skills: Amazon Rds, Azure Sql Database, Bash, Ecs Fargate, Flyway, Gitlab, Jenkins, Kubernetes, Liquibase, Octopus Deploy, Oracle, Postgres, Powershell, Python, Redis, Solarwinds Dpa, SQL Server

WEX Inc.

Senior Staff Site Reliability Engineer

Reposted 16 Days Ago

In-Office or Remote

2 Locations

160K-179K Annually

Senior level

Fintech • Payments

The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.

Top Skills: Ai/Ml, AWS, Azure, Docker, Elk Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk

Yahoo

Software Engineer, SRE Tooling & Reliability Platforms

Reposted 7 Days Ago

Remote

United States of America

89K-184K Annually

Entry level

AdTech • Digital Media • Information Technology • Other

As a Software Engineer in the Tooling and Reliability Platforms team, you'll develop AI services, manage incident tools, and utilize Infrastructure as Code for high-availability systems. You'll focus on integrating AI workflows and improving operational resilience for Yahoo's brands.

Top Skills: AWS, CloudFormation, Docker, GCP, Go, Java, Kubernetes, Python, Terraform

TERN Group

Site Reliability Engineer

8 Days Ago

Remote

United States

175K-200K Annually

Senior level

Artificial Intelligence • Healthtech • HR Tech • Software

Own the Heroku-to-GCP migration, maintain Postgres and data pipelines, optimize high‑traffic code paths, build monitoring/alerting, lead incident response and post‑mortems, reduce costs and scale proactively, and coach other infrastructure engineers.

Top Skills: Appsignal, BigQuery, Bugsnag, Canny, Claude Code, Fivetran, Google Cloud Platform, Heroku, Hex, Hotwire, Infrastructure-As-Code, Postgres, Ruby On Rails

athenahealth

Lead SRE - Observability

8 Days Ago

Remote

USA

143K-243K Annually

Senior level

Healthtech • Information Technology • Telehealth

Lead the design, build, and operation of scalable observability and telemetry platforms. Implement IaC and automation, support monitoring/alerting, troubleshoot production distributed systems, participate in incident response/on-call, and mentor engineers while driving platform reliability and cross-team technical decisions.

Top Skills: AWS, Bash, Ci/Cd, Clickhouse, CloudFormation, Docker, Elasticsearch, Fluentd, Go, Grafana, Kafka, Kubernetes, Linux, Opensearch, Opentelemetry, Prometheus, Python, Tcpdump, Terraform, Vector, Wireshark

Airbnb

Engineering Manager, Storage SRE

Reposted 8 Days Ago

Remote

United States

212K-265K Annually

Expert/Leader

Real Estate • Travel • PropTech

The Engineering Manager for Storage SRE will lead a team to ensure reliable database operations, improve developer experience, and expand tooling and operational models, focusing on mission-critical systems.

Top Skills: Cloud Infrastructure, Databases, Site Reliability Engineering, Storage Systems

Offchain Labs

Site Reliability Engineer

9 Days Ago

Remote

United States

Mid level

Blockchain • Software

Build, operate, and scale production Kubernetes infrastructure using GitOps and declarative IaC. Design CI/CD workflows, observability, and secure-by-default systems. Troubleshoot networking/storage, participate in on-call rotations, automate operational workflows, and drive postmortems and reliability improvements.

Top Skills: Arbitrum, Argocd, Argocd Applicationsets, AWS, Azure, Bash, Cloudwatch, Codebuild, GCP, Github Actions, Gitops, Go, Grafana, K9S, Kubernetes, Linux, Loki, Mimir, Prometheus, Prysm, Python, Terraform, Yaml, Zerodev

Alpaca

Staff Site Reliability Engineer, Database

Reposted 9 Days Ago

Remote

USA

Senior level

Fintech • Information Technology

As a Site Reliability Engineer at Alpaca, you will ensure system reliability and performance, troubleshoot issues, and collaborate with teams to design scalable features.

Top Skills: Go, Gorm, Linux, Pgx, Postgres, Prometheus, Sqlc

Oscilar

Sr./Staff - Infrastructure/Site Reliability Engineer (SRE)

Reposted 9 Days Ago

Remote

USA

Senior level

Artificial Intelligence • Fintech • Software • Financial Services

The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.

Top Skills: AWS, Clickhouse, Go, Kafka, Kubernetes, Pulumi, Python, Terraform

Aalyria

Site Reliability Engineer

Reposted 10 Days Ago

Remote

United States

115K-135K Annually

Mid level

Aerospace • Manufacturing

As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.

Top Skills: Argocd, AWS, Elk, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, Opentelemetry, Prometheus, Python, Tempo, Terraform

Tekmetric

Site Reliability Engineer

Reposted 10 Days Ago

Remote

United States

Senior level

Automotive

Design and implement scalable cloud infrastructure, monitor performance, automate processes, ensure security and compliance, and lead a DevOps team.

Top Skills: AWS, Bash, Ci/Cd, Docker, Elk Stack, GCP, Grafana, Kubernetes, Prometheus, Python, Terraform

Canonical

Site Reliability Engineer

Reposted 11 Days Ago

In-Office or Remote

United States

200K-200K Annually

Mid level

Cloud • Software

The Site Reliability Engineer will ensure reliable cloud operations by applying Python for infrastructure automation, managing OpenStack and Kubernetes, and practicing DevSecOps in a fast-paced environment.

Top Skills: Kubernetes, Linux, Openstack, Python

OneStream Software

Site Reliability Engineer

12 Days Ago

Remote

USA

114K-148K Annually

Senior level

Software • Financial Services

Ensure platform reliability, performance, and availability by implementing observability, automating infrastructure, participating in on-call rotations and post-mortems, partnering with Product and Engineering, designing scalable architectures, mentoring teammates, and integrating Dynatrace with Azure DevOps and Jira while supporting compliance (SOC/FedRAMP).

Top Skills: .Net, Aks, Alpine, Ansible, Appinsights, Arm Templates, AWS, Azure Devops, Bash, Bicep, C#, Chef, CloudFormation, Datadog, Debian, Dynatrace, Eks, GCP, Git, Git, Gks, Grafana, Helm, JIRA, Kubernetes, Log Analytics, Azure, New Relic, Onestream Software, Openshift, Powershell, Powershell Dsc, Prometheus, Puppet, Python, Rest Apis, SQL, Terraform, Ubuntu

Elastic

Site Reliability Engineer (Hosted Infra) - Platform

12 Days Ago

Remote

United States

143K-175K Annually

Mid level

Cloud • Security • Software • Generative AI

Design, build, and automate large-scale multi-cloud infrastructure and internal SRE tools. Improve host lifecycle, observability, alerting, and reliability; operate containerized workloads; participate in on-call rotations, incident response, runbooks, postmortems, code reviews, and mentoring.

Top Skills: Ansible, Argo Cd, Argo Workflows, Cue, Docker, Elastic Stack, Go, Graphite, Influx, Kubernetes, Linux, Prometheus, Puppet, Terraform, Ubuntu, Ubuntu Live Patch

Yugabyte

Staff Site Reliability Engineer

Reposted 12 Days Ago

Remote

United States

220K-250K Annually

Expert/Leader

Cloud • Software • Database

Lead the design, build, and operation of the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance, manage incidents and on-call rotations, implement security/encryption processes, and optimize reliability using SRE principles and observability.

Top Skills: Aks, Ansible, AWS, Azure, Bash, Docker, Eks, GCP, Git, Github Actions, Gke, Java, Kubernetes, Linux, Postgres, Prometheus, Python, Shell, Terraform

Akamai Technologies

Site Reliability Engineer II - Database

14 Days Ago

In-Office or Remote

United States

95K-171K Annually

Junior

Cloud • Security • Software • Cybersecurity

The Site Reliability Engineer II - Database ensures the integrity, security, and performance of MySQL databases while collaborating with development and operations teams to address database issues and improve reliability.

Top Skills: MySQL, SQL

Elastic

Principal SRE (Networking) - Platform Control Plane

14 Days Ago

Remote

United States

180K-233K Annually

Expert/Leader

Cloud • Security • Software • Generative AI

The role involves designing, building, and automating network infrastructure for Elastic's global services, focusing on reliability and operational excellence while enhancing customer experience through proactive problem management.

Top Skills: Ansible, Bgp, Dns, Docker, Elastic Stack, Go, Kubernetes, Terraform