Top Remote Senior Site Reliability Engineer Jobs in Chicago, IL
Healthtech • Information Technology • Telehealth
Lead Site Reliability Engineer responsible for ensuring cloud services reliability, automation, and performance while mentoring a team and collaborating cross-functionally. Drive initiatives to enhance incident management and enforce security compliance.
Top Skills:
Ansible, AWS, Aws Cloudformation, Azure, Bash, Datadog, Docker, Elk Stack, Go, GCP, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform
Big Data • Cloud • Software • Analytics
As a Site Reliability Engineering Intern, you'll monitor cloud services, assist in incident management, support automation, and collaborate with engineers to improve system reliability.
Top Skills:
AWS, Azure, Databases, GCP, Git, Grafana, Kafka, Linux/Unix Systems, Prometheus, Python, Shell Scripting, Terraform
Sports
Manage and improve the AWS infrastructure, deploy into new regions, monitor releases, and implement new technologies in a fast-paced environment.
Top Skills:
AWS, Docker, Grafana, Kubernetes, Prometheus, Python
Cloud • Information Technology
As a Staff Platform Engineer, you'll develop and maintain infrastructure components using Go and Node.js, improve service reliability, mentor junior engineers, and manage data ecosystems.
Top Skills:
Envoy, Express, Go, Jenkins, Kafka, MySQL, Node.js, Postgres, Puppet, Python, React, Redis
Cloud • Insurance • Payments • Software • Business Intelligence • App development • Big Data Analytics
As a Senior Site Reliability Engineer, you will ensure software reliability and scalability, manage infrastructure as code (IaC) and CI/CD, monitor systems, and mentor junior engineers while collaborating across teams.
Top Skills:
Ansible, Argocd, Bash, Datadog, Github Actions, Gitlab, Go, Hashicorp Consul, Helm, Kubernetes, Packer, Postgres, Powershell, Python, SQL Server, Terraform, Typescript
Cloud • Software • Analytics
Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.
Top Skills:
Ansible, Bash, GCP, Gke, Go, Kubernetes, Pulumi, Python
Security • Software • Cybersecurity
The NetOps SRE ensures network infrastructure reliability, handles troubleshooting, configures routing protocols, and collaborates with teams and customers on issues.
Top Skills:
Ansible, Arista, Bgp, Ddos, Dns, Git, Juniper, Lan/Wan, Mpls, Python, Unix
Healthtech • Social Impact • Transportation • Telehealth
The Site Reliability Engineer IV will enhance system reliability and performance, maintain infrastructure, troubleshoot incidents, develop automation tools, and provide on-call support.
Top Skills:
.Net, Azure, Ci/Cd, Git, Iis, JavaScript, Jenkins, Microsoft Development Stack, Pulumi, Python, Shell, SQL Server, Terraform
Cloud • Security • Software • Cybersecurity
As a Site Reliability Engineer II, you'll automate tasks, monitor AI workloads, enhance dashboards, support CI/CD processes, and collaborate with engineering teams on complex issues while participating in on-call rotations.
Top Skills:
Go, Grafana, Kubernetes, Linux, Prometheus, Python, Saltstack, Terraform
Artificial Intelligence • Fintech • Machine Learning • Natural Language Processing • Business Intelligence
Lead architecture and build reliability platforms, drive AIOps automation, champion SRE practices, lead incident response and postmortems, advance observability, and mentor engineers to improve system reliability and performance.
Top Skills:
Aiops, AWS, Azure, Continuous Profiling, Datadog, Dns, Elk, GCP, Go, Grafana, Http/S, Kubernetes, Load Balancing, Opentelemetry, Prometheus, Python, Tcp/Ip
Cloud • Security • Software • Generative AI
The role involves designing and developing tooling for the Elastic Stack, managing production services, and supporting internal Elastic Stack usage for development and analytics.
Top Skills:
Ansible, Chef, Clojure, Docker, Haskell, JavaScript, Kubernetes, Packer, Puppet, Python, Salt, Terraform
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills:
Ai/Llm, Argocd, AWS, Ci/Cd, Datadog, Github Actions, Gitops, Go, Kubernetes, Python
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills:
AWS, Docker, GCP, Kubernetes
Information Technology • Other • Software • Consulting
The Site Reliability Engineer at CardioOne will enhance the reliability and performance of production systems, implement automation, and lead incident response efforts while collaborating with development teams.
Top Skills:
Ansible, AWS, Azure, Datadog, Docker, Ecs, Java, Kubernetes, Python, Terraform, Terragrunt
Fitness
The Staff Site Reliability Engineer will establish SRE best practices, drive observability strategy, implement software solutions, and mentor engineers. Responsibilities include improving platform resilience, managing risks, and participating in incident response processes.
Top Skills:
Ansible, AWS, Azure, Bash, CloudFormation, GCP, Go, Kubernetes, Pulumi, Python, Terraform
Information Technology • Legal Tech
The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.
Top Skills:
AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform
HR Tech • Software
The Site Reliability Engineer will architect and manage AWS infrastructure, implement CI/CD pipelines, lead incident response, and mentor junior engineers to maintain reliability and security for a B2B SaaS platform.
Top Skills:
Alb, AWS, Bash, CloudFormation, Datadog, Ecs, Git, Go, Guardduty, Jenkins, Lambda, Python, S3, Terraform
Information Technology • Cryptocurrency
The Site Reliability Engineer will lead technical initiatives, architect solutions, troubleshoot issues, mentor team members, and improve observability practices.
Top Skills:
Argocd, Bash, Elk Stack, GCP, Go, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform
Artificial Intelligence • Information Technology • Consulting
The Linux Systems Administrator will maintain and troubleshoot Linux systems, support network services, and work on systems integration while collaborating with infrastructure teams.
Top Skills:
Dhcp, Dns, Linux, Ntp, Python
Hardware • Machine Learning • Security • Software
The Staff Device SRE will build and maintain a scalable test platform, ensuring reliability and integrating with various teams for software testing and hardware control, with a focus on advanced infrastructure practices.
Top Skills:
Ansible, AWS, Bash, C/C++, Grafana, Groovy, Infrastructure-As-Code, Java, JavaScript, Prometheus, Python, R, Sre, Terraform, Test Automation
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills:
Ai/Ml, AWS, Azure, Docker, Elk Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
Information Technology • Internet of Things • Software • Virtual Reality
Lead reliability, availability, and resiliency strategies for large-scale systems, drive operational excellence, and provide technical mentorship across engineering teams.
Top Skills:
AWS, Ci/Cd, Java, MongoDB, RabbitMQ, Zookeeper
Healthtech • Other • Software
The Database Site Reliability Engineer will ensure the reliability and performance of PostgreSQL services, manage incidents, and automate tasks while collaborating with cross-functional teams in a 24x7 SaaS environment.
Top Skills:
Ansible, Bash, Datadog, ETL, Grafana, Postgres, Powershell, Prometheus, Python, Terraform
Fintech • Software
The Senior Site Reliability Engineer keeps SaaS products fast and stable through automation, collaboration, and monitoring, and implements AI tools to enhance performance and reliability.
Top Skills:
Ai Tools, Ansible, Appdynamics, AWS, Azure, Azure Devops, Bash, C# .Net, Cosmos, Datadog, Dynatrace, Harness, Java, Jenkins, Kubernetes, New Relic, Powershell, Python, SaaS, SQL, Terraform
Security • Software • Analytics
Design, operate, and automate scalable, secure infrastructure for Axiom Cloud. Define SLOs, plan disaster recovery and capacity, tune performance, improve deployment practices, build reliability tooling, respond to incidents, and promote monitoring and observability across teams.
Top Skills:
Amazon Eks, AWS, CircleCI, Docker, Github Actions, Gitlab, Go, Kubernetes, Linux, Llms, Monitoring And Observability Tools, Pulumi, Terraform