Top Remote Reliability Engineer Jobs in Chicago, IL
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Senior Site Reliability Engineer (SRE) at NVIDIA is responsible for designing, building, and maintaining large-scale production systems, focusing on reliability and efficiency, automation, and continuous improvement.
Top Skills:
Containers, Go, Kubernetes, Linux, Networking, Openstack, Perl, Python, Ruby
Fintech • Information Technology
As a Site Reliability Engineer at Alpaca, you'll ensure system reliability and performance while collaborating with development teams, managing incidents, and improving observability. Requires strong troubleshooting and operational skills, particularly with PostgreSQL.
Top Skills:
Go, Linux, Postgres, Prometheus
Software • Energy
As a Lead Site Reliability Engineer, you'll manage the Product Reliability team, ensuring product performance, scalability, and availability while delivering technical improvements and mentoring team members.
Top Skills:
AWS, Docker, Kubernetes, Linux, Postgres, Python, RabbitMQ, Terraform
Other • Real Estate • PropTech
As a Senior Site Reliability Engineer, you will design and manage scalable infrastructure, automate processes, collaborate with teams, and ensure system reliability.
Top Skills:
Go, Infrastructure As Code, Java, Python
Artificial Intelligence • Cybersecurity
The Database Reliability Engineer will ensure database availability, performance, scalability, and security across AWS, collaborating with application and security teams.
Top Skills:
AWS, Crossplane, Datadog, Gitlab Ci/Cd, Kubernetes, NoSQL, Opensearch, Postgres, Terraform
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Senior Site Reliability Engineer will design, implement, and maintain an observability platform, ensuring reliability and performance while supporting production systems and optimizing operational practices.
Top Skills:
Docker, Go, Grafana, Kubernetes, Linux, Networking, Openstack, Opentelemetry, Perl, Prometheus, Python, Ruby
Cloud • Fintech • Information Technology • Software • Business Intelligence
As a Site Reliability Engineer, you will ensure production system reliability, optimize performance, respond to incidents, and collaborate on infrastructure improvements.
Top Skills:
Ansible, AWS, Bash, Datadog, Docker, Elk, Git, Grafana, Kubernetes, New Relic, Opentelemetry, Prometheus, Python, React, Ruby, Ruby On Rails, Terraform
Cloud • Software
The Senior Site Reliability / GitOps Engineer will drive automation and collaboration within the IS team, enhancing Canonical's IT operations and services while managing infrastructure as code and cloud technologies.
Top Skills:
Cloud Computing, Docker, Elasticsearch, Gitops, Grafana, Iac, Kubernetes, Linux, Prometheus, Python
Cloud • Software
As a Site Reliability / GitOps Engineer, you will automate operations, develop Infrastructure as Code, maintain core services, and collaborate on service architecture.
Top Skills:
Ci/Cd, Cloud Computing, Elasticsearch, Grafana, Infrastructure As Code, Linux, Prometheus, Python
Cloud • Software
The Site Reliability Engineer will ensure reliable cloud operations by applying Python for infrastructure automation, managing OpenStack and Kubernetes, and practicing DevSecOps in a fast-paced environment.
Top Skills:
Kubernetes, Linux, Openstack, Python
Cloud • Software
The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.
Top Skills:
Kubernetes, Linux, Openstack, Python
Healthtech • Social Impact
As a Senior Site Reliability Engineer at Virta Health, you'll build automation and tooling for reliability, enhance observability, and mentor engineering teams in best practices.
Top Skills:
AI, Aiops, Go, Ml, Python, Terraform
Information Technology • Software • Web3
As a Software Engineer focused on SRE and DevSecOps, you will design scalable infrastructure, implement CI/CD pipelines, and automate processes while collaborating with teams to enhance performance and security.
Top Skills:
Ansible, Bash, Datadog, Docker, GCP, Grafana, Kubernetes, Python, React, Rust, Solidity, Terraform, Web3
Cloud • Security • Software
The Site Reliability Engineer will design, automate, and scale cloud infrastructure while ensuring uptime, performance, and security best practices.
Top Skills:
Ansible, AWS, Azure, Chef, Docker, GCP, Go, JavaScript, Kubernetes, Linux, Puppet, Python, Ruby, Saltstack, Terraform
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Design and maintain large-scale Kubernetes clusters, ensuring reliability through monitoring, automation, and incident response.
Top Skills:
Docker, Go, Kubernetes, Linux, Networking, Openstack, Perl, Python, Ruby
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Principal Staff SRE will lead initiatives in building and optimizing core infrastructure services on-premises and in the cloud, deploying and managing services at scale, and improving performance with automation and monitoring tools.
Top Skills:
Dhcp, Dns, Ebpf, Go, Ldap, Linux, Ntp, Python, Terraform, Xdp
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Senior Site Reliability Engineer will manage deployments, operations, and incident handling for large-scale AI GPU platforms while ensuring high performance and resilience in configurations.
Top Skills:
C++, Kubernetes, Linux, Python
Hardware • Machine Learning • Security • Software
The Site Reliability Engineer will manage software deployment for IoT devices, improve observability, maintain dashboards, automate processes, and collaborate on incident responses.
Top Skills:
Ansible, AWS, Bash, C/C++, Datadog, Grafana, Groovy, Java, JavaScript, NoSQL, Postgres, Prometheus, Python, R, Sigma, SQL, Terraform
Artificial Intelligence • Cloud • Fintech • Machine Learning • Mobile • Software
The Staff Site Reliability Engineer will design, implement, and optimize infrastructure for AI services, ensure reliability and performance, and drive automation and observability excellence across engineering teams.
Top Skills:
Azure, Azure Devops, Docker, Elk Stack, Github Actions, Grafana, Kubernetes, Mimir, Postgres, Prometheus, SQL Server, Teamcity, Terraform
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills:
Blockchain, Gitops, Infrastructure-As-Code, Kubernetes, Programming Languages
Software • Energy • Utilities
As a Senior Site Reliability Engineer, you'll manage GCP infrastructure, improve incident processes, develop observability platforms, and advocate for reliability best practices.
Top Skills:
GCP, Infrastructure-As-Code, Kubernetes, Unix
Blockchain • Software
As a Site Reliability Engineer at Offchain Labs, you will manage infrastructure in cloud environments, design CI/CD workflows, and enhance system reliability with a focus on blockchain technology.
Top Skills:
Argocd, AWS, Azure, Codebuild, GCP, Github Actions, Go, Grafana, Kubernetes, Loki, Prometheus, Python, Terraform
Greentech • Software • Energy
This role involves managing cloud infrastructure, improving system reliability, automating operations, responding to incidents, and mentoring engineers. It requires deep technical expertise and leadership skills.
Top Skills:
AWS, Bash, Datadog, Docker, GCP, JavaScript, Kubernetes, Linux, Python, Typescript
Security • Software • Cybersecurity
Seeking a Site Reliability Engineer to manage software development tools for DevOps, optimize workflows, and ensure system performance and reliability while integrating AI-driven solutions.
Top Skills:
Artifactory, AWS, Azure, Bash, Clickup, Confluence, Docker, Figma, Fullstory, GCP, Git, Grafana, JIRA, Kubernetes, Power BI, Prometheus, Python, Splunk, Terraform
Information Technology • Mobile • News + Entertainment • Social Media
The Senior Site Reliability Engineer will enhance the reliability of Reddit's engineering platforms, automate processes, and optimize performance.
Top Skills:
Clickhouse, Cloud, Go, Grafana, Kubernetes, Loki, Otel, Prometheus, Python, Thanos, Vector