Top Reliability Engineer Jobs in Chicago
Build software to automate the operation of large distributed systems, ensure platform security, document work, contribute to open source, and improve code quality. Requires recent experience with Go or Python and full-time software engineering experience.
As a Site Reliability Engineer III at JPMorgan Chase, you will configure, maintain, monitor, and optimize applications and infrastructure using Java, Python, and other technologies. Responsibilities include collaboration with teams, implementing deployment approaches, and supporting site reliability best practices. Required qualifications include 3+ years of applied experience, proficiency in site reliability principles, knowledge of programming languages like Python and Java, and experience with observability and CI/CD tools.
The Principal Application Reliability Engineer will partner with teams to identify and fix inefficiencies that affect system reliability and performance. They will consult with teams and provide hands-on training in observability, incident management, and reliability best practices. Additional responsibilities include leading failure point discussions, conducting chaos testing, and driving capacity management.
As a Principal Site Reliability Engineer (SRE) at Discover, you will be responsible for improving reliability and performance by developing and running SRE tooling and observability solutions. You will work on CI/CD, data monitoring, and defining SRE practices within the Fraud value stream. Responsibilities include building monitoring alerts, documenting actions for automation, and debugging production issues. Minimum qualifications include a bachelor's degree in Computer Science and 6+ years of experience in Information Technology or Engineering. Preferred qualifications include SRE experience, strong knowledge of SDLC, git, Docker, Kubernetes, Jenkins, and programming skills in Shell or Java.
Looking for a Principal Site Reliability Engineer at Donnelley Financial Solutions (DFIN) to deliver best-in-market SaaS solutions primarily for clients working with regulatory bodies. Responsibilities include championing SRE culture, optimizing application performance, automating system operations, and staying updated on the latest technologies. Requires 8+ years of experience in software development, secure coding, automated deployments, and infrastructure management.
We are looking for a reliability expert who is passionate about scaling Cloud services to join our growing SRE teams. An ideal candidate is someone who is aware of current industry trends (particularly those related to reliability) and who thrives on working with a diverse set of partners, who can articulate the business impact of a problem and can also dive deep into the technical solution.
The Senior Site Reliability Engineer is responsible for ensuring SaaS products are fast, stable, and optimized for customers. They focus on availability, performance, managing change, monitoring, and response. The role involves automation, collaboration, and staying updated on the latest technologies.
Looking for a Principal Site Reliability Engineer with expertise in scaling Cloud services, modern Cloud infrastructure, and programming. The engineer will improve reliability, performance, scalability, and cost efficiency by advocating for reliability methodologies and driving adoption of best practices. Strong communication skills and experience in driving cross-organizational initiatives are required. Mentorship experience is preferred.
Featured Jobs
Work as a Site Reliability Engineer to contribute to the development and maintenance of a cloud platform and SaaS platform. Responsible for scaling cloud infrastructure, ensuring reliability and security, writing automation and monitoring tools, participating in agile sprints, and working cross-functionally with different teams.
As a Senior Site Reliability Engineer at iManage, you will contribute to the company's SaaS platform by participating in architectural and design discussions, driving innovation and platform evolution, scaling cloud infrastructure, and ensuring reliable deployment and maintenance of distributed systems. You will also be responsible for adhering to security best practices, writing automation tooling, and working cross-functionally with various teams.
As a Federal Site Reliability Engineer at ServiceNow, you will provide 24x7 support for the Government Cloud infrastructure during the 3rd shift. Responsibilities include driving technical resolutions, improving operability, and reducing incidents.
Site Reliability Engineer role with ServiceNow, supporting Federal Government systems on 3rd shift. Responsibilities include technical resolutions across the technology stack, driving platform operability, reducing incidents, and improving services for customers. Requires expertise in DevOps, Automation, Scripting, Linux, software development, Observability, Monitoring, and Cloud technologies.
The Site Reliability Engineer at ServiceNow is responsible for maintaining and developing the reliability, scalability, and performance of the infrastructure. The role involves a blend of software development, networking, and systems engineering to enhance service operability and reduce incidents.
The Site Reliability Engineer at ServiceNow is responsible for maintaining and developing the reliability, scalability, and performance of the ServiceNow infrastructure. This role involves driving technical resolutions, software development, networking, systems engineering, and enhancing platform operability.
Java Application SRE role at ServiceNow supporting US Federal customers. Responsibilities include production stability, performance tuning, troubleshooting, and automation. Ideal for candidates with a strong background in database technologies.
The Senior Site Reliability Engineer at Tock will work to ensure products and infrastructure are reliable, fast, efficient, and secure, reducing toil. Responsibilities include simplifying systems, accelerating adoption of Config as Code and Infrastructure as Code, troubleshooting incidents, defining SLIs and SLOs, and shaping system evolution. The role requires 5 years of SRE or DevOps experience, familiarity with HA products on public clouds using Kubernetes, knowledge of modern web-based applications, and fluency in programming languages like Java, Go, Python, and more.
As a Software Engineer III at JPMorgan Chase within the Commercial & Investment Bank (CIB), Engineering & Architecture SRE/DevOps team, you will be responsible for designing and delivering trusted market-leading technology products in a secure, stable, and scalable way. You will execute software solutions, create secure production code, analyze data sets, contribute to software engineering communities, and enhance team culture.
Tempus is looking for a Senior Software Engineer, SRE to join their team. The ideal candidate will have experience managing cloud infrastructure, building and deploying containerized applications, automating manual tasks, and working in agile environments. Bonus points for experience with infrastructure-as-code tools, GCP or AWS Virtual Desktop Infrastructure, database design and tuning, Bash scripting, and previous experience in the healthcare sector and securing infrastructure to compliance frameworks.
Cloud Engineer II role at McDonald's involving automation of cloud environments, developing automation workflows, strategizing with teams, staying updated on new technologies, and leading incident response strategies.
As an SRE/DevOps Engineer at OppFi, you will automate deployment processes for the AWS platform, improve tech stack observability, troubleshoot issues, and contribute to team growth. Remote-capable position in the US.
Manage large critical Cassandra and Elasticsearch clusters supporting millions of transactions per day. Build systems to automate all build and maintenance tasks. Develop self-service tools to allow engineers to manage and provision resources. Monitor cluster availability and performance metrics. Evaluate new technologies and software versions. Implement DR strategies. Work with other engineers to manage data persistence integration and performance. Monitor and scale Elasticsearch/Cassandra clusters.
Atlassian is seeking a Senior Site Reliability Engineer to improve performance and reliability of services, automate repetitive work, and respond to system issues. Responsibilities include writing code in Bash and Python, capacity planning, managing infrastructure in AWS, and maintaining high code quality. Desired skills include experience with Ansible, Puppet, Docker, Kubernetes, ITIL, and compliance requirements.
Join Matillion as a Staff Site Reliability Engineer and be responsible for improving the availability, scalability, latency, and efficiency of the company's SaaS services. This role requires experience with Kubernetes, AWS, ArgoCD, Terraform, DataDog, Prometheus, and programming languages like Golang and Python. You will lead the design of major software components, drive observability infrastructure expansion, mentor team members, and manage multiple projects. This position is remote with occasional in-person meetings in either Manchester, UK or Denver, US.
As an Application Site Reliability Engineer (SRE) at Discover, you will be responsible for ensuring the availability of critical applications such as Card and Bank websites and mobile applications by implementing resiliency and automation practices. Your role involves collaborating with application development teams to enhance system reliability and performance through observability, monitoring, alerting, and automation processes.
Senior Manager of Cloud Data Platform Operations & Security responsible for technical development of engineering staff within the cloud data & analytics practice, supporting AWS, GCP, and OpenShift technologies. Accountable for skilling, tooling, and allocating Chapter members to support Product and Value Stream Engineering teams using SRE practices. Coaches the team toward continuous improvement and innovation, and fosters a culture of learning.
Top Chicago Companies Hiring Reliability Engineers