About the Team/Role
We are looking for a highly motivated, high-potential Senior Manager, Site Reliability Engineering (SRE) to join our team as a technical leader and drive transformative impact across WEX’s platform reliability and operational excellence.
This is a particularly exciting time to be part of the SRE function at WEX. Our diverse product ecosystem supports a wide array of customer businesses and generates rich, complex telemetry across applications, infrastructure, and platforms. Ensuring these systems are scalable, observable, and resilient is critical to unlocking business value and customer success.
As a Senior Manager, SRE, you will play a pivotal role in shaping the reliability engineering strategy at WEX. You’ll architect and lead efforts that improve availability, performance, and efficiency at scale, driving initiatives across observability, automation, incident management, problem management, capacity planning, and performance optimization. You’ll be hands-on in building foundational tooling and frameworks while also acting as a multiplier: mentoring engineers, aligning cross-functional teams, and influencing platform decisions with a strong reliability lens.
You’ll work closely with engineering, product, and platform teams to instill SRE best practices and enable a shift toward proactive, scalable operations. Our team embraces agile development, a strong product mindset, and modern engineering practices, including AI-assisted operations and intelligent automation.
You’ll take on some of the most complex, high-impact challenges at WEX—supported by a team of highly skilled engineers and technical leaders invested in your success and growth.
If you’re a senior technical leader passionate about building reliable systems, leading through influence, and making a meaningful impact, this is a fantastic opportunity for you.
How you’ll make an impact
Architect and oversee the implementation of mission-critical systems.
Define and enforce SRE best practices and operational standards.
Lead cross-functional initiatives to enhance system reliability and performance.
Serve as a technical advisor for engineering leadership.
Develop capacity planning and load testing strategies.
Design self-healing and auto-recovery mechanisms.
Drive cloud cost optimization and budgeting initiatives.
Lead one or more SRE teams responsible for a major platform or domain.
Partner with Engineering, Product, and Program stakeholders to align team delivery with business priorities.
Experience you’ll bring
8+ years of experience with a focus on large-scale system reliability.
Expertise in system architecture, cloud platforms, and automation frameworks.
Deep knowledge of Kubernetes, service meshes, and distributed tracing.
Experience with monitoring and logging (Grafana, ELK stack, Splunk, etc.).
Knowledge of containerization and orchestration (Docker, Kubernetes).
Experience designing high-availability, fault-tolerant architectures.
Strong understanding of database reliability engineering (MySQL, PostgreSQL, NoSQL).
Knowledge of networking, databases, and storage architectures.
Excellent incident command and crisis management skills.
Experience setting team OKRs and aligning reliability goals with product and platform engineering strategies.
Preferred Qualifications
Experience with multi-region and multi-cloud deployments.
Deep expertise in scalable microservices and event-driven architectures.
Strong experience with advanced observability tools (OpenTelemetry, Jaeger, Prometheus).
Leadership in driving large-scale SRE transformations.
Experience designing and developing AI-based solutions.
Ability to influence engineering culture and process improvements.
Experience in healthcare, insurance, or benefits technology.
Understanding of the benefits domain, such as claims processing and eligibility lookup success rates.
Experience working with compliance frameworks such as HIPAA, SOC 2, or HITRUST.
Proven success building and scaling high-performing SRE teams in production environments.
Ability to develop team-wide practices around incident management, postmortems, alert hygiene, and reliability KPIs.
Skilled at coaching engineers through complex reliability challenges and career inflection points.