CVS Health

Staff Observability Platform Engineer (SRE)

In-Office
Richardson, TX
118K-237K Annually
Senior level
The role focuses on designing metrics and observability frameworks, managing error budgets, and automating quality gates for release engineering, ensuring scalable cloud infrastructure and incident management.
The summary above was generated by AI

We’re building a world of health around every individual — shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger – helping to simplify health care one person, one family and one community at a time.

POSITION SUMMARY

CVS Health PBM is looking for hands-on, passionate people who want to join a high energy and growing team, who want to be on the forefront of digital innovation that aims to reinvent what a pharmacy and a health care company can be in the digital world. 

As a Lead Platform Reliability Engineer, you will design and implement metrics and observability frameworks with a strong focus on service level objectives (SLOs), service level indicators (SLIs), error budgets, and cloud infrastructure scaling and capacity estimation.

This individual contributor role is critical to enhancing our monitoring and observability capabilities, while also driving automation initiatives related to quality gates within the release engineering process. You will work closely with cross-functional teams to ensure the reliability, performance, and scalable growth of our cloud-based systems.

  

Expectations for the Role:

Metrics Development: Define, implement, and maintain key performance metrics, SLOs, and SLIs to measure system reliability and performance. Ensure alignment with business objectives and operational goals.

Error Budgets: Manage error budgets effectively, collaborating with development teams to balance reliability and feature delivery. Analyze incidents and outages to inform adjustments to error budgets.

Monitoring & Observability: Design and implement comprehensive monitoring solutions to provide real-time visibility into system health. Utilize tools such as Prometheus, Grafana, Loki, Tempo, and other observability platforms to create dashboards and alerts.

Cloud Infrastructure Scaling: Architect, design, and implement scalable cloud infrastructure capable of supporting multiple business applications, ensuring reliability, performance, and future growth.

Quality Gates Automation: Develop and implement automated quality gates that ensure all releases meet defined reliability and performance standards. Lead the release DevOps team to integrate these gates into the CI/CD pipeline.

Incident Management: Assist in incident response efforts by providing insights from metrics and monitoring tools. Conduct post-mortem analyses to identify root causes and recommend preventive measures.

AIOps Insight Automation: Use AI to surface what changed / what’s abnormal / next best action from metrics, logs, and traces—minimizing manual dashboard analysis. 

AI‑Accelerated Incident Response: Apply GenAI to speed triage and RCA with fast signal summarization and guided investigation paths. 

AI/LLM Observability & Governance: Monitor AI workloads for quality, safety, cost, latency, reliability with end‑to‑end tracing (request → prompt → tools → output) and secure logging/redaction.

AI‑Backed Release Quality Gates: Embed AI signal checks into CI/CD to flag SLO risk, latency/error drift, and regression patterns before production release.

REQUIRED QUALIFICATIONS

  • 10+ years of experience in Software Engineering, Platform Engineering, or SRE.
  • 7+ years of experience with observability practices, including SLIs/SLOs/SLAs, alerting, and incident management.
  • 7+ years building production-grade backend services in Java/Python.
  • 7+ years implementing and operating OpenTelemetry, including OTLP, semantic conventions, and instrumentation patterns.
  • 7+ years with cloud-native and containerized platforms (Docker, Kubernetes, Argo CD).
  • 7+ years working with public cloud platforms (AWS, GCP, or Azure).
  • 5+ years designing and scaling distributed, high-volume data pipelines.
  • 5+ years working with Grafana OSS or comparable observability backends (e.g., Grafana, Loki, Tempo, Prometheus).
  • 5+ years with relational databases (PostgreSQL, MySQL).

 

PREFERRED QUALIFICATIONS

  • Excellent analytical skills and the ability to communicate complex technical concepts to non-technical stakeholders
  • Experience with service meshes and networking technologies such as Envoy and Istio
  • Experience integrating or operating commercial observability platforms (Splunk, AppDynamics, etc.)
  • Experience with streaming and data platforms such as Kafka, Pulsar, or similar technologies
  • Familiarity with time-series, NoSQL, or analytical databases (ClickHouse, Bigtable, Cassandra, etc.)
  • Experience with Infrastructure as Code tools such as Terraform or CloudFormation
  • Experience with cost optimization and capacity planning for large-scale cloud infrastructure
  • Experience with chaos engineering, resiliency testing, or fault injection
  • Background in security-aware platform design, including secure service-to-service communication
  • Experience mentoring senior engineers and influencing platform standards across organizations
  • Strong operational experience supporting 24x7 production systems, including on-call responsibilities
  • Knowledge of security best practices in cloud environments

EDUCATION

Bachelor’s degree or equivalent experience (HS diploma + 4 years relevant experience)

Pay Range

The typical pay range for this role is:

$118,450.00 - $236,900.00


This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls.  The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors.  This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above.  This position also includes an award target in the company’s equity award program. 
 

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.

Great benefits for great people

We take pride in offering a comprehensive and competitive mix of pay and benefits that reflects our commitment to our colleagues and their families.

This full‑time position is eligible for a comprehensive benefits package designed to support the physical, emotional, and financial well‑being of colleagues and their families. The benefits for this position include medical, dental, and vision coverage, paid time off, retirement savings options, wellness programs, and other resources, based on eligibility.


Additional details about available benefits are provided during the application process and on Benefits Moments.

We anticipate the application window for this opening will close on: 06/30/2026

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.

CVS Health Chicago, Illinois, USA Office

525 W Monroe St, Chicago, IL, United States, 60661

CVS Health Northbrook, Illinois, USA Office

2211 Sanders Road, Northbrook, IL, United States, 60062
