Capital One

Sr. Manager SRE (Individual Contributor)

Posted 3 Days Ago
Remote or Hybrid
Hiring Remotely in Mexico City, Ciudad De México
Senior level
Lead the technical vision and roadmap for SRE in Mexico City; establish SLOs, error budgets, and operational standards; design AI-driven automation for alert classification and remediation; drive observability and platform convergence; triage and resolve complex incidents; architect secure automation for operational processes; and mentor engineers to raise reliability and operational excellence across payment systems.
The summary above was generated by AI
WeWork Reforma Latino (97001), Mexico, Ciudad de Mexico, Ciudad de Mexico
We're building a Site Reliability Engineering center in Mexico City, and we're hiring a Senior Manager-level SRE to serve as the technical anchor for the site - defining the reliability vision, driving cross-team execution, and pioneering automation and AI-driven approaches that transform how we operate three payment networks at scale.
This is a strategic technical leadership role. You won't manage people directly, but you'll shape how multiple teams work - setting architectural direction for observability, automation, operational excellence, alert signal reduction, and reliability platform convergence. You'll be the most senior IC engineer in Mexico City, partnering with the Director (people leader) to translate organizational goals into technical roadmaps and ensuring the engineering quality bar stays high as the site scales.
You'll operate across the full landscape: batch settlement systems processing every domestic and international credit/debit transaction, real-time observability platforms that must detect failures before customers do, and AI-powered automation that eliminates the toil standing between us and a proactive reliability culture.
What You'll Do
  • Define and maintain a 12-18 month technical vision and roadmap for GPN SRE in Mexico City - decompose destination architecture into deliverable steps, sequence investments, and align execution across teams
  • Drive reliability transformation across settlement, observability, and automation domains - establish SLOs, error budgets, severity frameworks, and operational standards that teams build against
  • Pioneer AI and agentic automation approaches - design and build AI-driven solutions (using Claude Code, Copilot CLI, and LLM frameworks) for alert classification, runbook generation, automated remediation, and incident analysis; set patterns that other engineers extend
  • Own the technical strategy for domain-specific knowledge ramp-up: identify which domain expertise requires deep engineering investment vs. documentation, and architect systems that reduce reliance on tribal knowledge
  • Lead cross-team technical initiatives - drive observability platform convergence, standardize on COF tooling, and eliminate arbitrary uniqueness across towers
  • Serve as the senior escalation point for complex production incidents - diagnose cascading failures across distributed systems (storage, network, application), drive resolution, and ensure durable fixes land
  • Architect automation for high-risk operational processes - certificate rotation, compliance artifact generation, settlement cycle validation - ensuring security and reliability are built in from design
  • Mentor and elevate engineers across teams - conduct design reviews, establish engineering standards, coach on debugging and systems thinking, and create an environment where Principal Associates and Managers grow into domain experts
  • Introduce and advocate for engineering practices that raise the bar - AI engineering, innersourcing, reuse over rebuild, open source contribution, blameless postmortems, and chaos engineering
  • Influence beyond the CDMX site - partner with US and UK leadership on architectural decisions, represent CDMX engineering in cross-org forums, and shape GPN-wide reliability strategy

What Success Looks Like
  • Technical roadmap established and executing - teams are delivering against a clear, sequenced plan with measurable reliability OKRs
  • At least one domain (alert signal reduction or settlement automation) where CDMX operates autonomously without US/UK escalation, driven by systems and patterns you architected
  • AI-powered automation deployed in production - incident classification models, generated runbooks, or automated remediation that demonstrably reduces MTTR or toil
  • Engineering standards and patterns documented and adopted - design review process, observability standards, incident response framework, and automation patterns that scale with the team
  • Recognized as the technical authority for GPN SRE reliability - sought out across towers and geographies for architectural guidance, incident escalation, and strategic input
  • Multiple engineers grown through your mentorship - visible skill development in system design, debugging, and operational judgment across the CDMX teams

The Environment
You'll operate across hybrid on-prem and cloud infrastructure supporting real-time and batch financial transaction systems at global scale. The stack spans Python, Java, shell scripting, AWS, Kubernetes, OpenShift, CI/CD pipelines, and API automation frameworks. Observability runs on Datadog and Observe with complex dashboard configuration across three payment networks. Secret management and certificate automation use HashiCorp Vault. You'll design and build agentic AI automation solutions using Claude Code and LLM frameworks - this is central to the role, not an add-on. The systems span multiple on-prem data centers with mainframe, Linux, and containerized workloads alongside AWS. You'll need deep troubleshooting and debugging skills across all layers of the stack and the judgment to know when to go deep vs. when to delegate.
Basic Qualifications
  • Professional English fluency
  • Bachelor's degree
  • At least 8 years of experience in SRE, production operations, or reliability engineering
  • Experience in DevOps Engineering (internship experience does not apply)
  • 8+ years of experience in at least one of the following: Java, Python, Go
  • At least 6 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • 5+ years of experience with container orchestration services including Docker or Kubernetes
  • Experience with Shell or Bash scripting
  • At least 5 years of Unix or Linux system administration experience

Preferred Qualifications
  • Experience developing automation solutions using agentic AI tools (Claude Code, Copilot CLI)
  • Troubleshooting and debugging skills across distributed systems
  • Familiarity with payments, financial services, or other regulated high-availability domains
  • Knowledge of or experience with networking concepts (TCP, DNS, TLS)

At Capital One, we respect individual differences in culture, religion, and ethnicity. Likewise, we promote equal opportunities and development for all personnel. In the hiring process, we seek to provide equal employment opportunities to candidates, regardless of race, color, religion, gender, sexual orientation, marital or civil status, national origin, disability, or any other situation protected by federal, state, or local laws.
For technical support or questions about Capital One's recruiting process, please send an email to [email protected]
Capital One does not provide, endorse, or guarantee, and is not liable for, third-party products, services, educational tools, or other information available through this site.
Capital One Financial is made up of several different entities. Please note that any position posted in Canada is for Capital One Canada, any position posted in the United Kingdom is for Capital One Europe, any position posted in the Philippines is for Capital One Service Corp (COPSSC), and any position posted in Mexico is for Capital One Technology Labs Mexico.
