Worth AI

Senior DevOps Engineer, Infrastructure & Reliability

Posted 2 Hours Ago
In-Office or Remote
5 Locations
Senior level
Lead infrastructure and reliability efforts by designing scalable IaC patterns, operating production Kubernetes, optimizing CI/CD, improving observability and incident response, automating manual processes, enforcing secure networking and secrets management, and driving reliability metrics and cost-efficiency.
The summary above was generated by AI

Worth AI, a leader in the computer software industry, is looking for a Senior DevOps Engineer to join our Infrastructure team with a singular mission: to make our systems faster, more reliable, and more resilient while making life dramatically easier for engineers shipping software. In this role, you won’t just manage infrastructure; you will design and evolve the foundation that every product and engineer depends on.

You will act as a force multiplier by eliminating operational friction, automating repetitive processes, strengthening system reliability, and building scalable infrastructure patterns that allow teams to deploy confidently and recover quickly. You are part architect, part reliability engineer, and part automation evangelist.

Responsibilities
    • Conduct regular interviews with engineering teams to identify operational pain points in CI/CD, deployments, observability, and cloud environments, then proactively eliminate them.
    • Design and implement scalable Infrastructure-as-Code patterns using tools like Terraform to standardize cloud provisioning and reduce configuration drift.
    • Own and evolve our Kubernetes platform (EKS or self-managed), ensuring workloads are secure, scalable, and resilient by default.
    • Architect and optimize CI/CD pipelines to improve deployment frequency, reduce lead time, and increase confidence in releases.
    • Lead systemic reliability initiatives, including incident response improvements, root cause analysis practices, and postmortem frameworks.
    • Design and enforce secure networking, IAM, and secrets management strategies across environments.
    • Improve observability by refining metrics, logs, and tracing using tools like DataDog, ensuring actionable insight into system health.
    • Optimize cloud cost efficiency through rightsizing, autoscaling strategies, and architectural improvements.
    • Own disaster recovery planning, backup strategies, and multi-region resilience initiatives.
    • Refactor brittle or manually managed infrastructure into automated, testable, and reproducible systems.
    • Introduce new infrastructure tooling or architectural shifts and drive adoption through documentation, workshops, and hands-on support.
    • Lead by example in incident management, risk mitigation, and operational excellence.
    • Communicate technical trade-offs clearly across engineering and product stakeholders, balancing speed with safety.

Technology Stack

  • Cloud & Infrastructure: AWS (EKS, RDS, MSK, S3, Lambda, IAM, VPC)
  • Containerization & Orchestration: Kubernetes, ArgoCD
  • Infrastructure-as-Code: Terraform
  • CI/CD: GitHub Actions (or equivalent)
  • Monitoring & Observability: DataDog
  • Data & Messaging: PostgreSQL, Kafka, Redis
  • Languages (as needed): Bash, Python, TypeScript

Requirements
  • 8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
  • Proven experience designing and operating production Kubernetes environments at scale.
  • Deep hands-on expertise with AWS infrastructure and cloud networking.
  • Strong experience building and maintaining Terraform modules across large cloud environments.
  • Demonstrated ownership of CI/CD systems and measurable improvement of DORA metrics.
  • Experience leading incident response processes and driving meaningful postmortem outcomes.
  • Strong understanding of distributed systems, event-driven architectures (Kafka), and database performance (PostgreSQL).
  • Proven ability to modernize legacy infrastructure and eliminate manual operational toil.
  • Experience navigating high-ambiguity environments and translating operational friction into prioritized infrastructure roadmaps.
  • Demonstrated ability to build trust across teams while raising the reliability bar.

Success Metrics

  • DORA Metrics Improvement:
    • Drive measurable improvements in Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time to Recovery (MTTR).
  • System Reliability:
    • Maintain or exceed defined SLO/SLA targets with reduced incident frequency and duration.
  • Infrastructure Stability:
    • Reduce production incidents caused by misconfiguration, manual processes, or infrastructure drift.
  • Operational Efficiency:
    • Increase percentage of infrastructure managed through code and automation.
  • Cost Optimization:
    • Improve cloud cost efficiency without sacrificing reliability or performance.

Bonus Points (Nice to Have)
  • Experience operating high-throughput Kafka clusters (MSK or self-managed).
  • Strong background in database performance tuning (PostgreSQL, Redis).
  • Experience implementing autoscaling strategies for high-traffic systems.
  • Familiarity with service mesh technologies.
  • Experience building internal developer platforms (IDP).
  • Background in security best practices (zero-trust networking, policy-as-code).
  • Experience with multi-region or globally distributed systems.
  • Proficiency in Python for automation and tooling development.
  • Experience introducing platform-wide reliability frameworks (SLOs, error budgets, chaos testing).

** All remote hires will be required to travel to Orlando, Florida at least twice per year for Town Halls and team collaboration, in addition to orientation in Orlando, Florida.


Benefits
  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Life Insurance
  • Flexible Vacation
  • Work From Home
  • Free Food & Snacks (in office)
  • Orlando, Florida (Hybrid)
  • Wellness Resources

Top Skills

AWS (EKS, IAM, Lambda, MSK, RDS, S3, VPC), Kubernetes, ArgoCD, Terraform, GitHub Actions, DataDog, PostgreSQL, Kafka, Redis, Bash, Python, TypeScript


