CACI International Inc

Senior MLOps Platform Engineer

Posted 14 Days Ago
In-Office or Remote
Hiring Remotely in State Road, IL
82K-172K Annually
Senior level
Design and operate a unified MLOps platform, ensuring performance optimization, security, and collaboration with teams to transition AI models to production.
The summary above was generated by AI
Job Title: Senior MLOps Platform Engineer

Job Category: Information Technology

Time Type: Full time

Minimum Clearance Required to Start: None

Employee Type: Regular

Percentage of Travel Required: Up to 10%

Type of Travel: Local

* * *

The Opportunity:

Our AI Center of Excellence builds the next generation of Agentic AI products that autonomously reason, plan, and act on behalf of our customers. To deliver these capabilities at scale, we need a platform engineering group that provides a robust, secure, and highly available MLOps foundation across both on-premises clusters and AWS. The team works closely with data scientists, product engineers, and SREs to turn experimental models into reliable services that power mission-critical applications.

  • Shape the end-to-end lifecycle of cutting-edge AI services—from model training to production inference. 
  • Influence architecture decisions for a hybrid cloud environment that will serve thousands of concurrent agents. 
  • Collaborate with world-class researchers and product teams while enjoying a strong engineering culture focused on automation, observability, and reliability. 

Responsibilities: 

  • Design, implement, and operate a unified MLOps platform that supports both on-premises Kubernetes clusters and AWS. The platform should enable rapid onboarding of new Agentic AI services and provide consistent governance across environments.
  • Develop reusable CI/CD pipelines (GitLab CI) for model packaging, containerization, automated testing, canary releases, and rollbacks. 
  • Build observability, monitoring, and alerting stacks (Prometheus, Grafana, OpenTelemetry, CloudWatch) to track inference latency, throughput, resource utilization, and data drift for real-time and batch workloads.
  • Create self-service tooling (CLI, SDKs, UI dashboards) that allows data science and product teams to register models, define inference endpoints, and manage versioning without deep DevOps involvement. 
  • Architect and maintain data pipelines that feed training data, model artifacts, and inference logs into a governed data lake (S3, on-premises object store).
  • Collaborate with research and product engineers to translate experimental Agentic AI prototypes into production-grade services, ensuring reproducibility, security, and compliance.
  • Drive performance optimization for inference workloads (GPU/CPU scaling, model quantization, batching strategies).
  • Champion best practices in security (IAM, network policies, secret management), cost efficiency, and disaster recovery for the hybrid infrastructure. 
  • Mentor junior engineers and contribute to internal knowledge bases, upskilling, and review processes. 

Qualifications:

Required:

  • Must be a U.S. Citizen
  • BS in computer science or related engineering field
  • 5+ years of experience building and operating production-grade software infrastructure, preferably in a hybrid on-premises/cloud environment
  • Deep expertise with Kubernetes (cluster provisioning, Helm, operators, custom resources) and container runtimes (Docker, OCI)
  • Hands-on experience with AWS services (EKS, SageMaker, S3, IAM, CloudWatch, Step Functions) and the ability to bridge on-premises resources with AWS via VPN/Direct Connect
  • Strong software engineering skills in Python and at least one compiled language (Go, Rust, or Java) for building platform components and SDKs
  • Proficiency with CI/CD and GitOps tooling (Argo CD, Flux, GitLab, GitHub Actions, or similar)
  • Solid understanding of distributed systems (consensus, fault tolerance, load balancing) and experience tuning high-throughput, low-latency inference pipelines
  • Experience with data engineering frameworks (Airflow, Prefect, Kafka, Spark, Flink) and building robust, versioned data pipelines
  • Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry, ELK) and the ability to define meaningful SLIs/SLOs for AI services
  • Track record of collaborating with research or product teams to move prototypes to production, translating experimental code into maintainable services
  • Strong problem-solving mindset, excellent written and verbal communication, and a passion for building scalable AI platforms

Desired:

  • Working knowledge of Scrum and Agile software development methodology

What You Can Expect:

A culture of integrity.

At CACI, we place character and innovation at the center of everything we do. As a valued team member, you’ll be part of a high-performing group dedicated to our customers’ missions and driven by a higher purpose – to ensure the safety of our nation.

An environment of trust.

CACI values the unique contributions that every employee brings to our company and our customers - every day. You’ll have the autonomy to take the time you need through a unique flexible time off benefit and have access to robust learning resources to make your ambitions a reality.

A focus on continuous growth.

Together, we will advance our nation's most critical missions, build on our lengthy track record of business success, and find opportunities to break new ground — in your career and in our legacy.

Pay Range:

There are a host of factors that can influence final salary including, but not limited to, geographic location, Federal Government contract labor categories and contract wage rates, relevant prior work experience, specific skills and competencies, education, and certifications. Our employees value the flexibility at CACI that allows them to balance quality work and their personal lives. We offer competitive compensation, benefits, and learning and development opportunities. Our broad and competitive mix of benefits options is designed to support and protect employees and their families. At CACI, you will receive comprehensive benefits such as healthcare, wellness, financial, retirement, family support, continuing education, and time off benefits.

Since this position can be worked in more than one location, the range shown is the national average for the position.

The proposed salary range for this position is:

$82,100-$172,400

CACI is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, age, national origin, disability, status as a protected veteran, or any other protected characteristic.

Top Skills

Airflow
AWS
CloudWatch
Docker
Flink
GitLab CI
Grafana
Kafka
Kubernetes
OpenTelemetry
Prometheus
S3
Spark
