AI Applied Scientist

Wizard AI

Posted 10 Hours Ago

Remote

Hiring Remotely in USA

225K-280K Annually

Senior level

The Applied Scientist will measure and improve the accuracy of Wizard's AI agent through metrics, experiments, and data analysis, partnering with ML and AI engineering teams.

The summary above was generated by AI

About Wizard

Wizard is the top-performing AI Shopping Agent, delivering the best products from across the web with unmatched accuracy, quality, and trust.

The Role

We’re looking for an Applied Scientist to own how we measure, understand, and improve the accuracy of our AI agent. This role sits at the intersection of applied ML, evaluation science, and product. You’ll define what “good” looks like for our agent, build the systems to measure it, and lead the science work to improve it, including fine-tuning the LLM judges that power our evaluation pipeline.

You’ll partner with ML Engineering and AI Engineering, bringing scientific rigor to the most important question at Wizard: is our agent getting better, and how do we know?

This is a foundational hire on our science team. Evaluation is the starting point, and the role is scoped to grow into broader applied science work as the surface area of the agent expands (recommendations, personalization, ranking, multimodal, conversational understanding).

What You’ll Do

Define and evolve accuracy metrics across the full shopping experience (retrieval, ranking, recommendations, outcomes)
Design and run experiments to measure improvements and regressions
Build and maintain evaluation datasets, benchmarks, and scoring frameworks
Improve the LLM judges that power our evaluation pipeline: prompting, calibration, and fine-tuning where it matters
Translate ambiguous product questions into clear, measurable hypotheses and analysis
Partner with ML Engineers to validate model changes and guide iteration
Identify failure modes and edge cases, and drive improvements through data
Make agent performance visible, trusted, and actionable across product and engineering

First 3 months

Go deep on the agent, the current eval pipeline, and the metrics we use today
Audit existing accuracy metrics and benchmarks; identify gaps, blind spots, and signals that aren’t trustworthy
Build relationships with ML, AI Engineering, and Product
Ship one quick win: a missing benchmark, an improved metric, or a fix to a misleading signal
Establish a baseline view of agent performance the team can rally around

Months 3 to 6

Own the evaluation framework: datasets, metrics, scoring, reporting, both offline and online
Drive measurable improvements to LLM judge quality (calibration, fine-tuning where appropriate)
Run experiments that influence at least one significant model or product change
Stand up automated evaluation the team trusts before and after every launch
Build dashboards and reporting that make agent performance legible to leadership

Beyond 6 months

Lead applied science work on the next frontier as the agent grows: multi-turn evaluation, multimodal, personalization, ranking quality, conversational understanding
Influence team-level strategy on what we measure, what we improve, and why
Mentor and help grow the science function as it expands

What Success Looks Like

Clear, trusted accuracy metrics are consistently used across product and engineering
A robust automated evaluation framework for both offline and live experiments
Model and product changes are consistently measured before and after launch
Demonstrable improvements in LLM judge quality and eval coverage
Science leadership that informs what we build, not just whether it works

Career Growth

Depth track: become the org’s authority on AI evaluation (eval strategy, judge models, agent benchmarking)
Breadth track: expand into other applied science problems (recommendations, personalization, ranking, multimodal, conversational understanding) as those areas come online
Leadership track: Senior / Staff Applied Scientist, with technical leadership across the science function
As the agent gets more capable, the science problems get richer

Ideal Background

5+ years in Applied ML, AI Research, or Applied Science (PhD or equivalent depth strongly preferred)
Hands-on experience evaluating modern AI/ML systems: LLMs, agents, ranking, or recommendations
Direct experience with LLM-based systems: judge models, RAG, prompt engineering, fine-tuning, RLHF, or similar
Strong experimentation foundations: A/B testing, causal inference, statistical rigor
Proven ability to operate in ambiguity: defining problems, not just solving pre-defined ones
Clear, structured communication that influences across ML, engineering, and product

Compensation & Benefits

The expected base salary range for this role is $225,000 - $280,000 USD, and will vary based on skills, experience, role level, and geographic location. Final compensation will be determined by considering these factors alongside overall role scope and responsibilities.

In addition to base salary, Wizard offers:

Equity in the form of stock options
Medical, dental, and vision coverage
401(k) plan
Flexible PTO and company holidays
Fully remote work within the United States
Periodic company offsites and team gatherings

Wizard is committed to fair, transparent, and competitive compensation practices.
