Stratus

Senior Data Architect (Hands on)

Posted 5 Days Ago
Remote
Hiring Remotely in United States
Senior level
Own and implement the canonical data model and governance for a multi-tenant MEP SaaS platform. Architect data for AI/ML readiness (RAG, embeddings, vector search), design polyglot persistence and lake/lakehouse pipelines, lead staged modernization and migrations, produce hands-on prototypes and in-repo guardrails, and partner with platform, DB engineering, and ML teams to ensure data quality, lineage, and observability.
The summary above was generated by AI

Stratus, derived from the Latin term meaning 'layer', offers an advanced set of MEP-specific solutions that seamlessly layer across a contractor's entire workflow, from design to fabrication to installation. Our team of seasoned industry experts, skilled technology leaders, innovators, and entrepreneurs understands that fabrication does not occur in isolation and, increasingly, may not happen within your own fabrication shop. Through close relationships with our customers, who include some of the largest and most innovative MEP contractors, we have developed a suite of Stratus tools to digitize, automate, and optimize piping, plumbing, sheet metal, and electrical contracting. Stratus provides the software layer an MEP contractor needs to optimize profits with true "Data Driven Contracting."

GENERAL DESCRIPTION

The Senior Data Architect owns our canonical data architecture — the schema, contracts, tenancy, and governance that every product and every AI/ML workload builds on. You are the single owner of the canonical data model: one normalized definition of the core business objects shared across our products, and the standard the rest of engineering builds against. This is a foundational, hands-on role — you design, prototype, and ship reference implementations and in-repo guardrails, not just diagrams.

Our approach to AI is to build durable, domain-specific data assets rather than commodity model infrastructure: we don't pretrain foundation models and we don't ship thin wrappers around someone else's. The differentiated value lives in how our data is modeled, governed, and made trustworthy for AI — and that is the layer you own.

KEY RESPONSIBILITIES
AI/ML readiness
  • Architect the data layer so AI/ML workloads — vector search, embeddings pipelines, RAG-grounded retrieval, model training — run on a clean, governed substrate.
  • Make production data AI-ready: well-modeled, contract-enforced, lineage-tracked, and drift-detectable.
  • Design the data-side integration patterns these workloads depend on, such as feature-store and vector-store patterns across document, relational, and embedding data.
Data architecture
  • Own the canonical data model — the normalized definition of the core business objects shared across our products — and decide what is canonical versus tenant-specific.
  • Establish data architecture standards, data contracts, and schema discipline the rest of engineering builds against, enforced in-repo.
  • Exercise strong polyglot-persistence judgment: what belongs in document vs. relational vs. vector stores, and how to migrate between them without big-bang rewrites.
  • Define the multi-tenant data architecture: tenancy isolation, data residency posture, and per-tenant cost attribution across storage and compute.
Modernization
  • Lead staged modernization toward the right mix of stores and patterns for transactional, analytical, and AI/ML use cases — improving scalability, governance, and usability while minimizing disruption.
  • Own the architectural direction of the data pipeline and lake / lakehouse layer: ingestion, transformation, orchestration, and storage tiers.
  • Lead the move from homegrown pipelines to proven, industry-standard platforms, balancing build-vs-buy and total cost of ownership.
  • Modernize legacy data-access patterns via incremental, strangler-fig migrations that keep production stable.
Technical leadership
  • Drive hands-on prototypes, reference implementations, and in-repo guardrails.
  • Define the data, storage, and retrieval patterns the rest of engineering builds against.
  • Establish data quality, testing, lineage, and observability standards for pipelines and AI/ML serving.
  • Mentor engineers on schema discipline, modern data practices, and AI/ML-readiness patterns.
  • Make canonical decisions that are time-boxed, written, and defensible; practice disagree-and-commit rather than letting schema debate become a standing committee.
  • Use AI-assisted development tools (Claude Code, Copilot, Cursor) as a force multiplier for schema design, query tuning, and migration scripting.
Cross-team partnership
  • Partner with database engineering on production data health while owning long-term architectural direction.
  • Partner with ML and application engineering on their data needs — structuring and governing data so it is retrieval-ready and safe to build on.
  • Partner with platform / infrastructure on reliability, disaster recovery, residency, and the multi-tenant operational posture.
QUALIFICATIONS
  • 8+ years in data architecture, data engineering, database administration, or analytics engineering, with 3+ years in senior / lead roles.
  • Demonstrated ownership of a canonical or enterprise data model / cross-product schema — the model and contracts other teams built against.
  • Hands-on MongoDB at production scale (Atlas M40+ ideal): document modeling, aggregation framework, indexing, change streams, sharding, replica sets — and the judgment to recognize the Mongo-as-RDBMS anti-pattern.
  • Strong polyglot-persistence judgment: deciding what belongs in documents vs. relational vs. a vector store, and migrating between them incrementally.
  • Hands-on relational depth: schema design, indexing strategy, and query tuning, plus familiarity with vector search (Atlas Vector Search, pgvector, or equivalent).
  • Production experience making data AI/ML-ready: data architecture supporting RAG, semantic search, embeddings / vector pipelines, or agentic workloads.
  • Multi-tenant architecture experience: data residency and per-tenant cost attribution.
  • Pipeline / ELT / lake / lakehouse design at scale, with incremental migration strategies that minimize disruption.
  • Cloud-native data services (Azure, AWS, or GCP).
  • Strong grasp of data quality, testing, lineage, and monitoring — including observability for pipelines and AI/ML serving.
  • Comfortable modeling a complex, specialized domain. MEP / AEC / construction experience is a plus; appetite to learn the domain is required.
NICE TO HAVE
  • Knowledge-graph, ontology, or semantic-layer experience.
  • CDC and cross-engine sync (MongoDB Change Streams, Debezium, or equivalent).
  • Lakehouse platforms (Databricks, Snowflake, or open table formats — Iceberg, Delta, Hudi) and feature stores (Feast or equivalent).
  • Data governance for AI/agent access to production data: query-cost controls, read-path safety, lineage, and audit for higher-risk use cases.
  • SOC 2 and data-classification experience.
  • Azure data ecosystem (Data Factory, Synapse, Functions, Event Grid).
  • MongoDB certification (Associate DBA / Developer or higher) or substantive MongoDB University coursework.
WHAT SUCCESS LOOKS LIKE — FIRST YEAR
  • The canonical data model is owned and enforced: teams build against stable, documented contracts instead of bespoke forks.
  • Workloads sit in the right stores, legacy anti-patterns are receding, and reliability targets are holding.
  • Tenancy is formalized and per-tenant cost attribution is instrumented, so cost and capacity are observable as we scale.
  • The data substrate is AI-ready — model, contracts, and lineage in place — so AI/ML work builds on a solid foundation rather than waiting on data.
  • You've done it in partnership: the data tier is healthier, and engineers build against your contracts.
BENEFITS
  • Comprehensive and competitive health benefits plan
  • Matching 401k contributions
  • 20 days annual PTO
  • Primarily remote work with occasional annual team onsites


This is a fully remote position open to candidates based in the United States.
