We all depend on healthcare throughout our lifetimes, for ourselves and for our families and friends, but it is notoriously difficult to navigate and understand. Healthcare comprises 20% of the US economy, and we think it should work better for all of us. At Collective Health we believe it’s time for a new day in healthcare, where as members we are informed and empowered to make the right care choices when the decisions are urgent and critical.
Data Infrastructure Site Reliability Engineering at Collective Health is a discipline combining software and systems engineering skills. We apply modern data platform infrastructure, systems, software, architecture, and development practices to give our customers a more reliable, scalable and secure healthcare data management experience.
Partnering with engineering teams, Data Infrastructure Site Reliability Engineers build on public cloud services to deliver a comprehensive platform that enables our developers to rapidly deliver high-quality, impactful, scalable, and reliable services. As a broader group of Site Reliability Engineers including those focused on infrastructure and those embedded in other engineering teams, we collaborate and identify themes and solutions to benefit Collective Health at large, engage in regular knowledge sharing activities and retrospectives, and relentlessly support one another in order to gain knowledge, remove barriers, and grow as individuals and a team.
Together, we’re building the next-generation healthcare platform, and we’re proud to be on the leading edge of this important mission.
Responsibilities
On any given day you may need to...
- Collaborate on and/or lead engineering efforts from requirements to production, solving problems of developer productivity and presenting complex technical concepts to the data platform team, the broader engineering org, and leadership audiences.
- Write code that is well-tested, easily understood, and maintainable by others.
- Troubleshoot and fix complex production issues related to availability or performance, even if they are outside your comfort zone.
- Work independently and autonomously.
- Integrate deeply into the Data Platform team, collaborating with other SREs and the Data Platform engineering team.
- Advise, critique, or comment on engineering designs.
- Help our internal customers solve their problems in as efficient and future-proof a manner as possible.
Imposter syndrome is real. If you are hesitant to apply because you don’t check all the boxes, or you’ve had a less-traditional pathway into Site Reliability Engineering, we encourage you to still apply and mention why you’re interested in the role.
Minimum
- 5+ years of work experience in DevOps, Site Reliability Engineering, or Software Engineering.
- Experience in supporting in-house & customer-facing production systems and responding to incidents.
- Experience supporting and optimizing queries for big data platforms such as Presto, Databricks SQL, or related.
- Knowledge of data structures, algorithms, distributed systems, and information retrieval.
- Experience with at least one of the following or similar technologies: Kubernetes, Docker, Postgres, CI/CD, etcd, Elasticsearch, or related scheduling and persistence services; Apache Kafka or related eventing systems.
- Experience in at least one of the following areas of software development: refactoring code, test-driven development, build infrastructure, debugging, building tools and testing frameworks.
- Experience with data infrastructure, including at least one of: AWS RDS, Snowflake, AWS data lake, AWS tools for data (Redshift, Athena, etc.), data pipelines and DAGs, workflow engines, Databricks, Looker, Airflow, data dictionaries, or data governance.
- Knowledge of data security, including RBAC and AWS security infrastructure (IAM).
- Understanding of networking concepts such as routing, firewalls, load balancers, and secure communication -- especially in the context of cloud infrastructure.
- Methodical problem-solving approach, coupled with strong communication skills and an ability to own and drive projects to completion.
- 5+ years of work experience in DevOps, Site Reliability Engineering, Data Infrastructure or Software Engineering, or an advanced degree in Computer Science or related technical field.
- Good understanding of private and public cloud design considerations and limitations in the areas of infrastructure, distributed systems, data storage, Linux-based operating systems, and security.
Founded in 2013, Collective Health has created an ecosystem of innovative partners across care and benefits delivery, as well as built a powerful and flexible infrastructure to better enable employees and their families to understand, navigate, and pay for healthcare. By reducing the administrative lift of delivering health benefits, providing an intuitive member experience, and improving health outcomes, the company guides employees toward healthier lives and companies toward healthier bottom lines. Collective Health is headquartered in San Mateo, CA with locations in Chicago, IL, and Lehi, UT. For more information, please visit collectivehealth.com.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Collective Health is committed to providing support to candidates who require reasonable accommodation during the interview process. If you need assistance, please contact [email protected]