Principal Data Engineer
DRIVIN is looking to expand our data team as we continue to grow our data platform. The candidate should have a strong background with Python and SQL. As a member of the data team, you will architect, plan, build, test, and deliver data solutions utilizing our AWS enterprise data platform.
DRIVIN has a polyglot data model built on many cutting-edge data platforms, including AWS Redshift for data warehousing, Elasticsearch for location-based searching, and Postgres for transactional data and product delivery. Our delivery framework consists of Python/Docker on ECS, Spark on EMR, and Jenkins for CI/CD.
This candidate should be a self-starter who is interested in learning new systems and environments and passionate about developing high-quality, supportable data service solutions for internal and external customers.
What you’ll do:
As a Principal Data Engineer, you’ll be responsible for the architecture of a data platform supporting multiple lines of business through internal and external delivery teams. A Principal Engineer understands that frameworks and technologies are parts of a toolset and chooses the best solution for each task. You will work with and enable our other developers to define and lead the technologies for data ingestion and mapping that support data science, product delivery, master data management, and Data as a Service stakeholders. Principal Data Engineers are empowered to evaluate our current systems and potential new technologies to support the long-term stability, scalability, and performance of our cloud data delivery platform.
As a Principal Engineer, your responsibilities may include, but are not limited to, the following:
- Take ownership of the data engineering team's delivery framework, including its design and strategic roadmap. This includes organizing and maintaining the core delivery codebase, naming conventions, and SDLC processes used by the data engineering team.
- Lead architectural evaluation, code review, and grooming sessions for team members, ensuring timely and high-quality delivery of data services and projects on top of the data platform.
- Take ownership of an enterprise cloud-based data platform and its architecture to ensure that internal and external customer data needs are met with timeliness, quality, and availability.
- Work with our DevOps team to ensure that our platform is built and deployed using CI/CD best practices. Communicate requirements and feature needs to the DevOps team so it can effectively support the data engineering team's delivery framework.
- Own strategic business plans related to our data platform’s feature set and operational success, including data migrations, infrastructure upgrades, tech debt, and new source ingestion.
What you’ll need:
- Bachelor's degree in Computer Science or a related field
- 8+ years of experience building data warehouse / ETL systems
- 5+ years of experience developing data pipelines using SQL and Python
- A passion for the operational needs of data pipelines, ensuring that data engineering approaches support the long-term timeliness, quality, and availability of data flows.
- A passion for data quality assurance testing, with the understanding that data quality is paramount to the success of products and their customers.
- A passion for reusable engineering: identifying opportunities for common code bases and working with the data engineering team to implement them as standards.
- High proficiency in programming, troubleshooting, and problem solving related to building enterprise data solutions.
- Excellent communication skills and the ability to work using Agile methodologies. Enthusiasm for coaching other software engineers and for participating in or leading developer communities of practice.
- Ability to work collaboratively with software engineers, project management, product owners, stakeholders, and leadership.
- Experience with service-oriented (SOA) and event-driven (EDA) architectures, cloud-based DWBI architectures, and ETL/RDBMS frameworks.
We value these qualities, but they’re not required for this role:
- Open source contribution
- Strong understanding and/or experience with:
  - AWS Redshift, Aurora, Postgres
  - Python / Docker ETL framework on EC2
  - CI/CD practices and frameworks such as Jenkins, Terraform, and Artifactory
  - Big data technologies in the cloud such as Spark and EMR
  - Java application development, especially surrounding data services