Data engineers are well known for their proficiency in SQL, AWS, building big data pipelines and a range of other qualitative and quantitative tasks.
But what about their communication and time management skills?
While tech skills are vital to the role, SQL Developer Mark Dai said that soft skills — like the willingness to regularly collaborate with other departments to discuss a product roadmap — are just as important for success in his work at SPINS. The company provides brands and retailers in the natural, organic and specialty products industry with data and analytics to improve their performance and market presence.
Dai said SPINS’s data engineering team must understand the full scope of the data they handle because it directly impacts customers. To release data accurately and on time, he and his team members must be good communicators who are adept at reprioritizing their duties as changes arise.
Below, Dai shared more of the technical and non-technical aspects of his day-to-day work and the tools that enable his success.
What does a typical day look like for you?
During a scheduled data release week, a typical day involves executing, monitoring and troubleshooting workflows in our on-premises and cloud environments. There is also excitement around our day-to-day projects outside of data release week, as we constantly look for ways to enhance our current workflows. For instance, we problem-solve, refine operating procedures, create new automated quality assurance checks and keep up with new functionality released for our cloud platform.
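To illustrate what an automated quality assurance check might look like, here is a minimal sketch in Python. The table, column names and specific checks are invented for illustration; they are not SPINS's actual checks. The idea is simply to run a few SQL assertions against a dataset before it is released:

```python
import sqlite3

# Hypothetical example: simple pre-release QA checks of the kind described
# above. The "sales" table and its columns are invented for illustration.
def run_release_checks(conn):
    """Return a dict mapping check name to pass/fail for a 'sales' table."""
    checks = {}

    # Check 1: the table must not be empty.
    (row_count,) = conn.execute("SELECT COUNT(*) FROM sales").fetchone()
    checks["non_empty"] = row_count > 0

    # Check 2: no NULL product identifiers.
    (null_ids,) = conn.execute(
        "SELECT COUNT(*) FROM sales WHERE product_id IS NULL"
    ).fetchone()
    checks["no_null_product_id"] = null_ids == 0

    # Check 3: revenue must never be negative.
    (bad_rows,) = conn.execute(
        "SELECT COUNT(*) FROM sales WHERE revenue < 0"
    ).fetchone()
    checks["non_negative_revenue"] = bad_rows == 0

    return checks

# Build a tiny in-memory table to exercise the checks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A1", 10.0), ("B2", 4.5), (None, 2.0)],  # one bad row: NULL id
)

results = run_release_checks(conn)
```

In practice, checks like these would be wired into the release workflow so a failing check blocks the data from reaching customers.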
As a data operation team member, it is important to communicate with a wide range of stakeholders. We need to know about upcoming features and changes to our product library, then evaluate whether there’s anything that needs to be altered in the operation procedure. We also need to understand common questions or requests received by customer success so we can prioritize our work.
I mainly use Python and SQL for projects. We also need a strong understanding of various Google Cloud Platform tools and workflow processing engines like Airflow and Azkaban. Occasionally, we need to troubleshoot with MySQL, the Hadoop Distributed File System and Hadoop logs.
Tell us about a project you’re working on right now that you’re really excited about. What about this project specifically do you find rewarding or challenging?
We are always excited about adding new technologies to our workflow that simplify our daily operations and improve efficiency. One of the current projects I am very excited about is migrating our scripts from our on-premises server to Airflow. This new system will be more efficient than repeatedly deploying code, running the environment and scheduling jobs with cron. Airflow also enhances collaboration, making it easier to trigger processes continuously and share the tools with others.
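The migration described above can be pictured as replacing independent crontab entries with a declarative Airflow DAG file. The sketch below is a hypothetical configuration, not SPINS's actual pipeline: the DAG id, schedule and task commands are all invented, and the imports assume Airflow 2.x.

```python
# Hypothetical sketch of a cron-to-Airflow migration. All names are invented.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="data_release_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",  # the same expression a crontab entry uses
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    qa_check = BashOperator(task_id="qa_check", bash_command="python qa_check.py")
    publish = BashOperator(task_id="publish", bash_command="python publish.py")

    # Unlike separate cron entries, the task dependencies are explicit,
    # so a failed QA check stops publishing automatically:
    extract >> qa_check >> publish
```

This is essentially workflow configuration: Airflow reads the file, renders the dependency graph in its UI, and handles retries and manual triggering, which is what makes the setup easier to share than a deployed script plus a crontab.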
What’s the most important skill a data engineer needs to be successful in their role?
Although many skills are needed for this role, the most important are staying focused and managing tasks under pressure. The bottom line is that data goes directly to customers after our processing, so we must ensure good data quality in a limited amount of time. When operation procedures do not go as scheduled, we may need to execute some steps out of order to ensure high-priority needs are met.