Associate Data Scientist
We are looking for amazing people diverse in thought, perspective, and culture to join our team. We check our egos at the door, roll up our sleeves, work hard, move fast, and support each other. If that sounds like fun to you, please apply!
Centro’s technology focuses on improving and streamlining digital media logistics for online advertising. We are here to ensure that the advertising industry and the people in it are healthy and engaging positively and effectively with those around them. We’re here, ultimately, to improve the lives of people working in the media industry. And we take that responsibility seriously.
ABOUT THE TEAM
The Data Science team at Centro focuses on extracting actionable insights from data and building machine learning algorithms using large amounts of data. We aim to derive meaning from our data, enabling us to run our business better and equip our clients to advertise smarter. As part of our Data Science team, you will work with product managers, engineers, and business stakeholders to bridge the gap between raw data and informed decisions.
ABOUT THE ROLE
We are looking for a Data Scientist who has experience in analyzing large data sets in distributed environments such as Amazon EMR. The ideal candidate will have a passion for discovering patterns from data and working with internal stakeholders to understand business problems. The candidate will have experience working with distributed data processing tools and infrastructure including MapReduce, Hadoop, Hive, and Spark.
CORE RESPONSIBILITIES
• Inquisitively interview business stakeholders and understand business problems before formulating solutions.
• Identify disparate data sources relevant to solving a given problem. Understand the business logic and nuances of the data.
• Possess natural curiosity to explore the data. Write complex SQL and Hive queries to understand the relationships among data entities.
• Understand pros and cons of different machine learning tools (e.g. MLlib, scikit-learn, Amazon SageMaker) and recommend which one to use for a given problem.
• Understand the behind-the-scenes steps of machine learning algorithms rather than treating them as black boxes. Know the pros and cons of different machine learning practices. Select the most appropriate algorithm for a given task and explain why it is better suited to the problem than the alternatives.
• Analyze the steps involved in training a machine learning algorithm and break them down into steps that can process terabytes of data in a distributed environment. This will involve analyzing vast amounts of data, generating features that are relevant to the problem, and running Spark jobs to pre-process the data used to train the learning algorithm.
• Apply statistical methods to analyze results from these steps. Fine-tune model parameters iteratively to improve model efficiency and accuracy.
• Compile the final outputs of the algorithms and present them to stakeholders in a way that is comprehensible to a nontechnical audience.
• Create data visualizations to tell the story from data. Understand what types of visualizations are appropriate for the audience.
• Collaborate with the Product Operations team to set up the environments needed by the data science team; communicate effectively about what is needed and brainstorm with them to explore the best solution for a given problem.
• Collaborate with Product and Engineering teams in productizing proof-of-concept machine learning algorithms.
YOU ARE RIGHT FOR THE JOB IF:
• Your education background and relevant work experience demonstrate a mixture of math, statistics, and data engineering.
• You are able to code in Python and R.
• You have at least 2 years of experience working with machine learning algorithms in a distributed environment using large-scale data processing tools.
• You can extract data on your own and prepare a large data set to train a machine learning algorithm.
• You feel comfortable working with data that is sometimes incomplete or messy. Or both.
• You don’t make assumptions about the data yourself. Instead, you are willing to work with others to get a clear understanding of the context around the data.
• You possess business acumen and the curiosity to learn Centro’s business operations.
• You don’t limit yourself to how things are done today. You instead focus on the best ways to create value.
Bonus Item:
• You have experience working in Ad Tech, particularly in real-time bidding (RTB).
Centro is an Equal Opportunity Employer. We respect and support an inclusive workplace diverse in thought, perspective and culture. We celebrate all team members regardless of gender/identity, sexual orientation, race or cultural background, religion, physical disability and age. We are better together.