In the age of information, the only limit on the power of data is its users’ imagination. And few companies are as imaginative as Groupon.
The company’s data warehouse team manages a network that spans 500-plus markets in 15 countries and powers everything Groupon does. As the digital marketplace continues to expand its offerings, the team is rethinking how it works with its data, transitioning from on-premises storage centers to the cloud.
We spoke with three data warehouse engineers about how they’re moving Groupon’s data into the future.
EMPLOYEES: 6,000 global (more than 1,500 local)
WHAT THEY DO: Groupon is a marketplace for local services, experiences and goods that connects consumers with businesses in their neighborhoods.
WHERE THEY DO IT: Chicago
THEIR STACK: Processing petabytes of data requires having the right tools. To get the job done, Groupon turns to Hadoop ecosystem tools including Spark, Sqoop, Pig and Hive, as well as a Teradata cluster for fast analytics. They code in a mix of Java, Scala and Python.
ALL ABOARD: New engineers have access to a library of training materials and data documentation, as well as a mentor to help them acclimate to the team.
Derrick Spell, Senior Software Development Engineer
As Groupon’s data platform architect, Derrick is responsible for the platform that powers everything from streaming architecture to data warehousing, reporting and visualization. His job is to design those systems and make sure everything works together.
BEYOND WORK: Derrick finds inspiration through his local church and expresses his creativity in the woodshop. He loves software design, but woodworking lets him bring his designs to life in a form people can appreciate for both utility and beauty.
You’ve been at Groupon for six years. What about the company has kept you there?
The curious are never bored with their work. There are plenty of interesting problems to solve in tech, so it’s not the challenges that keep you at a company; it’s the people. If you work with talented, interesting people in a supportive atmosphere, work is going to be fun and rewarding. That’s what keeps people at a company, and that’s the environment I try to foster for my teams.
What project currently excites you the most?
We've got our heads in the clouds right now (a tired pun, I know), but we are planning a migration of our entire data platform to the cloud. There are a lot of architectural patterns and paradigms that are unrealistic when you are operating your own hardware. Moving to the cloud gives us an opportunity to think about what we are doing in new ways. It simplifies the process of testing a new database or compute cluster and makes it easier to move it into production.
What’s next for your team?
We're focusing on how the platform needs to evolve to provide data not only to business reports but also to the myriad machine learning models that drive innovation at Groupon. This requires unlocking new data interfaces and making it easier for data scientists to create new features and publish those on reliable pipelines.
Josh Friedlander, Manager of the Data Warehouse
Josh manages Groupon’s global team of developers who focus on building data pipelines that feed the data warehouse. He also leads Groupon’s AWS migration.
BEYOND WORK: Josh feeds his need for speed racing motorcycles, which, like managing a huge operation, requires the ability to balance risk and foresight.
What opportunities for innovation does the data warehouse team have?
We have the opportunity to break up our monolithic codebase and move it to a service-oriented architecture. There is a ton of opportunity on the product, development and management sides when working on a project of this scale.
What project are you most excited about?
I'm running the AWS migration for our business unit. I authored the initial proof of concept and am helping to re-platform our legacy systems from a monolith to a service-oriented architecture. I love being able to mentor developers who have had to work for years within the constraints of a data center. When they realize the capabilities of AWS and have that “aha” moment, it’s amazing.
What’s the biggest challenge your team is solving?
How do you get real-time reporting from a 12-petabyte dataset? How do you store it in a way that makes financial sense, meets compliance and still allows people reporting on it to do their job? We are hoping that a robust data lifecycle strategy and metastore will solve this.
How would you describe your leadership style?
I take more of a hands-off approach. I prefer not to get involved unless I know for a fact it will cause serious issues. There is huge value in learning from your mistakes and struggling with a new concept. Hand-holding is appropriate in some circumstances, but I find it doesn't let junior developers grow.
Denise Swanson, Senior Manager, Data Governance
Denise leads the data governance team, which is responsible for data quality at Groupon. To ensure quality, her team focuses on the data’s usability, integrity and security.
BEYOND WORK: Denise grew up with her hands in the dirt, raising corn, berries and carrots on her family farm. Much like writing software, gardening in the city has required constant tests and iterations, but she’s finally cracked the code, having harvested Silver Queen corn last year.
How has data evolved from when you first started in the industry?
I’ve always been on the data side of technology. To give you some perspective, my first data warehouse project was only 46 gigabytes, and we thought that was big data. Today, I have spreadsheets with more data. That means we are no longer limited by data size; we are only limited by our imagination.
How have data warehouse solutions evolved over that time?
I’ve been involved with data solutions for a long time. The more interesting observation is how much the data principles and challenges have remained the same. Data is still logically modeled with star or snowflake schemas, and data transformation continues to be the greatest challenge. The technologies have changed, but the principles remain the same.
Does this team have a reputation within the company?
We get things done. Integrating data from 15 countries and 500-plus markets is challenging. Our partnerships with merchants only add to our data complexity. It has taken many iterations as Groupon continues to offer more services to both customers and merchants, but we get the data to the company decision-makers every day, at the start of their day, whenever that day may begin. Simply put, we deliver.