How Going All-In on Machine Learning Changed Data Collection at Morningstar

by Michael Hines
February 28, 2020
The Morningstar data collection team

Shariq Ahmad set an ambitious goal for Morningstar’s data collection team in 2019: to have at least 50 percent of its engineers working on machine learning initiatives by year’s end. 

Ahmad joined Morningstar, which provides research and proprietary tools to investors, in 2010 and stepped into the role of head of technology for the data collection group in the summer of 2018. His first order of business was to automate the data collection process which, up until that point, had relied on analysts to gather information from numerous sources — ranging from SEC filings to managed investment documents — and verify its quality.

“Collecting financial data from various sources is an exhaustive process,” Ahmad said. “With an ever-increasing demand for new datasets, I realized we needed some form of automation to help us scale.”

Morningstar’s data collection team has more than 100 developers spread across the globe. By the end of 2019, over half of them were working on machine learning full time. According to Ahmad, this shift has changed the rate at which the company gathers data — and sparked interest in new forms of automation that don’t require teams full of data scientists for implementation.

Shariq Ahmad
Head of Technology, Data Collection

At the beginning of 2019, you set a goal to retrain 50 percent of the data collection team’s engineers in machine learning. Your team hit that target with time to spare. What were the key factors that enabled so many developers to switch focus so quickly?

We made difficult decisions around project prioritization. We slowed down and even halted some projects to make room for machine learning. We seeded our teams with data scientists and started to retrain a large portion of the team on newer skill sets. We’re very fortunate to have talented and motivated people on our team, which helped make this transformation a lot smoother.


In addition to retraining, what other challenges did your team face implementing machine learning?

Generating training data was a challenging task in itself because we cover so many datasets. We created additional tooling to capture contextual information that had previously gone uncollected. This contextual information allowed us to train machine learning models and deliver runtime inferences to an analyst who either accepts or rejects the output, forming a feedback loop for retraining that further improves the model. Machine learning is helping us collect more data faster and at a potentially higher quality than before.
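Editor's illustration: the accept-or-reject feedback loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Morningstar's system: the `CountingModel` class, the `review` function, the "Total net assets" context string, and the `fund_size` label are all hypothetical names chosen for the example, and the "model" is a toy lookup that retrains on every reviewed item.

```python
from collections import Counter, defaultdict


class CountingModel:
    """Toy model (hypothetical): predicts the label most often seen with a context string."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, examples):
        # Rebuild the counts from scratch on every retraining pass.
        self.counts = defaultdict(Counter)
        for context, label in examples:
            self.counts[context][label] += 1

    def predict(self, context):
        seen = self.counts.get(context)
        return seen.most_common(1)[0][0] if seen else None


def review(model, examples, context, analyst_label):
    """One pass of the loop: infer, let the analyst accept or reject, then retrain."""
    prediction = model.predict(context)
    accepted = prediction == analyst_label
    # Accepted or rejected, the analyst's decision becomes a labeled training example.
    examples.append((context, analyst_label))
    model.fit(examples)
    return prediction, accepted


model = CountingModel()
examples = []
# First pass: the untrained model has no answer, so the analyst rejects and corrects it.
first = review(model, examples, "Total net assets", "fund_size")
# Second pass: the retrained model proposes the label, and the analyst accepts it.
second = review(model, examples, "Total net assets", "fund_size")
```

In a real pipeline, retraining would more likely run in batches than after every review, but the shape of the loop is the same: each analyst decision is captured as training data for the next model version.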


Now that the data collection group has completed this first push into machine learning, what comes next?

Creating machine learning models requires a lot of data science knowledge as well as some programming. For example, to successfully predict an outcome to a problem, a data scientist must decide among a number of algorithms with each one having multiple hyperparameters to choose from. 

Creating a model can take months, and good data science talent is difficult to find. This is where the opportunity for automated machine learning, or AutoML, comes in. Several cloud providers, such as Amazon and Microsoft, as well as open-source projects, offer AutoML services that allow engineers to launch a model in significantly less time.
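Editor's illustration: the choice among algorithms and hyperparameters that Ahmad describes is, at its simplest, a search over candidate configurations scored on held-out data. The sketch below shows that idea with a plain exhaustive search in standard-library Python. It is an assumption-laden toy, not the API of any AutoML service and not Morningstar's method: the dataset, the two candidate algorithms, and every name in it (`search_space`, `threshold_classifier`, `knn_classifier`, `search`) are hypothetical.

```python
from itertools import product

# Hypothetical 1-D dataset of (value, label) pairs, for illustration only.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
valid = [(1.5, 0), (2.5, 0), (6.5, 1), (7.5, 1)]


def threshold_classifier(cutoff):
    """Candidate algorithm 1: label 1 when the value reaches the cutoff."""
    return lambda x: 1 if x >= cutoff else 0


def knn_classifier(k):
    """Candidate algorithm 2: majority vote among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if votes * 2 > k else 0
    return predict


# Each algorithm comes with its own grid of hyperparameter values to try.
search_space = {
    "threshold": (threshold_classifier, {"cutoff": [2.0, 4.5, 7.0]}),
    "knn": (knn_classifier, {"k": [1, 3, 5]}),
}


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def search(space, data):
    """Score every algorithm and hyperparameter combination; keep the first best."""
    best = None
    for name, (build, grid) in space.items():
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = accuracy(build(**params), data)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


best_score, best_name, best_params = search(search_space, valid)
```

AutoML tools automate this kind of search, typically with smarter strategies than trying every combination, which is part of why they can shorten the time it takes to produce a usable model.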


We’ve heard you set up an R&D team at Morningstar’s Chicago office. What can you tell us about it?

I wanted to make sure that we stay ahead of the curve, utilize any innovation in this space quickly and help pave the way for the rest of our teams across the globe. Our R&D squad is tasked with delivering repeatable patterns to help us scale data collection.


Your team has gone all-in on machine learning, and the benefits aren’t hard to see. But what about for investors? What impact does your investment in ML have on them?

Machine learning helps us collect data at a faster pace, which reduces the time an analyst spends sifting through a source. Analysts now help review the model’s output, which leads to higher-quality data for our end users.
