The Ghost in the Machine: How to Remove Bias in AI

The United States has the world’s highest rate of incarceration, and a disproportionate number of those incarcerated are Black. To help remove human bias and increase accuracy in sentencing, a machine learning algorithm was created by assessing 137 factors from each defendant’s past.

As reported by ProPublica, the machine learning system in question then produced risk scores that estimated how likely each defendant was to reoffend — and those risk scores were presented to judges to help inform everything from criminal sentencing to parole.

There were two major problems, according to ProPublica’s reporters. First, the machine learning system incorrectly flagged Black defendants as future criminals at twice the rate of white defendants. Second, white defendants were misclassified as low-risk 63.2 percent more often than Black defendants.

While the controversy and confusion over this case continues, it is important to note that race was not used as a factor in creating the risk scores. Yet as The Washington Post pointed out, a machine learning system that predicts rearrests in America might need to take race into account rather than ignore it.

Predominantly Black neighborhoods are subject to heavier policing, and a recent study showed that Black people are five times more likely to be arrested than white people. Thus, one potential source of bias in this system and others may be training data that does not account for racial injustice.

So how can bias in machine learning be eliminated? We sat down with two Chicago tech companies to discuss the strategies they use to remove bias and what models are used as unbiased replacements.

How Imperfect Training Data Leads to Accidental Bias

As reported by The New York Times, a recent facial recognition program using machine learning had more trouble recognizing women and people of color than it did white men.
Women were misidentified as men 19 percent of the time, “darker-skinned” women were misidentified as men 31 percent of the time and “lighter-skinned” men experienced no error.
The reason behind the bias was likely the training data, which probably used more images of lighter-skinned men in its original dataset than others.

Joseph Davin

Innovation Fellow and Head of Data Science and Technology • West Monroe

At West Monroe, a national consulting platform, Innovation Fellow and Head of Data Science and Technology Joseph Davin has zeroed in on the fact that it isn’t algorithms that contain bias; it’s the training data. By examining the original dataset with a high degree of care, incidents or tendencies that might create bias can be identified and corrected.

What’s the most important thing to consider when choosing the right learning model for a given problem? And how does this help you get ahead of bias early on in the process?

The most important thing to consider when selecting the right learning model is to ensure the training data is unbiased to begin with. Algorithms do not contain bias. Data originates from the real world, and therefore naturally contains various elements of bias. For example, home appraisal studies have found that racial bias leads to lower appraised values for Black homebuyers. If home appraisal data were fed into an algorithm at face value, the algorithm might pick up and embed the bias into a predictive model. While the model itself started agnostic to bias, it can learn bias if it is present in the original training dataset.

To combat this, we need to carefully examine the original dataset. This requires a high degree of care and knowledge of the subject matter. This is where the social sciences and liberal arts can help inform pure math and computer science to avoid perpetuating bias within machines.


What steps do you take to ensure your training data set is diverse and representative of different groups?

Training data should reflect the diversity within the target population; otherwise, a lack of representation could bias your model. For example, if your target is people who live within 20 miles of a big-box store, then a sample cannot be drawn from the entire U.S. population, because this would include people who live more than 20 miles from a big-box store and are therefore outside the target group. The training sample should be specific to the target use case. Conversely, a training dataset that is representative can still contain bias if the bias is filtered into the data through human decision-making.
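As a rough illustration of the representativeness point, here is a minimal Python sketch that compares each group's share of a training sample with its share of the target population. The function name `representation_gaps`, the 5-point tolerance, the group labels and the target shares are all hypothetical choices made for this example, not anything West Monroe describes using.

```python
from collections import Counter


def representation_gaps(sample_groups, target_shares, tolerance=0.05):
    """Return groups whose sample share is off from the target share.

    sample_groups: one group label per training record.
    target_shares: {group: expected share of the target population}.
    tolerance: hypothetical threshold; pick one that suits the use case.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample is empty")
    counts = Counter(sample_groups)
    gaps = {}
    for group, target in target_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - target) > tolerance:
            # Positive means overrepresented, negative means underrepresented.
            gaps[group] = round(observed - target, 3)
    return gaps


# Invented toy data: a sample that leans heavily toward one group.
sample = ["urban"] * 80 + ["suburban"] * 15 + ["rural"] * 5
target = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}
gaps = representation_gaps(sample, target)
print(gaps)
```

A check like this only catches missing representation. As the answer above notes, a representative sample can still carry bias introduced by human decisions.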

We can all take steps to ensure data is representative by conducting careful exploratory data analysis. Additionally, we can incorporate checks for model bias along common demographic dimensions like age, gender and ethnic background.
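One way such a check might look in practice is to compute error rates separately for each demographic group and compare them. The sketch below is an assumption about how this could be done, not a description of West Monroe's tooling. It reports the false positive rate and false negative rate per group; the function name `error_rates_by_group` and the records are invented for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    records: iterable of (group, actual, predicted), where actual and
    predicted are booleans. A rate is None when it is undefined for a
    group (no actual negatives or no actual positives).
    """
    tallies = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, actual, predicted in records:
        if actual:
            tallies[group]["tp" if predicted else "fn"] += 1
        else:
            tallies[group]["fp" if predicted else "tn"] += 1

    rates = {}
    for group, t in tallies.items():
        negatives = t["fp"] + t["tn"]
        positives = t["fn"] + t["tp"]
        rates[group] = {
            "fpr": t["fp"] / negatives if negatives else None,
            "fnr": t["fn"] / positives if positives else None,
        }
    return rates


# Invented toy records: (group, actual outcome, model prediction).
records = (
    [("A", False, True)] * 2 + [("A", False, False)] * 2
    + [("A", True, True), ("A", True, False)]
    + [("B", False, True)] + [("B", False, False)] * 3
    + [("B", True, True), ("B", True, False)]
)
rates = error_rates_by_group(records)
print(rates)
```

In this toy data, group A is wrongly flagged twice as often as group B even though both groups have the same false negative rate, which is the kind of gap the ProPublica analysis described.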

When it comes to testing and monitoring your model over time, what’s a strategy you’ve found to be particularly useful for identifying and eliminating bias?

To be honest, there is no one solution that will solve bias in all situations. The approach I have found to be particularly useful is to be intentional about diversity throughout the entire process from start to finish. This includes building diverse teams, adding diversity checks in user acceptance testing and incorporating automatic checks for training data biases. Having the right blend between technology and diverse human experience is key to a comprehensive strategy.

Zachary Ernst

Machine Learning Lead • Clearcover

Zachary Ernst is machine learning lead at Clearcover, a platform for smarter auto insurance. Ernst believes that one of the keys to unbiased machine learning is creating models that have as few interactions as possible, while also training separate versions of models across different geographies.

What’s the most important thing to consider when choosing the right learning model for a given problem? And how does this help you get ahead of bias early on in the process?

For us, explainability is a default requirement for all projects and we stay away from “black box” techniques that are too opaque. Furthermore, we try to generate useful models that include as few interactions among our features as possible. For example, a linear regression is better than a tree-based method, and shallow trees are better than deep ones, all else being equal. Sticking to this process allows us to more confidently interpret the models because they do not have an unintelligible number of complex interactions. We routinely give up a certain amount of accuracy in favor of simplicity and explainability.

What steps do you take to ensure your training data set is diverse and representative of different groups?

In our business, bias is likely to show up when certain geographic regions are overrepresented. Because protected categories tend to cluster geographically, this can lead to bias. We train several different versions of our models across different geographies and at varying levels of granularity. This process has suggested ways in which age and race, for example, have influenced the behavior of our models.
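To make the idea of per-geography models concrete, here is a small sketch that fits a separate one-feature least-squares line for each region and returns the fitted parameters so they can be compared. It is an illustration under simplifying assumptions, not Clearcover's pipeline: the helper names `fit_line` and `fit_per_region`, the single feature and the data are all hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the feature varies within the region
    return slope, mean_y - slope * mean_x


def fit_per_region(rows):
    """rows: iterable of (region, feature, outcome). Returns {region: (slope, intercept)}."""
    grouped = defaultdict(lambda: ([], []))
    for region, x, y in rows:
        grouped[region][0].append(x)
        grouped[region][1].append(y)
    return {region: fit_line(xs, ys) for region, (xs, ys) in grouped.items()}


# Invented toy data: the same feature relates to the outcome differently by region.
rows = [
    ("north", 1, 3.0), ("north", 2, 5.0), ("north", 3, 7.0),
    ("south", 1, 1.5), ("south", 2, 2.0), ("south", 3, 2.5),
]
models = fit_per_region(rows)
print(models)
```

When the fitted parameters differ sharply between regions, as they do here, that is a prompt to ask whether the feature is standing in for something demographic.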

We also generate documentation on the features that are used in the models and their relative importance. This documentation is shared across technical and non-technical teams, including product managers, stakeholders and our legal team.


When it comes to testing and monitoring your model over time, what’s a strategy you’ve found to be particularly useful for identifying and eliminating bias?

We are still early in this process, so I expect our strategy to expand. Our team has dashboards in place that show how feature distributions have changed over time, and we also monitor whether the model outputs are systematically different for different geographical regions. Because geographies correlate with demographic variables, we believe that this will help us identify bias that arises over time in our models.
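The two monitoring checks described here could be sketched roughly as follows. The population stability index (PSI) is one common way to quantify how far a feature's distribution has moved from a baseline; it is swapped in here as an assumption, since the interview does not say which measure Clearcover's dashboards use. The function names, bin count and data are likewise hypothetical.

```python
import math
from statistics import mean


def population_stability_index(baseline, current, bins=5):
    """PSI between two samples of one numeric feature.

    Bins are equal-width over the baseline's range; values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = shares(baseline), shares(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def regional_output_gaps(scores_by_region):
    """Each region's mean model output minus the overall mean output."""
    overall = mean(s for scores in scores_by_region.values() for s in scores)
    return {region: mean(scores) - overall
            for region, scores in scores_by_region.items()}


# Invented toy data.
last_quarter = [1, 2, 2, 3, 3, 3, 4, 4, 5]
this_quarter = [3, 4, 4, 5, 5, 5, 5, 5, 5]
drift = population_stability_index(last_quarter, this_quarter)
gaps = regional_output_gaps({"north": [0.2, 0.3], "south": [0.6, 0.7]})
print(drift, gaps)
```

A PSI of zero means the binned distributions match, and larger values mean more drift. What counts as too much drift, or too large a regional gap, is a judgment call for the team that owns the model.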
