
AI resume screening is putting up walls for minorities in early screening stages. (fotogestoeber/Shutterstock)

In a nutshell

  • AI resume screening tools showed strong racial and gender bias, with White-associated names preferred in 85.1% of tests and Black male names favored in 0% of comparisons against White males.
  • Bias increased when resumes were shorter, suggesting that when there’s less information, demographic signals like names carry even more weight.
  • Removing names isn’t enough to fix the problem, as subtle clues—like word choice or school name—can still reveal identity, allowing AI systems to continue filtering out diverse candidates.

SEATTLE — Every day, millions of Americans send their resumes into what feels like a digital black hole, wondering why they never hear back. Artificial intelligence is supposed to be the great equalizer when it comes to eliminating hiring bias. However, researchers from the University of Washington analyzing AI-powered resume screening found that having a Black-sounding name could torpedo your chances before you even make it to the interview stage.

A study presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in October 2024 revealed just how deep this digital discrimination runs. The researchers tested three state-of-the-art AI models on over 500 resumes and job descriptions across nine different occupations. They found that resumes with White-associated names were preferred in a staggering 85.1% of cases, while those with female-associated names received preference in just 11.1% of tests.

The study found that Black male job seekers face the steepest disadvantage of all. In comparisons with every other demographic group—White men, White women, and Black women—resumes with Black male names were favored in exactly 0% of cases against White male names and only 14.8% against Black female names.

These aren’t obscure academic models gathering dust on university servers. The three systems tested—E5-mistral-7b-instruct, GritLM-7B, and SFR-Embedding-Mistral—were among the highest-performing open-source AI tools available for text analysis at the time of the study. Companies are already using similar technology to sift through the millions of resumes they receive annually, making this research particularly urgent for working Americans.

How the Bias Shows Up

AI is doing the opposite of eliminating bias in hiring processes. (Summit Art Creations/Shutterstock)

These AI resume screening models convert resumes and job descriptions into numerical representations, then measure how closely they match using something called “cosine similarity,” essentially scoring how well a resume aligns with what the job posting is looking for.
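As an illustration, cosine similarity between two embedding vectors takes only a few lines of NumPy. The vectors below are toy stand-ins for real model output (actual MTE models produce embeddings with thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means perfectly aligned."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for real model output.
job_posting = np.array([0.9, 0.1, 0.4, 0.2])
resume_a = np.array([0.8, 0.2, 0.5, 0.1])  # closely aligned with the posting
resume_b = np.array([0.1, 0.9, 0.0, 0.7])  # poorly aligned

# The better-aligned resume gets the higher score and is more likely
# to pass automated screening.
score_a = cosine_similarity(job_posting, resume_a)
score_b = cosine_similarity(job_posting, resume_b)
```

A resume whose embedding points in nearly the same direction as the job description's embedding scores close to 1.0; an unrelated resume scores much lower.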

Researchers augmented real resumes with 120 carefully selected names that linguistic studies have shown are strongly associated with specific racial and gender groups: names like Kenya and Latisha for Black women, Jackson and Demetrius for Black men, May and Kristine for White women, and John and Spencer for White men.

When they ran more than three million comparisons between these name-augmented resumes and job descriptions, clear patterns emerged. White-associated names consistently scored higher similarity ratings, meaning they would be more likely to make it past initial AI screening to reach human recruiters.

Intersectional analysis, looking at how race and gender combine, revealed even starker disparities. Black men faced discrimination across virtually every occupation tested, from marketing managers to engineers to teachers. Meanwhile, the smallest gaps appeared between White men and White women, suggesting that racial bias often outweighs gender bias in these AI systems.

Critics might argue that removing names from resumes could solve this problem, but it’s not that simple. Real resumes contain numerous other signals of demographic identity, from university names and locations to word choices and even leadership roles in identity-based organizations.

Previous research has shown that women tend to use words like “cared” or “volunteered” more frequently in resumes, while men more often use terms like “repaired” or “competed.” AI systems can pick up on these subtle linguistic patterns, potentially perpetuating bias even without explicit demographic markers.

When researchers tested “title-only” resumes, containing just a name and job title, bias actually increased compared to full-length resumes. This suggests that in early-stage screening, where less information is available, demographic signals carry disproportionate weight.

Would you rather experience discrimination from AI or a real hiring manager? (Andrey_Popov/Shutterstock)

AI-powered resume screening is rapidly becoming the norm. According to industry estimates, 99% of Fortune 500 companies already use some form of AI assistance in hiring decisions. For job seekers in competitive markets, this means that algorithmic bias could determine whether their application ever reaches human eyes.

“The use of AI tools for hiring procedures is already widespread, and it’s proliferating faster than we can regulate it,” says lead author Kyra Wilson from the University of Washington, in a statement.

Unlike intentional discrimination by human recruiters, algorithmic bias operates at scale and often invisibly. A biased human might discriminate against a handful of candidates, but a biased AI system applies the same skewed logic to thousands of applications, multiplying the harm far beyond what any single recruiter could cause.

Can we fix AI bias in hiring?

Some companies are experimenting with bias mitigation techniques, such as removing demographic signals from resumes or adjusting algorithms to ensure more equitable outcomes. However, these approaches often face technical challenges and may not address the root causes of bias embedded in training data.

“Now that generative AI systems are widely available, almost anyone can use these models for critical tasks that affect their own and other people’s lives, such as hiring,” says study author Aylin Caliskan from the University of Washington. “Small companies could attempt to use these systems to make their hiring processes more efficient, for example, but it comes with great risks. The public needs to understand that these systems are biased.”

Current legal frameworks struggle to keep pace with algorithmic decision-making, leaving both job seekers and employers in uncharted territory. The researchers call for comprehensive auditing of resume screening systems, whether proprietary or open-source, arguing that transparency about how these systems work—and how they fail—is essential for identifying and addressing bias.

Of course, it’s important to remember that this research was presented in October 2024. While the findings are still relatively recent, these models are updated frequently, so current versions of the systems tested may yield different results.

In trying to remove human prejudice from hiring, we’ve accidentally created something worse: prejudice at machine speed. We’re letting AI make decisions about people’s livelihoods without adequate oversight. Until we acknowledge that algorithms inherit human prejudices, millions of qualified workers will keep losing out to systems that judge them by their names, not their abilities.

Paper Summary

Methodology

The researchers conducted an extensive audit of AI bias in resume screening using a document retrieval framework. They tested three high-performing Massive Text Embedding (MTE) models on 554 real resumes and 571 job descriptions spanning nine occupations. To measure bias, they augmented resumes with 120 carefully selected names associated with Black males, Black females, White males, and White females based on previous linguistic research. Using over three million comparisons, they calculated cosine similarity scores between resumes and job descriptions, then used statistical tests to determine if certain demographic groups were consistently favored. They also tested how factors like name frequency and resume length affected bias outcomes.
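The audit loop described above can be sketched in miniature. This is a simplified, hypothetical reconstruction, not the authors' code: `embed` is a deterministic placeholder for a real MTE model, and the short name lists stand in for the study's 120 validated names.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Placeholder embedding: a real audit would call an MTE model
    # (e.g., E5-mistral-7b-instruct). This one is deterministic noise.
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).normal(size=dim)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical inputs; the study used 554 real resumes, 571 job
# descriptions, and 120 names validated by prior linguistic research.
resume_body = "Marketing manager with five years of campaign experience."
job_vec = embed("Seeking a marketing manager to lead campaigns.")
names = {"group_a": ["Name One", "Name Two"],
         "group_b": ["Name Three", "Name Four"]}

# Prepend each name to the same resume, score each variant against the
# job description, and tally which group's version ranks higher.
wins = {"group_a": 0, "group_b": 0}
for n_a in names["group_a"]:
    for n_b in names["group_b"]:
        s_a = cosine(job_vec, embed(f"{n_a}\n{resume_body}"))
        s_b = cosine(job_vec, embed(f"{n_b}\n{resume_body}"))
        wins["group_a" if s_a > s_b else "group_b"] += 1
```

With a real embedding model, statistical tests over millions of such pairwise comparisons reveal whether one demographic group is systematically favored, which is the core of the study's method.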

Results

The study found significant bias across all three AI models. White-associated names were preferred in 85.1% of tests, while Black names were favored in only 8.6% of cases. Male names were preferred over female names in 51.9% of tests, compared to female preference in just 11.1%. Intersectional analysis revealed Black males faced the greatest disadvantage, being preferred over White males in 0% of comparisons. The researchers validated three hypotheses about intersectionality and found that shorter resumes and varying name frequencies significantly impacted bias measurements.

Limitations

The study relied on publicly available resume datasets that may not perfectly represent real-world job applications. Resumes were truncated for computational feasibility, potentially affecting results. The researchers used an external tool for occupation classification, which may be less accurate than manual coding. The study focused only on two racial groups (Black and White) and binary gender categories, limiting insights about other demographic groups. Additionally, the models tested were open-source versions that may differ from proprietary systems actually used by companies.

Funding and Disclosures

This research was supported by the U.S. National Institute of Standards and Technology (NIST) Grant 60NANB23D194. The authors note that the opinions and findings expressed are their own and do not necessarily reflect those of NIST. No competing interests or additional funding sources were disclosed in the paper.

Publication Information

This research was conducted by Kyra Wilson and Aylin Caliskan from the University of Washington in 2024. The paper “Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval” was presented in the Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES 2024), 1578-1590. Association for the Advancement of Artificial Intelligence.


