That Stanford Study on Algorithmic Hiring Is Being Misreported — Here's What It Actually Says

Clarence Bongalos
Jun 8

This past weekend, several people who know my coaching and consulting work and my work on Emppowered sent me videos about a new Stanford study on algorithmic hiring. All of the videos made essentially the same claims: that most companies use the same ATS algorithm, that your first application generates a score, that the score is biased against you, and that it follows you from company to company, which is why you keep getting rejected.

So before I said anything publicly, I did something most of the people reporting on it seemingly did not: I read all of it. The full 32-page paper, the published summary, and a Q&A with the researchers.

What's spreading on social media is not what the study found.

What the Study Actually Studied

The research—conducted by scholars from Stanford, Chapman University, and Northeastern University—examined one vendor: pymetrics. Pymetrics is not an ATS. It's a behavioral assessment tool that has job applicants play a series of short games rooted in neuroscience to assess cognitive and behavioral traits. No resumes. No names. No demographic information collected.

The bias they found emerged from gameplay patterns, not from anything resembling a traditional resume screening process.

The dataset covered 3.4 million applicants submitting 4 million applications across 156 large employers—the majority with annual revenues of at least $5 billion—across 11 market sectors. The data spans December 2018 through December 2022.

Small businesses, startups, and mid-market companies are not represented in this research. The comparison data the researchers reference (83,000 applications to 108 Fortune 500 firms) comes from a separate prior study used as a baseline, not from the pymetrics dataset itself. These are two distinct datasets serving two distinct purposes.

That's the scope. And scope matters enormously when you're reporting on research like this.

The Findings That Are Real — And Significant

I want to be clear about something before addressing the misreporting: the racial disparities documented in this study are real, and they are legally significant.

25.87% of applications submitted by Black applicants and 14.74% of applications submitted by Asian applicants were directed to positions that adversely impact those groups under Title VII of the U.S. Civil Rights Act. That's not a minor finding. It deserves serious attention from employers, policymakers, and vendors alike.
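For readers unfamiliar with how adverse impact is usually screened for, here is a minimal sketch of the EEOC's four-fifths rule of thumb, a common first-pass check under Title VII. I am not claiming the researchers applied exactly this test. The function names are mine, and the selection rates in the example are made-up numbers, not figures from the paper.

```python
def adverse_impact_ratios(selection_rates):
    """Divide each group's selection rate by the highest group's rate."""
    highest = max(selection_rates.values())
    if highest <= 0:
        raise ValueError("at least one group needs a positive selection rate")
    return {group: rate / highest for group, rate in selection_rates.items()}


def flagged_groups(selection_rates, threshold=0.8):
    """Return groups whose ratio falls below the four-fifths threshold."""
    ratios = adverse_impact_ratios(selection_rates)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical selection rates, for illustration only (not from the study).
example_rates = {"group_a": 0.30, "group_b": 0.21, "group_c": 0.27}
print(flagged_groups(example_rates))
```

In this invented example, group_b is selected at 70% of the top group's rate and is flagged, while group_c at 90% is not.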

The study also demonstrates what the researchers call "algorithmic monoculture": the idea that when multiple employers rely on the same vendor, correlated outcomes become possible. That means the same candidates can get systematically shut out across multiple employers, not because of one rejection, but because similar algorithms trained on similarly homogeneous workforces produce similar decisions. Of the applicants in the dataset who submitted four applications, 10% were systemically rejected, a rate that significantly exceeds what you'd expect from independent decisions.
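To make the "independent decisions" comparison concrete, here is a rough sketch. It assumes "systemically rejected" means rejected on all four applications, which is my reading, and it uses a hypothetical per-application rejection probability of 0.5. Neither the function name nor the 0.5 value comes from the paper.

```python
def all_rejected_if_independent(reject_prob, n_applications):
    """Chance of being rejected every time if each decision is independent."""
    if not 0.0 <= reject_prob <= 1.0:
        raise ValueError("reject_prob must be between 0 and 1")
    if n_applications < 1:
        raise ValueError("n_applications must be at least 1")
    return reject_prob ** n_applications


observed_share = 0.10  # the 10% figure reported for four-application candidates
baseline = all_rejected_if_independent(0.5, 4)  # 0.5 is a hypothetical rate
print(f"independent baseline: {baseline:.4f}, observed: {observed_share:.2f}")
```

With that hypothetical 50% rejection rate, independent decisions would shut out about 6% of four-application candidates. The real comparison depends entirely on the actual rejection rates, which the paper establishes and this sketch does not.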

These are serious structural concerns. I am not discounting them.

What I am pushing back on is what's being claimed about them.

Where the Social Media Narrative Breaks Down

Most job seekers interact with ATS platforms — not pymetrics.

Workday, Greenhouse, BambooHR, Lever — these are the platforms most candidates encounter on every application. Very few have played a pymetrics game as part of a hiring process. So the natural assumption when people see this research is that it's about the tools they recognize. It isn't. The conflation of pymetrics with ATS platforms is at the root of most of the misreporting.

Scores don't transfer across companies and platforms.

The viral claim is that your application generates a score that follows you everywhere. That's not what the study found. Each employer's pymetrics model is trained on that employer's own current employees in a given role (42 distinct models were used across the 156 employers in the study). There is no shared applicant score database propagating rejections across unrelated companies.

There is one nuance from the full paper worth being precise about: within pymetrics specifically, gameplay data is stored and reused across applications for up to 330 days. So if you applied to multiple pymetrics-screened positions within that window, your gameplay results would carry over within that platform. That's real. But it is not the same as an ATS score following you across unrelated systems and companies, which is the claim being made. Equating the two is a significant overreach.

HireVue is mentioned — but not studied.

The study references HireVue to make a separate point: that over 60% of Fortune 100 companies use HireVue's algorithms, illustrating that vendor concentration in hiring is a broader market reality. The study never examined HireVue's algorithms, never analyzed HireVue's data, and makes no findings about HireVue's outcomes. And for the record, pymetrics and HireVue are not the same company.

The monoculture argument is structural, not universal.

The researchers demonstrate the monoculture effect empirically for pymetrics. They use HireVue's market share to argue that the conditions for monoculture exist more broadly. But this study does not test that claim across the broader market. Whether other vendors produce similar patterns is an open, unstudied question.

The researchers themselves said there is no causal evidence.

This is perhaps the most important thing to understand about this research. The study demonstrates correlation and statistical anomaly. It does not prove that the algorithm caused the disparities. Researcher Sarah Bana stated this explicitly: "There is no causal evidence in our paper." That doesn't diminish the findings, but it matters enormously for how strongly any claim derived from them can be stated.

The Gap Between What the Study Models and What Candidates Are Living

The study suggests that applicants need to submit 25 applications to pymetrics-screened positions to ensure at least one recommendation with 99.9% probability. That figure is specific to pymetrics-mediated positions and isn't a general job market recommendation.
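To show the kind of arithmetic behind a figure like this, here is a minimal sketch. It is not the study's model: it assumes every application is an independent draw with the same chance of a recommendation, and the 0.25 probability is a hypothetical value I picked for illustration. The function name is mine as well.

```python
import math


def min_applications(recommend_prob, target=0.999):
    """Smallest n such that 1 - (1 - recommend_prob)**n >= target,
    assuming independent applications with equal odds."""
    if not 0.0 < recommend_prob < 1.0:
        raise ValueError("recommend_prob must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - recommend_prob))


# Hypothetical 25% chance of a recommendation per application.
print(min_applications(0.25))
```

Under those assumptions, a 25% per-application chance works out to 25 applications for a 99.9% chance of at least one recommendation. If outcomes are positively correlated, as the monoculture finding suggests, an independence-based number like this would tend to understate what a candidate actually needs.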

But here's what the research doesn't account for: candidates today aren't submitting 25 applications. They're submitting hundreds—often to roles they're underqualified or overqualified for—because the market has pushed them into a volume game as a survival strategy. The study's model assumes reasonably targeted application behavior. The reality is that candidates have already moved well beyond that, not by choice, but because the system has made it necessary.

That gap between what the research models and what candidates are actually living is exactly why translating academic findings directly into job seeker advice is more complicated than it looks.

The Bigger Problem

Here's what actually concerns me most.

The researchers did important, rigorous work. The access they had to pymetrics data was unprecedented. To their knowledge, no other independent research team has studied deployed hiring algorithms at this scale. That access produced findings that matter.

But in the current environment, nuanced research travels fast and arrives oversimplified. What we're seeing is people inferring conclusions the study doesn't make and reporting those inferences as fact, and then millions of others sharing them, compounding the problem with every post.

And I understand why it spreads. We're in a volatile environment. Layoffs, mass unemployment, ghosting, rejections at scale. When you're frustrated and someone hands you an explanation for why it's happening, it's easy to share it before you've verified it. That's how social media works.

But in a job market this fragile, misinformation is an accelerant. It doesn't just confuse people. It pulls us further away from actually solving the problem. And the responsibility for stopping it doesn't sit only with content creators and thought leaders.

Researchers and institutions publishing in this environment have to think about how their work travels, how findings are framed, what inferences they make easy, and what happens when those inferences reach an audience of millions of frustrated job seekers. Good intentions don't insulate a finding from misuse at scale.

Go Read It Yourself

I've linked all three sources below. I'm not asking you to take my word for what I've written here, because I am not the final authority on this topic. Go verify it. Read the full paper. Read the summary. Read the Q&A with the researchers.

And before you share anything about this study, including this post, make sure you've done that work first.

Clarence Maur Bongalos is a coaching and consulting practitioner and co-founder of Emppowered, a hiring and career alignment platform. His work focuses on diagnosing and solving misalignment across leadership, marketing, and strategy, and the systems that connect people to opportunity.
