(Photo by PolyPloiid on Shutterstock)

Can We Trust Published Research? This Study Has Some Uncomfortable Answers.

In A Nutshell

  • A large-scale review of 110 economics and political science studies found coding errors in roughly 25% of papers, with some errors serious enough to affect a study’s conclusions.
  • 85% of published findings could be independently reproduced, but about one in four results shifted or no longer held up when researchers tested them under different but reasonable analytical approaches.
  • More experienced review teams tended to find more problems, suggesting the error and failure rates reported may actually be conservative.
  • Data-sharing reforms are working at the journals that require them, but most economics and political science journals still have no such requirements.

Science runs on trust. Journals publish findings, policymakers act on them, and the public assumes the math checks out. When hundreds of independent researchers set out to actually verify that math across 110 recently published studies from top journals with mandatory data-sharing rules, they found mostly encouraging results, but also something that should give everyone pause: roughly one in four of those studies contained at least one coding error, and the teams that caught the most problems tended to be the most experienced ones.

If skilled reviewers consistently find more flaws than less experienced ones, the 25% error rate may be an undercount.

The project, published in Nature and led by Abel Brodeur of the University of Ottawa and the Institute for Replication, had a simple premise: take published studies, hand them to independent research teams, and see whether the results hold up.

Economics and Political Science Reproducibility Tested Across 110 Studies

Brodeur’s team recruited hundreds of researchers and organized them into small groups of three to five, each assigned a paper from their own area of expertise. All 110 studies came from 12 leading economics and political science journals that require authors to publicly post their data and underlying computer code, making them among the most transparent outlets in either field. Each team first tried to reproduce the original results using those shared materials, then tested whether the conclusions held up when they made their own reasonable adjustments to the analysis.

On the basic question of whether results could be reproduced at all, the news was good. Independent researchers matched the published findings 85% of the time. Data-sharing at the targeted journals also improved sharply over the past decade, rising from 59% of papers including a replication package in 2014 to around 90% by 2021. Journals that brought on dedicated data editors, staff whose job is to check submitted code before a paper goes to print, saw the fastest gains, often within a single year.

A major new audit put 110 published studies to the test. Many didn’t fully hold up, and experts say the problem runs deeper. (Credit: Getty Images For Unsplash+)

Coding Errors Found in a Quarter of Published Papers

Beyond raw reproducibility, a harder problem surfaced. Coding errors turned up in roughly 25% of the studies, with some papers containing more than one mistake. Economics papers had a higher error rate than political science papers, 26% versus 16%, a difference the researchers chalk up to economics code tending to be longer and more elaborate.

Not all of those errors were small. Major problems included entire swaths of duplicated data, treatment variables mislabeled for most or all of a study’s subjects, and models built differently than what the paper described. Some of those errors were serious enough to affect what the study actually concluded. And the true error rate is almost certainly higher than 25%: many replication packages were missing the raw data and data-cleaning steps that would have made additional mistakes visible.

Teams also ran what might be called stress tests on the findings, asking whether a study’s conclusions would survive if they made slightly different but equally reasonable analytical decisions, say, measuring the main outcome a different way, or adjusting which background factors were accounted for. Across more than 2,600 such tests, 72% of results held up. About one in four results no longer met the same statistical threshold or shifted under alternative analyses. Economics again trailed political science, with survival rates of 71% and 78%, respectively.
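To make that concrete, here is a minimal sketch in Python of what one such stress test looks like. The data, variable names, and specifications below are invented for illustration; they are not from the study, whose teams worked with each paper's actual data and code.

```python
# Illustrative only: simulated data and made-up variable names.
# The idea: re-estimate a published effect under equally reasonable
# alternative specifications and see whether it stays significant.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50, 15, n),
})
df["outcome"] = 0.25 * df["treatment"] + 0.02 * df["age"] + rng.normal(0, 1, n)

specs = [
    "outcome ~ treatment + age",           # the "published" specification
    "outcome ~ treatment + age + income",  # add another background control
    "outcome ~ treatment",                 # drop the controls entirely
]

for spec in specs:
    fit = smf.ols(spec, data=df).fit()
    p = fit.pvalues["treatment"]
    print(f"{spec:40s} coef={fit.params['treatment']:+.3f}  "
          f"holds at 5% level: {p < 0.05}")
```

A result that survives every defensible variant of the analysis is the kind the teams counted as robust; one that loses significance or flips sign under a reasonable alternative is not.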

Expert Reviewers Uncovered More Reproducibility Failures Than Novices

More experienced research teams tended to find that fewer results held up, suggesting they may be better at spotting issues. The authors liken it to the “trained eye” of a detective finding subtle clues an untrained observer would miss.

That matters for how to read the numbers. If only the most qualified teams had been assigned to every paper, the 72% survival rate would likely be lower. Standard peer review, which happens before publication and rarely involves reviewing the underlying data and code, appears ill-equipped to catch the kind of errors that specialists find.

Patterns consistent with publication bias showed up as well. Published studies showed a suspicious pile-up of results just barely strong enough to clear the bar for publication, a pattern that largely disappeared when outside researchers reran the analyses. It is roughly the equivalent of a grading curve where too many students scored exactly 70, just enough to pass, suggesting some results may have been nudged, consciously or not, to cross the finish line.
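A common way to check for that kind of pile-up is a simple “caliper” comparison: count results in narrow bands just below and just above the significance threshold, where the counts should be roughly equal absent manipulation. The sketch below uses simulated p-values to show the idea; it is an illustration of the general technique, not the paper's actual procedure.

```python
# Illustrative caliper check on simulated p-values: absent bias, narrow
# bands on either side of p = 0.05 should hold similar counts.
import numpy as np

rng = np.random.default_rng(1)
pvals = rng.uniform(0, 1, 5000)

# Simulate mild "nudging": half the near-misses get pushed below 0.05.
near_miss = (pvals >= 0.05) & (pvals < 0.06) & (rng.random(5000) < 0.5)
pvals[near_miss] = rng.uniform(0.04, 0.05, near_miss.sum())

below = ((pvals >= 0.04) & (pvals < 0.05)).sum()
above = ((pvals >= 0.05) & (pvals < 0.06)).sum()
print(f"p in [0.04, 0.05): {below}   p in [0.05, 0.06): {above}")
# A large excess just below the cutoff is the tell-tale pile-up.
```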

Data-Sharing Reforms Are Working, But Coverage Remains Thin

Despite the problems, the project produced real reasons for optimism. About 95% of original study authors responded when outside teams reached out, and roughly two-thirds of those exchanges improved the final review reports. Only 23% of papers ended with unresolved disagreements. Over 40% of the participating researchers said the process left them more confident in their field, compared to roughly 5% who came away more skeptical.

Mandatory data-sharing, dedicated data editors, and organized replication projects are catching problems that standard review misses. The limitation is reach: most journals in both economics and political science still do not require authors to share their data and code at all. The reforms that work are concentrated in a small corner of the literature.

Science can catch its own mistakes. Right now, most of it is not set up to try.


Disclaimer: This article is based on a peer-reviewed study. The findings reflect a selective sample of studies from journals with mandatory data-sharing policies and should not be interpreted as representative of all published research in economics or political science. Some coding errors identified did not affect the conclusions of the original studies. The expertise effect described is correlational, not a direct experimental finding. The publication bias patterns reported are consistent with, but do not definitively prove, intentional manipulation of results.


Paper Notes

Limitations

The 110 papers examined were drawn exclusively from journals with mandatory data-sharing policies, making them more transparent than typical publications in economics and political science. Most of those journals also employ data editors who verify code before publication, which likely elevated the baseline reproducibility rate. The findings should be understood as an optimistic upper bound on reproducibility in the broader literature. Because more experienced reviewing teams found more problems, the measured robustness and error rates may also be understated. The study also over-represents research using publicly available data, further limiting generalizability to the field as a whole.

Funding and Disclosures

Support came from Coefficient Giving and the Social Sciences and Humanities Research Council. Several co-authors are affiliated with government agencies and international institutions including the Bank of Canada, the World Bank, and agencies of the German federal government. Authors note that views expressed are their own and do not represent those affiliated institutions. All remaining errors are the authors’ responsibility.

Publication Details

Authors: Abel Brodeur (lead author, University of Ottawa and Institute for Replication) et al.; the full author list encompasses hundreds of contributors across dozens of institutions worldwide. | Title: “Reproducibility and Robustness of Economics and Political Science Research” | Journal: Nature | Volume/Issue: 652(8108), pp. 151-156 | DOI: https://doi.org/10.1038/s41586-026-10251-x | Data availability: Zenodo (DOI: 10.5281/zenodo.17792605) and OSF (DOI: 10.17605/OSF.IO/8WSQX)
