
Generative artificial intelligence is becoming increasingly difficult to detect. (Image by feeling lucky on Shutterstock)

In a Nutshell

  • Two economists used an AI language model to produce 380 complete, journal-formatted academic finance papers in roughly 12 hours, each built around reverse-engineered theories designed to explain data the AI had already seen.
  • AI-generated signals performed statistically comparably to signals published in top peer-reviewed finance journals, with equal-weighted results overlapping almost perfectly with published research.
  • AI-written introductions clustered tightly at a college-graduate reading level and produced prose that matched the formatting conventions of leading finance journals, though with less stylistic variation than human authors.
  • Researchers warn that scaled AI paper generation could overwhelm journal review systems, artificially inflate academic citation counts, and erode the metrics used to evaluate researchers for tenure and funding.

A pair of economists just demonstrated that artificial intelligence can churn out hundreds of journal-style academic papers in a matter of hours, complete with data, citations, economic theory, and even author names. The papers look real. The statistical testing behind them is real. But the “discoveries” they claim to make? Reverse-engineered after the fact by a machine.

Two finance professors at leading American universities set out to show just how easy it had become to industrialize one of academia’s most persistent bad habits: building a theory to explain data you’ve already seen, then pretending you came up with the theory first. In academic circles, this practice has a name, “HARKing,” which stands for Hypothesizing After Results are Known. What the researchers found was that AI doesn’t just enable HARKing on a new scale. It automates it entirely, at a speed that could overwhelm the academic publishing system before anyone figures out what to do about it.

Robert Novy-Marx of the Simon Business School at the University of Rochester and Mihail Velikov of Penn State’s Smeal College of Business published their findings in the Journal of Economic Literature in March 2026. Their paper is equal parts technical tour de force and cautionary alarm, a demonstration of what AI can do to academic science that is as sobering as it is impressive.

AI-written papers could pass quality checks for peer-reviewed journals, scientists warn. (© Mardiyo – stock.adobe.com)

How Researchers Used AI to Mass-Produce Finance Papers

To build their assembly line, Novy-Marx and Velikov started with raw financial data. They pulled accounting and stock market records on publicly traded U.S. companies from two major databases covering decades of history: COMPUSTAT, which tracks corporate financial statements going back to 1950, and CRSP, a stock market database with data going back to 1926. From those sources, they mathematically constructed more than 31,000 potential “signals,” patterns in accounting numbers that might predict how a stock will perform.

Most of those signals didn’t hold up under scrutiny. After running them through a series of increasingly strict statistical tests, the researchers filtered the original pool down to just 95 signals that survived all quality checks. Each had to show consistent, statistically meaningful results across multiple ways of slicing the data, including adjustments for firm size and known market risk factors. Only about three-tenths of one percent of the original candidates made the cut.
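The mining-and-filtering step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes candidate signals are ratios of pairs of accounting fields (as the signal examples later in this article suggest), that each signal comes with several versions of its long-short return series (for instance, equal-weighted, value-weighted, and risk-adjusted), and that a signal survives only if every version clears a t-statistic cutoff with the same sign. The field names and the cutoff of 3.0 are hypothetical.

```python
import itertools
import math
import statistics

# Hypothetical accounting fields standing in for COMPUSTAT variables.
FIELDS = ["other_current_assets", "shareholders_equity", "acquisitions", "working_capital"]


def ratio_signals(fields):
    """Enumerate every ordered pair (a, b) of fields as a candidate signal a / b."""
    return list(itertools.permutations(fields, 2))


def t_stat(returns):
    """t-statistic of the mean of a return series (needs 2+ non-identical values)."""
    se = statistics.stdev(returns) / math.sqrt(len(returns))
    return statistics.mean(returns) / se


def survives(return_versions, threshold=3.0):
    """Keep a signal only if every version of its long-short return series
    clears the (hypothetical) cutoff, all with the same sign."""
    ts = [t_stat(r) for r in return_versions]
    return all(t >= threshold for t in ts) or all(t <= -threshold for t in ts)


def survival_rate(kept, total):
    """Share of candidate signals that pass every filter."""
    return kept / total


print(len(ratio_signals(FIELDS)))  # 12
print(f"{survival_rate(95, 31_000):.2%}")  # 0.31%
```

Even four fields yield 12 ordered ratios, which shows how quickly a few hundred accounting variables can balloon into tens of thousands of candidates. The last line checks the article's survival figure: 95 out of roughly 31,000 is about 0.3 percent.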

With those 95 validated signals in hand, the team handed the work over to an AI language model. Specifically, they used Claude Opus 4.1, Anthropic’s most advanced reasoning model at the time of the experiment. For each signal, the AI generated four complete academic papers, each one built around a different economic theory to “explain” the same finding.

One version argued that investors are slow to absorb complex financial information. Another leaned on theories about production costs and investment risk. A third drew from consumption-based economic models. A fourth was written without a specific theoretical angle. In total, the pipeline produced 380 finished papers, each roughly 30 pages long, with abstracts, introductions, data sections, results tables, charts, and references, all formatted to match top finance journal standards.

The data mining and validation steps took about a day of computing time. The AI-generated papers took about 12 hours.
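The generation stage amounts to a loop over signals and theoretical framings. The Python sketch below enumerates one job per (signal, framing) pair and computes the average wall-clock time per paper implied by the article's figures. The framing labels, the `PaperJob` class, and the prompt wording are hypothetical: the article does not publish the authors' prompts, and no model API is called here.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical short labels for the four framings described in the article.
FRAMINGS = [
    "investors are slow to absorb complex financial information",
    "production costs and investment risk",
    "consumption-based asset pricing",
    "no specific theoretical angle",
]


@dataclass(frozen=True)
class PaperJob:
    signal: str
    framing: str

    def prompt(self) -> str:
        # Illustrative wording only; not the prompt the authors used.
        return (
            f"Draft a journal-formatted finance paper on the return predictability "
            f"of the signal '{self.signal}'. Theoretical framing: {self.framing}."
        )


def build_jobs(signals):
    """One generation job per (signal, framing) pair."""
    return [PaperJob(s, f) for s, f in product(signals, FRAMINGS)]


def minutes_per_paper(n_papers, hours):
    """Average wall-clock minutes per paper across a batch."""
    return hours * 60 / n_papers


signals = [f"signal_{i:02d}" for i in range(95)]  # placeholder names
jobs = build_jobs(signals)
print(len(jobs))  # 380
print(round(minutes_per_paper(len(jobs), 12), 1))  # 1.9
```

Because the jobs are independent, they could plausibly run in parallel, so the roughly two-minute figure is a batch average rather than the time any single paper took to write.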

AI-Generated Finance Research Papers Fooled Standard Quality Checks

The papers that came out of this pipeline were, by multiple measures, eerily convincing. Each AI-generated introduction followed standard academic conventions, framing a research question, citing related literature, building a logical theoretical argument, and summarizing the key results. The citations were drawn from real published work, though the authors note the AI occasionally “hallucinated” references that don’t actually exist. Signal names were generated to sound authoritative and specific: a ratio of other current assets to shareholders’ equity became “Liquidity Leverage Intensity.” A measure of acquisitions relative to working capital was labeled “Acquisition Capacity Utilization.”

When the researchers compared the statistical strength of their AI-generated signals against 212 signals published in actual peer-reviewed finance journals, the data-mined signals were nearly indistinguishable. For equal-weighted portfolio strategies, the distribution of statistical results from the AI-generated signals overlapped almost perfectly with the distribution from published academic papers.

That finding alone carries a pointed message: the bar that peer review sets for finance research may be no higher than what an automated data-mining exercise can clear on its own.
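The article does not say which statistic the authors used to compare the two distributions. One common way to quantify how closely two samples overlap, such as the t-statistics of data-mined and published signals, is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the two empirical distribution functions. A value near 0 means near-complete overlap; a value of 1 means the samples do not overlap at all. The sketch below is an illustrative choice, not necessarily the authors' method.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Fraction of observations in a sorted sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic. Both ECDFs are step functions
    that only jump at observed values, so checking every observation finds the max gap."""
    a, b = sorted(a), sorted(b)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

In practice one would feed in the two lists of t-statistics; SciPy's `scipy.stats.ks_2samp` computes the same statistic along with a p-value.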

Readability tests told a similar story, though with a revealing twist. Novy-Marx and Velikov compared the AI-written introductions against 140 published papers using standard measures of text complexity. The AI-generated introductions clustered tightly at the high end of the scale, at roughly 16 to 18 years of education (about a college-graduate reading level), with very little variation across the four theoretical versions.

Human-authored papers spread more widely, with median scores somewhat lower at 13 to 16 years of education, and notable outliers on both ends. The machine’s prose was consistent and polished, but it lacked the stylistic range of human academic writing.
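Scores expressed in years of education come from grade-level readability formulas. The article does not name the specific measures the authors used. The sketch below implements one widely used formula, the Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59, paired with a crude vowel-group syllable counter. Because that counter is a heuristic assumption, its scores will differ somewhat from dedicated readability tools.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, drop a silent trailing 'e', minimum one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: approximate U.S. school grade needed to read the text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))  # -1.45
```

Long sentences packed with polysyllabic terms, the staple of academic introductions, push the score upward quickly. That is one plausible reason uniformly polished AI prose would cluster near the top of the scale.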

What This Means for the Future of Academic Research

None of the 380 papers were submitted to journals, and the researchers are clear that the experiment was designed to sound an alarm, not to flood the academic literature with junk. But the alarm is a loud one. The authors note that submitting all 380 papers to peer-reviewed journals would impose hundreds of thousands of dollars in reviewing costs on the profession, and if even a small fraction of researchers adopted this approach, the journal system could be overwhelmed.

Citation inflation is another concern the paper raises directly. Each AI-generated paper cites prior research to build its theoretical case, including, in many cases, the authors’ own earlier work. Scaled across hundreds or thousands of papers, automated citation generation could artificially inflate citation counts, a metric that tenure committees, grant agencies, and hiring panels use to evaluate academics. Novy-Marx and Velikov even calculate that if search engines index the 95 papers they’ve publicly posted, each author could pick up hundreds of additional citations without a single human reader choosing to cite their work.

The paper stops well short of calling AI in research inherently destructive. AI can, the authors argue, democratize research by lowering the barriers to hypothesis generation, accelerate the pace of discovery, and help researchers map connections across large bodies of literature far faster than was previously possible.

There’s even a genuine scientific case for post-observation theorizing: Isaac Newton, after all, watched an apple fall before he developed his theory of gravity. The problem isn’t looking at data before forming a theory. The problem is doing so secretly, at industrial scale, and presenting the result as original insight.

Novy-Marx and Velikov call for researchers to be held fully accountable for any work they produce with AI assistance, not merely required to disclose that AI was used, a standard they argue is too weak to matter. They also advocate for new validation systems capable of detecting circular reasoning, redundant theorizing, and hallucinated citations. And they argue that economic theories offered to explain new findings should be judged, at least in part, by whether they make novel predictions that go beyond the result they were built to explain.
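A basic check for hallucinated citations compares each reference against a trusted bibliographic index. The sketch below assumes a hypothetical list of verified titles and flags any reference whose normalized title does not appear in it; all titles shown are invented placeholders. A real validation system would more plausibly resolve DOIs or use fuzzy matching, since exact title matching misses typos and says nothing about whether a genuine paper actually supports the claim it is cited for.

```python
import re


def normalize_title(title):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def flag_unverified(references, known_titles):
    """Return references whose normalized title is absent from the trusted index."""
    index = {normalize_title(t) for t in known_titles}
    return [ref for ref in references if normalize_title(ref) not in index]


# Placeholder titles; a real index might be built from a bibliographic database.
known = ["An Example Study of Accruals", "Another Example Study of Momentum"]
refs = ["An Example Study of Accruals.", "A Fabricated Study of Liquidity Leverage"]
print(flag_unverified(refs, known))  # ['A Fabricated Study of Liquidity Leverage']
```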

“AI can now produce a ton of papers at scale, and it’s going to change the nature of how we produce and disseminate knowledge. This is an early warning signal of what’s coming with modern AI capabilities,” Velikov said in a statement.

“I’m far from the opinion that we’ll all be out of jobs and replaced by AI,” he added, “but I think our jobs will evolve a lot, and the more we invest in understanding how these systems work, the better research we’ll be able to do.”

Whether academic institutions move fast enough to build those safeguards is an open question. For now, the 380 papers sit in a public GitHub repository, proof that the assembly line works and that current safeguards may not be ready for it.


Paper Notes

Limitations

Novy-Marx and Velikov acknowledge that their experiment, while technically rigorous in its data mining and signal validation steps, cannot definitively determine whether the 95 signals they identified reflect genuine economic phenomena or sophisticated data mining that happens to survive statistical scrutiny. The authors note that some of the signals may represent real market inefficiencies overlooked by prior researchers, while others may be artifacts of the search process. The paper does not include a field experiment (submitting the AI-generated papers to actual peer review alongside human-authored research), which the authors identify as the only way to conclusively assess whether referees can distinguish between them. The readability comparisons were made against 140 papers from a single open-source database and may not fully represent the range of writing quality across all finance journals. The experiment was conducted using a specific AI model (Claude Opus 4.1) at a specific point in time; capabilities of AI systems are evolving rapidly, and results could differ with newer models.

Funding and Disclosures

Financial support for the research was provided by INQUIRE Europe. Robert Novy-Marx provides consulting services to Dimensional Fund Advisors, an investment firm headquartered in Austin, Texas, with strong ties to the academic community. The authors state that the thoughts and opinions expressed in the paper are their own, and that no other person or institution has any control over the paper’s content. All code used to construct the AI-generated paper pipeline is publicly available at https://github.com/velikov-mihail/AI-Powered-Scholarship.

Publication Details

Paper title: “Artificial Intelligence–Powered (Finance) Scholarship”

Authors: Robert Novy-Marx (Simon Business School, University of Rochester, and NBER) and Mihail Velikov (Smeal College of Business, Pennsylvania State University)

Journal: Journal of Economic Literature, Vol. LXIV (March 2026), pp. 5–37

DOI: 10.1257/jel.20251821

About StudyFinds Analysis

Called "brilliant," "fantastic," and "spot on" by scientists and researchers, our acclaimed StudyFinds Analysis articles are created using an exclusive AI-based model with complete human oversight by the StudyFinds Editorial Team. For these articles, we use an unparalleled LLM process across multiple systems to analyze entire journal papers, extract data, and create accurate, accessible content. Our writing and editing team proofreads and polishes each and every article before publishing. With recent studies showing that artificial intelligence can interpret scientific research as well as (or even better than) field experts and specialists, StudyFinds was among the earliest to adopt and test this technology before approving its widespread use on our site. We stand by our practice and continuously update our processes to ensure the very highest level of accuracy. Read our AI Policy (link below) for more information.

Our Editorial Process

StudyFinds publishes digestible, agenda-free, transparent research summaries that are intended to inform the reader as well as stir civil, educated debate. We neither agree nor disagree with any of the studies we post; rather, we encourage our readers to debate the veracity of the findings themselves. All articles published on StudyFinds are vetted by our editors prior to publication and include links back to the source or corresponding journal article, if possible.

Our Editorial Team

Steve Fink

Editor-in-Chief

John Anderer

Associate Editor
