Chatting With a Stranger Online? There’s a Good Chance It Isn’t a Person.

In a Nutshell

  • When given a specific personality to imitate, an AI called GPT-4.5 was mistaken for a human 73% of the time in a rigorous, controlled Turing test, more often than the actual human it was competing against.
  • The trick wasn’t intelligence. People tried to spot AI through small talk and emotional cues, not logic puzzles, and that’s exactly where today’s AI excels.
  • Extending conversations to 15 minutes didn’t help: AI still passed, and no demographic factor, including age, education, or chatbot experience, made people better at telling the difference.
  • Researchers warn that AI systems capable of impersonating real people could quietly erode trust in online interactions, in the same way counterfeit currency undermines real money.

Scientists ran the original, stricter version of a famous test of machine intelligence, and the results should make anyone pause before assuming they know who’s on the other end of a chat window.

In a study published in Proceedings of the National Academy of Sciences, researchers put several AI systems head-to-head against real people in a classic challenge called the Turing test, a 75-year-old idea that asks whether a machine can hold a conversation convincingly enough that a human judge can’t tell it apart from another person. Under the right conditions, the answer is yes. Not just barely. In some cases, people were more likely to mistake the AI for a human than to correctly identify the actual person on the other side of the conversation.

That’s not a glitch in the experiment. That’s the point. When a specific AI system called GPT-4.5 was given detailed instructions on how to come across as a real person (a setup the researchers called GPT-4.5-PERSONA), human judges picked it as the human 73% of the time, meaning they chose the real person on the other side of the chat less often than they chose the machine. For anyone who has assumed they could just “feel” when they’re talking to a bot, this study offers a brisk correction.

How the Turing Test Actually Works

Most people have a vague idea of what the Turing test is: a machine tries to convince a human it’s real. But the version used in this study is more demanding than pop culture suggests. This is the three-party format, in which the human judge chatted with two strangers at the same time, one real person and one AI, and had to figure out which was which.

This setup is harder for AI to pass because the judge isn’t just asking “could this be a person?” They’re directly comparing two conversations. In older versions of the test with earlier AI systems, machines were correctly identified as non-human more than 90% of the time. Today’s AI can beat those odds, and in some cases beat the humans they’re competing against.

An AI was mistaken for a human 73% of the time in a new study. Researchers say most people can’t tell the difference anymore. (Image by StudyFinds)

Why AI Fooled So Many People

Two groups participated in the initial experiments: undergraduate students at UC San Diego and adults recruited through an online research platform. After exclusions, researchers analyzed over 1,000 games, each consisting of five-minute text conversations.

Researchers tested the AI systems with two different sets of instructions. One gave only bare-bones guidance. The other, the PERSONA prompt, told the AI to act like a young, somewhat introverted person familiar with internet culture and slang. That difference mattered enormously. Without persona instructions, GPT-4.5’s success rate dropped to 36%, meaning judges were more likely than not to correctly identify it as the machine. A specific personality to imitate flipped the results entirely.
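To make the arithmetic behind those percentages concrete, here is a minimal Python sketch of how a win rate in a two-choice test can be compared against coin-flip guessing. Because each judge must pick one of two witnesses, pure guessing would select the AI 50% of the time. The function names are illustrative, and the 100-game counts are hypothetical: the article does not report how many games each condition included, and this is not presented as the researchers' actual analysis.

```python
from math import comb


def win_rate(verdicts):
    """Fraction of games in which the judge picked the AI as the human."""
    return sum(verdicts) / len(verdicts)


def binom_two_sided_p(wins, games, chance=0.5):
    """Exact two-sided binomial p-value against a chance rate.

    Sums the probability of every outcome that is no more likely
    than the observed one under pure guessing.
    """
    probs = [
        comb(games, k) * chance**k * (1 - chance) ** (games - k)
        for k in range(games + 1)
    ]
    observed = probs[wins]
    # Small tolerance so floating-point ties count as "equally likely".
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# HYPOTHETICAL data: 100 games per condition, chosen only so the
# rates match the 73% and 36% figures quoted in the article.
persona_verdicts = [True] * 73 + [False] * 27
no_persona_verdicts = [True] * 36 + [False] * 64

for label, verdicts in [
    ("with persona", persona_verdicts),
    ("without persona", no_persona_verdicts),
]:
    rate = win_rate(verdicts)
    p = binom_two_sided_p(sum(verdicts), len(verdicts))
    print(f"{label}: win rate {rate:.0%}, p-value vs. chance {p:.4g}")
```

A rate well above 50% means judges picked the AI more often than the real person, while a rate well below 50% means they usually spotted it. With real data, the p-value would depend on the actual number of games, which is why the counts above should be treated as placeholders.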

One of the more revealing findings is how people tried to spot the AI. Most assumed they’d catch it by asking something clever. Instead, the most common approach was small talk. In 61% of games, judges asked witnesses about their daily lives and personal details. About half the time, they probed for emotional warmth, asking about feelings, humor, and opinions. Only 12% posed logic puzzles or knowledge questions, the kind traditionally associated with testing a machine.

That number is telling. In the 75 years since the Turing test was proposed, computers have beaten humans at chess and learned to calculate in milliseconds, and people no longer seem to think raw intelligence is what separates humans from machines. They’re looking for something warmer: personality, spontaneity, the feeling that someone genuinely gets the joke. Reasons judges gave for their verdicts focused heavily on writing style, conversational flow, and gut feeling. One participant admitted to picking a witness “because they had human vibes.” When judges were accurate, it was often because a witness dodged a direct question, or because they didn’t know something, which felt more human than an AI’s typically thorough answers.

Does a Longer Conversation Help? Not Really.

To address the concern that five minutes wasn’t enough time, the researchers ran a follow-up extending conversations to 15 minutes and preventing judges from ending early. In that study only, GPT-5 replaced GPT-4.5 because GPT-4.5 had been deprecated by OpenAI. Both GPT-5 and LLaMa-3.1-405B still passed. Judges remained unable to reliably tell them apart from humans, and none of the demographic factors researchers tracked, including age, education, chatbot use, and self-reported AI knowledge, predicted who would be more accurate.

What This Means for Anyone Who Chats Online

Passing the Turing test doesn’t settle every philosophical debate about whether machines are truly “thinking.” But what the study shows, in concrete numbers across multiple experiments, is that AI systems can already stand in for a real person in a short text conversation, and most people won’t notice.

Researchers use the phrase “counterfeit people” to describe systems capable of this kind of imitation, noting that just as fake currency erodes the value of real money, simulated human interactions could gradually erode the value of real ones.

Ultimately, the most unsettling detail isn’t that AI passed a famous test. It’s that the real human in the room, an actual person trying to seem like themselves, lost.


Disclaimer: This article is based on a peer-reviewed research study, but it is intended for general informational purposes only. The findings reflect results from controlled experimental conditions and may not apply to all real-world scenarios or AI systems.


Paper Notes

Limitations

The researchers acknowledge several constraints on their findings. The test was conducted over relatively short time windows, and while a 15-minute replication produced consistent results, the authors note that still-longer conversations, lasting an hour or more, could be more revealing. The study also relied on participants recruited from a psychology undergraduate subject pool and an online research platform; deliberately recruiting AI or psychology experts, or offering stronger financial incentives, might yield different results. The role of the specific persona prompt is also flagged as an open question: results depended heavily on whether AI systems were given detailed instructions, and the degree to which different prompts affect outcomes warrants further investigation. The researchers note that the Turing test, as implemented here, reflects a specific and widely accepted operationalization of Turing’s original proposal, and that alternative formats could produce different findings.

Funding and Disclosures

The authors declare no competing interests. According to the paper’s acknowledgments, Coefficient Giving provided funding support, and 12 donors supported an exploratory phase of the project through Manifund.

Publication Details

Authors: Cameron R. Jones (Department of Psychology, Stony Brook University; Department of Cognitive Science, University of California San Diego) and Benjamin K. Bergen (Department of Cognitive Science, University of California San Diego) | Paper Title: “Large language models pass a standard three-party Turing test” | Journal: Proceedings of the National Academy of Sciences (PNAS) | Published: May 19, 2026 | Volume/Issue: Vol. 123, No. 21, e2524472123 | DOI: https://doi.org/10.1073/pnas.2524472123

About StudyFinds Analysis

Called "brilliant," "fantastic," and "spot on" by scientists and researchers, our acclaimed StudyFinds Analysis articles are created using an exclusive AI-based model with complete human oversight by the StudyFinds Editorial Team. For these articles, we use an unparalleled LLM process across multiple systems to analyze entire journal papers, extract data, and create accurate, accessible content. Our writing and editing team proofreads and polishes each and every article before publishing. With recent studies showing that artificial intelligence can interpret scientific research as well as (or even better than) field experts and specialists, StudyFinds was among the earliest to adopt and test this technology before approving its widespread use on our site. We stand by our practice and continuously update our processes to ensure the very highest level of accuracy. Read our AI Policy for more information.

Our Editorial Process

StudyFinds publishes digestible, agenda-free, transparent research summaries that are intended to inform the reader as well as stir civil, educated debate. We neither agree nor disagree with any of the studies we post; rather, we encourage our readers to debate the veracity of the findings themselves. All articles published on StudyFinds are vetted by our editors prior to publication and include links back to the source or corresponding journal article, if possible.

Our Editorial Team

Steve Fink

Editor-in-Chief

John Anderer

Associate Editor
