X app on smartphone (© pitipat - stock.adobe.com)

In a nutshell

  • AI outperformed humans at interpreting short social media bios, providing more consistent and meaningful cluster labels than human reviewers, especially for ambiguous or hard-to-define groups.
  • ChatGPT was better at detecting when data lacked real patterns, avoiding the human tendency to assign meaning to randomness, a bias known as apophenia.
  • The study suggests that large language models could be powerful tools for analyzing human expression, but their black-box nature raises questions about transparency, consistency, and accountability.

SYDNEY — On X (or Twitter for those who still call it that), millions of people sum up their entire identity in the few words of a bio. Companies spend fortunes trying to decode these digital breadcrumbs, but they’ve been using the wrong decoder. A new study from the University of Sydney reveals that AI systems now outperform humans at interpreting these brief self-descriptions, finding patterns we simply cannot see.

Published in the journal Royal Society Open Science, the research shows that large language models can cluster and interpret short text with remarkable accuracy. When researchers asked ChatGPT to analyze the same social media profiles that stumped human volunteers, the AI provided more consistent and comprehensive interpretations.

The study found that artificial intelligence showed good agreement with human reviewers while helping bridge the gap that often exists between creating clusters and interpreting what they mean. Both the human and AI approaches, however, carried their own biases.

Companies, researchers, and governments spend enormous resources trying to decode these patterns to understand everything from consumer preferences to political movements.

Traditional approaches to analyzing short text have hit a wall. Unlike longer documents that provide context clues, tweets and social media bios offer minimal information. Previous computer methods often grouped posts in ways that made little sense to human reviewers, while human analysis proved inconsistent and biased.

Analyzing Digital Personalities

Woman using X app on smartphone
Posts on X and user bios offer little context, which can make them hard to analyze. (© bongkarn – stock.adobe.com)

Researchers tackled this challenge using 38,639 X user biographies, those brief self-descriptions people write to introduce themselves on the platform. All users had mentioned Donald Trump in their posts during September 2020, providing a focused dataset around political engagement.

The team compared three approaches: a traditional topic modeling method called Latent Dirichlet Allocation (LDA), an older embedding technique called doc2vec, and a modern large language model called MiniLM, with the latter two paired with Gaussian mixture modeling. Each method grouped similar profiles into 10 distinct clusters.
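For readers curious about the mechanics, the MiniLM approach boils down to two steps: embed each bio as a dense vector, then fit a Gaussian mixture over those vectors. Here is a minimal sketch, assuming the widely used sentence-transformers checkpoint `all-MiniLM-L6-v2` and scikit-learn; the paper’s exact model variant, preprocessing, and settings may differ, and the toy bios below are invented for illustration.

```python
# Minimal sketch of the MiniLM + Gaussian mixture pipeline described above.
# Assumptions: "all-MiniLM-L6-v2" stands in for the paper's MiniLM variant;
# the bios and hyperparameters here are illustrative, not the study's.
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

bios = [
    "Proud dad. Coffee first, politics second.",
    "Artist | dreamer | dog mom",
    "Patriot. God, family, country.",
    '"Be the change you wish to see in the world."',
    # ... the study embedded 38,639 such bios
]

# Step 1: map each short bio to a dense vector that encodes its meaning.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(bios)

# Step 2: fit a Gaussian mixture over the vectors. The study used 10 clusters
# on the full dataset; this toy list only supports 2 components, since a
# mixture needs at least as many samples as components.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(embeddings)

for bio, label in zip(bios, labels):
    print(label, bio)
```

LDA, by contrast, treats each bio as a bag of words and infers topic mixtures, which is part of why it struggles with texts this short.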

Human reviewers then evaluated each cluster, rating how confident they felt about naming the group and how well the profiles seemed to fit together. They also attempted to create descriptive names for each cluster.

MiniLM consistently outperformed the other methods, creating clusters that human reviewers found more interpretable and distinctive. When reviewers looked at profiles grouped by the AI, they could more easily identify common themes like “artistic bios,” “political conservatives,” or “family-focused users.”

But when researchers introduced ChatGPT as an additional reviewer, the system analyzed the same clusters and provided names and interpretations, often succeeding where human reviewers failed.
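The article does not reproduce the paper’s exact prompt or ChatGPT version, but the reviewer setup can be sketched along these lines using the OpenAI Python client; the model name, prompt wording, and sample bios here are our assumptions, not the study’s.

```python
# Illustrative sketch of using ChatGPT as an additional cluster reviewer.
# Assumptions: prompt wording, sample bios, and model name are ours, not the
# paper's. Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def name_cluster(sample_bios: list[str]) -> str:
    """Ask the model for a short label for one cluster of bios."""
    prompt = (
        "The following Twitter bios were grouped together by a clustering "
        "algorithm. Propose a short descriptive name for the group, or say "
        "the grouping looks random if there is no unifying theme.\n\n"
        + "\n".join(f"- {b}" for b in sample_bios)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in; the study's ChatGPT version differed
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(name_cluster(['"Carpe diem."', '"Dream big."', '"Stay humble."']))
```

Running the same kind of query across model versions gave the researchers somewhat different interpretations, one of the reproducibility concerns noted later in this article.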

When Machines See What Humans Miss

Human reviewers frequently wrote “none” when asked to name certain clusters, indicating they couldn’t identify a unifying theme. ChatGPT, however, provided names for all clusters, including those that stumped people entirely.

ChatGPT prompt on computer
ChatGPT spotted patterns in bios that humans sometimes missed. (Bangla press/Shutterstock)

For example, one cluster that humans struggled to categorize was labeled “general” by the researchers. Human reviewers couldn’t find a consistent pattern, but ChatGPT identified it as representing a common type of human self-expression—people who describe themselves in broad, non-specific terms.

Similarly, ChatGPT successfully identified a “quotes” cluster based on users who included inspirational sayings in their bios, a pattern that many human reviewers missed or dismissed.

ChatGPT demonstrated a greater ability to distinguish between different clustering approaches and was significantly better than human reviewers at recognizing when clusters were randomly generated. This suggests the AI is less susceptible to human biases, particularly the tendency to perceive meaningful patterns in random data.

The AI’s advantage seemed to stem from its training on vast amounts of text data, allowing it to recognize subtle linguistic patterns that escape human notice. While humans tend to focus on obvious political or demographic markers, the AI detected more nuanced categories based on writing style, emoji usage, and implicit cultural references.

The Bias Problem

The study revealed unsettling biases in how people interpret information. Human reviewers demonstrated apophenia, the well-known human tendency to see patterns in noise, finding meaning even in randomly generated clusters of profiles.

When presented with profiles that had been randomly assigned to groups with no underlying logic, human reviewers still attempted to create coherent narratives and names. ChatGPT, by contrast, was better at recognizing when clusters lacked genuine patterns.
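To make the control condition concrete, a random baseline can be as simple as the sketch below, where group labels are drawn independently of content; the study’s exact construction isn’t detailed in this article, so treat this as illustrative.

```python
# Sketch of a random-cluster control: group labels drawn with no relation
# to bio content. Reviewers shown such groups still proposed themes.
import random

bios = ["Proud dad.", "Artist | dreamer", '"Carpe diem."', "Retired teacher."]
random.seed(0)
random_labels = [random.randrange(10) for _ in bios]  # 10 arbitrary buckets
print(list(zip(bios, random_labels)))
```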

Human reviewers also tended to fixate on a single dimension, concentrating on political orientation while missing other meaningful patterns. This mirrors broader research on human categorization, which shows that people often oversimplify complex data by latching onto familiar concepts.

The AI system demonstrated more sophisticated pattern recognition, identifying clusters based on occupation, family status, communication style, and even characteristics like emoji usage — categories that require synthesizing multiple subtle cues.

The research focused specifically on Twitter users engaged with political topics, who may not represent broader groups of social media users. The 39 human reviewers were also non-experts recruited through an online platform rather than trained analysts.

The researchers also acknowledge a troubling aspect of using AI systems for interpretation: these systems operate largely as black boxes, and in ChatGPT’s case, are maintained by a private company. This opacity makes it difficult to understand exactly how the AI reaches its conclusions or to ensure consistent results over time. The study also revealed that different versions of ChatGPT produced somewhat different interpretations.

While ChatGPT excelled at identifying content-based patterns, it sometimes missed elements that humans caught more easily, such as recognizing that someone was using quotes or emojis as forms of expression rather than focusing solely on the content itself.

Companies analyzing customer feedback, political consultants studying voter sentiment, and social scientists examining cultural trends could all benefit from AI systems that identify patterns humans miss. However, the opaque nature of these systems raises questions about transparency and accountability in decision-making.

Researchers propose a hybrid approach: using AI for initial pattern detection and interpretation, while maintaining human oversight to check for biases and ensure results make intuitive sense. This could combine the AI’s superior pattern recognition with human judgment about meaning and context.

Paper Summary

Methodology

Researchers collected 38,639 Twitter user biographies from people who mentioned Donald Trump on September 3-4, 2020. They tested three clustering methods: Latent Dirichlet Allocation (LDA), doc2vec with Gaussian mixture modeling, and MiniLM (a large language model) with Gaussian mixture modeling. Each method grouped the profiles into 10 clusters. Thirty-nine human reviewers, all native English speakers from the United States recruited through Prolific, evaluated the clusters by rating their confidence in naming each group and assessing how well profiles fit together. Researchers also used ChatGPT to independently analyze and name the same clusters, comparing AI performance to human interpretation.

Results

MiniLM consistently outperformed the other clustering methods, creating groups that human reviewers found more interpretable and distinctive. Human reviewers rated MiniLM clusters higher across all measures of coherence and confidence. ChatGPT provided names for all clusters, including those that human reviewers couldn’t categorize, and showed better ability to distinguish meaningful patterns from random noise. The AI identified categories like “quotes,” “emojis,” and “general” profiles that humans often missed. Both humans and AI showed broad agreement on obvious categories like political orientation, but ChatGPT demonstrated superior consistency in naming and interpretation across multiple runs.

Limitations

The study focused exclusively on Twitter users interested in political topics, which may not represent broader social media populations. Human reviewers were non-experts rather than trained analysts, potentially affecting the quality of human interpretation. Research relied on proprietary AI systems like ChatGPT that operate as “black boxes” and can change over time, making results difficult to replicate. Different versions of ChatGPT produced somewhat varying interpretations, highlighting reliability concerns. The dataset’s political focus may have introduced underlying biases that affected both human and AI analysis.

Funding and Disclosures

Authors declared no competing interests and received no funding for this research. The study received ethics approval from the University of Sydney Ethics Committee (2019/208). Researchers stated they did not use AI-assisted technologies in creating the article itself.

Publication Information

The study “Human-interpretable clustering of short text using large language models” is authored by Miller, J.K. and Alexander, T.J. It was published in the journal Royal Society Open Science (vol. 12) in 2025. The paper was received May 14, 2024, and accepted December 19, 2024.

