(Photo by KinoMasterskaya on Shutterstock)
Study shows that AI tools helped students through online tests but left them floundering on research tasks.
In A Nutshell
- Exam scores jumped nearly 22 points after ChatGPT’s launch, while writing project marks dropped by about 10.
- Passing students generally improved, but failing students showed mixed results—better exams but lower overall marks.
- Creative research proposals showed no change, highlighting tasks where AI offers little advantage.
- Universities face a dilemma: AI boosts the easiest-to-grade assessments, while deeper tasks require costly human review.
QUEENSLAND, Australia — In just two years, students who had struggled to pass exams began turning in far stronger test performances. A study of more than 3,300 college students shows that ChatGPT’s arrival coincided with a dramatic swing in grades within a single course, with exam pass rates climbing from about 50% to 86% and average scores rising by nearly 22 percentage points.
But the shift came with a troubling tradeoff. The same students who were suddenly excelling on multiple-choice exams were performing worse on their writing assignments. The artificial intelligence boom hasn’t just reshaped how students study; it has altered the rules of academic success, giving an edge to students who can use AI effectively while leaving others behind.
Peter Dunn, a researcher at Australia’s University of the Sunshine Coast, uncovered this pattern while teaching his long-running first-year statistics course. Each semester he taught around 500 students using the same textbook, assignments, and teaching approach. When ChatGPT launched in late 2022, the numbers quickly changed.
“Because little else has changed substantially in the course from 2022 and 2024, an opportunity is presented to evaluate the impact of GenAI by examining the change in assessment marks over this time,” Dunn wrote.
What he found challenged expectations. Rather than a simple cheating crisis, the data revealed an uneven academic ecosystem where results hinged not only on study habits but also on knowing how to make AI work for you.
Exam Performance Jumped, Writing Marks Fell
Before ChatGPT, Dunn’s students typically struggled with the end-of-semester exam, a 40-question online multiple-choice test covering a full semester of statistics. In 2022, the average student answered just under 20 questions correctly, earning a mean mark of 49.5%, which was a failing grade.
By 2024, average performance had jumped to nearly 29 correct answers, equivalent to a solid B. The exam pass rate shot up from just under half of students to over 85%. When Dunn compared exam trends with ChatGPT’s user growth, the curves looked strikingly similar.
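As a rough check on those figures, here is a back-of-envelope sketch using the article’s rounded numbers; the paper’s exact values may differ slightly.

```python
# Back-of-envelope check of the reported exam shift, using the article's
# rounded figures; the paper's exact values may differ slightly.
QUESTIONS = 40

pre_correct = 19.8   # "just under 20" correct answers in 2022
post_correct = 28.6  # "nearly 29" correct answers in 2024

pre_mark = 100 * pre_correct / QUESTIONS    # ~49.5%
post_mark = 100 * post_correct / QUESTIONS  # ~71.4%

print(f"{pre_mark:.1f}% -> {post_mark:.1f}% "
      f"(+{post_mark - pre_mark:.1f} percentage points)")
```

That works out to roughly a 22-point jump, in line with the 21.88-point increase reported in the paper.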
Students who had once found exams nearly impossible were now reaching grades that would have been considered exceptional just a few years earlier. In most educational contexts, such a leap would suggest a new teaching method or a particularly capable student group. But Dunn’s data pointed to something else: many students were likely leaning on AI to solve exam questions in real time.
While exam marks surged, writing assignments moved in the opposite direction. Scores on research projects dropped by more than 10 percentage points, with pass rates falling from the high 80s into the low 70s.
This created a paradox rarely seen in education. Historically, strong test-takers also wrote strong research papers. Struggling students tended to struggle across the board. By 2024, that link had broken. Some students aced every exam yet failed their research papers, and others showed the reverse pattern.
Dunn speculated that many students were pasting AI output directly into their projects without checking if the content matched course requirements. The result: polished but inaccurate papers that underperformed when graded.
Strong Students Benefited, Struggling Students Showed Mixed Results
Another layer to the findings was how AI affected different groups of students. For those who ultimately passed the course, AI appeared to provide a boost, raising overall marks. But among students already at risk of failing, the picture was mixed.
On the one hand, their overall course marks declined from an average of 31% in 2022 to 25% in 2024. That drop meant struggling students fell even further behind. On the other hand, their exam performance actually improved, with average scores rising from about 19% to 42%.
“The mean overall mark for passing students has increased, while decreased for failing students,” Dunn reported. In other words, students with a baseline understanding of statistics could use AI to enhance their work, while those without that foundation often misapplied it.
Instead of leveling the field, AI use seemed to widen the gap. Success depended not just on studying but on technological skill, such as knowing when AI output was reliable and when it was not.
The Assignments ChatGPT Couldn’t Help
One type of work proved almost immune to AI’s influence: creative research proposals. These assignments required students to design a study plan under strict guidelines, using course-specific methods that AI tools had not been trained on, and to address feasibility, ethics, and statistical planning particular to the course.
Here, grades stayed consistent across 2022–2024. That stability highlights a key insight: when tasks demand human creativity and inside knowledge, current AI systems provide little shortcut.
Universities Confront an Assessment Dilemma
The study, published in the International Journal of Mathematical Education in Science and Technology, raises uncomfortable questions for higher education. Exams and online quizzes, which are fast and inexpensive to grade, showed the greatest AI-driven gains. The tasks that required more human judgment, like research projects and proposals, either declined or resisted AI entirely.
“The assessment task where GenAI may have had the least impact on student marks (Task 2A; the Project Proposal) is also the assessment task that is more difficult and time-consuming to mark,” Dunn observed. “In contrast, the assessment task where GenAI may have been most beneficial to students (Exam) is the one that is easiest to mark.”
This puts universities in a bind. Institutions under financial strain often prefer scalable, computer-graded tests. But these may now be the least reliable measure of student ability. Meanwhile, assignments that reveal true understanding demand more staff time and resources.
Dunn examined six semesters of his Scientific Research Methods course, with 1,202 students before ChatGPT’s release and 2,107 after. Students represented over 30 programs in science, health, and engineering, most in their first year. Assessment weightings remained constant: online quizzes (25%), research projects (35%), and exams (40%).
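To see how those components combine, here is a minimal sketch of a weighted overall mark using the study’s reported weightings; the individual component marks in the example are invented placeholders, not data from the paper.

```python
# Weighted overall course mark from the three assessment components,
# using the weightings reported in the study. The component marks in the
# example are invented placeholders, not real student data.
WEIGHTS = {"quizzes": 0.25, "projects": 0.35, "exam": 0.40}

def overall_mark(marks: dict[str, float]) -> float:
    """Weighted average of component marks, each on a 0-100 scale."""
    return sum(WEIGHTS[task] * marks[task] for task in WEIGHTS)

# A strong exam paired with a weak project -- the post-ChatGPT pattern
# Dunn describes -- still yields only a modest overall mark.
print(overall_mark({"quizzes": 70.0, "projects": 55.0, "exam": 86.0}))  # 71.15
```

Because the exam carries the largest weight, an AI-inflated exam score can mask a weak project in the final grade.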
Although students were told not to use AI on exams and to disclose any AI use on projects, there was no way to monitor compliance. This gap left plenty of room for misuse, intentional or not.
Redefining Academic Success in the Age of AI
The findings spark a deeper question: what does it now mean to succeed in college? Are higher exam scores evidence of smarter study strategies, or simply evidence of better AI use?
The study cannot answer whether AI improved actual learning. Exam completion times stayed similar across years, suggesting students spent roughly the same time, though perhaps on different activities. The bigger challenge may be that universities risk producing graduates who can use chatbots fluently but who struggle with independent thinking.
As AI grows more advanced, Dunn cautions that the patterns seen so far may only be the beginning. Unless universities adapt, they may end up awarding degrees that reflect AI proficiency more than student understanding, undermining the very mission of higher education.
Paper Summary
Methodology
Researcher Peter Dunn analyzed student performance data from a large first-year statistics course at Australia’s University of the Sunshine Coast across six semesters from 2022 to 2024. The study included 3,309 students total, with 1,202 students in pre-AI semesters (2022) and 2,107 in post-AI semesters (2023-2024). The course remained virtually identical across all semesters, using the same assessments, textbook, teaching methods, and staff. Students completed online quizzes (25% of grade), research projects (35%), and online examinations (40%). All assessments except creative project proposals were conducted online without supervision. The researcher used statistical methods including general linear models, correlation analysis, and chi-squared tests to compare performance before and after ChatGPT’s November 2022 launch.
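To illustrate the kind of pre/post comparison involved (this is not the paper’s actual code), a chi-squared test on exam pass rates might look like the sketch below; the counts are reconstructed from the reported cohort sizes and approximate pass rates rather than taken from the raw data.

```python
# Illustrative chi-squared test of exam pass rates, pre- vs. post-ChatGPT.
# Counts are reconstructed from the reported cohort sizes and approximate
# pass rates (~50% of 1,202 pre; ~86% of 2,107 post), not the raw data.
from scipy.stats import chi2_contingency

pre_pass, pre_fail = 601, 601      # ~50% of 1,202 students
post_pass, post_fail = 1812, 295   # ~86% of 2,107 students

chi2, p, dof, _ = chi2_contingency([[pre_pass, pre_fail],
                                    [post_pass, post_fail]])
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")  # a shift this large is far from chance
```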
Results
Examination marks increased dramatically by 21.88 percentage points from pre-AI to post-AI periods, with pass rates rising from around 50% to 86%. Conversely, research project marks declined by 10.44 percentage points with pass rates falling from the high 80s to 72%. Online quiz performance showed minimal changes overall. Correlations between different assessment types, particularly exams and projects, disappeared almost entirely by 2024. Passing students benefited from AI with improved overall grades, while failing students saw worse performance. Creative project proposals requiring human creativity and course-specific knowledge showed no change. Grade distributions shifted, with fewer students receiving basic passing grades and more receiving high distinctions.
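The vanishing exam-project link can be pictured with a toy Pearson-correlation check like the one below; the mark arrays are invented to mimic the reported pattern and are not the study’s data.

```python
# Toy illustration of the correlation analysis: Pearson correlation between
# exam and project marks per cohort. The arrays are invented to mimic the
# reported pattern (linked marks pre-AI, decoupled post-AI), not real data.
import numpy as np

exam_pre = np.array([45, 52, 60, 38, 71, 55])
proj_pre = np.array([48, 55, 63, 40, 75, 58])   # tracks the exam closely

exam_post = np.array([88, 92, 79, 85, 95, 90])
proj_post = np.array([60, 85, 52, 90, 58, 72])  # largely decoupled

print(np.corrcoef(exam_pre, proj_pre)[0, 1])    # close to 1
print(np.corrcoef(exam_post, proj_post)[0, 1])  # much weaker
```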
Limitations
The study cannot definitively prove that observed changes resulted solely from AI use, though the timing and nature of changes strongly suggest AI influence. The research cannot determine whether grade improvements reflect genuine learning enhancement or merely better test performance through AI assistance. Completion times for assessments remained similar across semesters, suggesting AI use might not dramatically change time spent on tasks. The study examined only one course at one institution, limiting generalizability. Multiple confounding variables could potentially explain some findings, though the researcher argues the substantial size and timing of changes make AI the most likely explanation.
Funding and Disclosures
The researcher reported no potential conflicts of interest. The study received ethical approval from the University of the Sunshine Coast Human Research Ethics Committee (A242247), with consent waived by the ethics committee. No funding sources were mentioned for this research.
Publication Information
This study was published by Peter K. Dunn from the School of Science, Technology and Engineering at the University of the Sunshine Coast, Australia. The paper “Generative AI may impact students’ marks: a case study from a large first-year statistics course” was published in the International Journal of Mathematical Education in Science and Technology on September 18, 2025, with DOI: 10.1080/0020739X.2025.2539711. The journal is published by Taylor & Francis Group under an Open Access license.