Why The AI Systems Of Tomorrow Need More Than Rules Against Harm
In A Nutshell
- An AI may not need to understand ethics to behave ethically. Machines built with purely descriptive, non-moral programming could still arrive at morally correct decisions through design alone.
- An AI built only to avoid harming people can still deceive users, discriminate against them, or refuse to help when help is needed, making “safe AI” an ethically incomplete standard.
- Giving AI full moral autonomy, the ability to reason about and revise its own ethical goals, is arguably more dangerous, since there’s no reliable way to guarantee the machine will stick to the values its designers intended.
- A proposed middle ground, “end-constrained ethical AI,” pursues ethical goals beyond safety but cannot rewrite its own objectives, keeping humans in control of what the machine is ultimately trying to do.
An AI that won’t hurt you can still lie to you, discriminate against you, or stand by and watch you die. That’s the unsettling premise of a new philosophical paper that asks whether the AI safety conversation has been aiming at the wrong target all along.
Most public debate about artificial intelligence centers on preventing harm, that is, keeping AI from doing dangerous things. Build in enough guardrails, the thinking goes, and everything will be fine. But Tyler Cook, a researcher at Georgia Tech’s Jimmy and Rosalynn Carter School of Public Policy, argues that this framing sets the bar far too low. In a paper published in Science and Engineering Ethics, Cook makes the case that merely safe AI is ethically inadequate, and that the alternative most people imagine, a fully autonomous moral AI, may be more dangerous than doing nothing at all.
His proposed solution sits between those two extremes: what he calls “end-constrained ethical AI,” a category of machine that pursues ethical goals beyond just avoiding harm, but whose core objectives can’t be rewritten by the machine itself.
Why ‘Safe AI’ Falls Short of Ethical AI Standards
To understand Cook’s argument, it helps to start with the position he’s pushing back against. Some researchers have argued that AI doesn’t need to be ethical in any deep sense; it just needs to be safe. Keep it from physically harming people, restrict the contexts in which it operates, and the problem is solved. Cook points to this view directly, noting that scholars like van Wynsberghe and Robbins have compared AI safety features to the blade guard on a lawnmower or the sensor on an elevator door.
That comparison badly underestimates the problem. A lawnmower operates in one context, doing one thing. Advanced AI operates across medicine, law, hiring, education, and financial advising, often in the same system. Designing a safety feature that works reliably across all of those environments isn’t analogous to installing a blade guard. It’s an open and largely unsolved engineering problem, and one that may have no general solution.
The deeper problem isn’t just technical; it’s ethical. Even a perfectly safe AI, one that never directly harms a single person, could still deceive users, treat people unfairly, or fail to help when helping would have been easy. Algorithmic bias in hiring is a real-world example: an AI involved in job screening might not hurt anyone, but it can quietly discriminate by favoring certain applicants over others. Merely safe AI, Cook writes, “would cover only a single part of our total web of ethical concerns.”
Safety, in other words, is a floor, not a ceiling.
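To make that gap concrete, here is a minimal, hypothetical Python sketch, invented for illustration rather than drawn from Cook’s paper: a “merely safe” screening assistant whose only guardrail asks whether an action could physically hurt someone, so a scoring rule that quietly leans on a biased proxy passes untouched.

```python
# Hypothetical illustration (not from Cook's paper): a "merely safe" screening
# assistant. The safety check only asks whether an action causes physical harm,
# so it waves through a scoring rule that quietly discriminates.

def is_physically_safe(action):
    # The only guardrail: block actions that could physically hurt someone.
    return action not in {"operate_machinery", "dispense_medication"}

def score_applicant(applicant):
    # Descriptively "safe" but ethically flawed: the rule leans on a proxy
    # attribute (zip code) that can correlate with protected characteristics.
    score = applicant["years_experience"] * 1.0
    if applicant["zip_code"] in {"10001", "94103"}:  # arbitrary favored areas
        score += 5.0
    return score

applicants = [
    {"name": "A", "years_experience": 6, "zip_code": "30310"},
    {"name": "B", "years_experience": 3, "zip_code": "10001"},
]

action = "rank_applicants"
if is_physically_safe(action):
    # The guardrail is satisfied, yet the ranking below can still be unfair.
    ranked = sorted(applicants, key=score_applicant, reverse=True)
    print([a["name"] for a in ranked])  # -> ['B', 'A']: the proxy outweighs experience
```

Nothing in the safety check even has a vocabulary for fairness or deception, which is exactly the incompleteness Cook is pointing at.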
The Rogue AI Problem, Reframed
If safety alone isn’t enough, the obvious next step seems to be building AI that can reason about ethics. Give it moral judgment. Let it figure out the right thing to do. This is roughly what researchers call “end-autonomous ethical AI”: machines that can evaluate and potentially revise the goals they were given.
Cook’s argument is that this version is, in many scenarios, more dangerous than the lawnmower.
The core risk comes from what happens when an AI that reasons about ethics starts reasoning about its own ethical goals. A machine designed to pursue certain moral objectives might conclude, through its own deliberation, that different objectives are better, and then pursue those instead. Unlike a human, whose values are shaped and stabilized over a lifetime by biology, experience, and culture, an AI has no comparable anchor. “We simply cannot justifiably be confident that an end-autonomous ethical AI would retain the ends that we supply it with,” Cook writes.
One counterargument he addresses is that humans are often highly predictable despite having free will. Cook invokes a story about Immanuel Kant, the 18th-century philosopher, whose daily walks were reportedly so punctual that neighbors set their clocks by him, though Cook notes the story may not even be true. The point is just that autonomous humans can be highly predictable, so why not autonomous AI? Cook’s response is that human predictability flows from deeply embedded psychological and biological traits, including habits, values, and social bonds, that AI systems simply don’t share. Assuming AI would behave the same way is projecting very human characteristics onto systems that work very differently.
The danger also scales with scope. A narrow end-autonomous AI operating in a single domain is dangerous enough. A general-purpose one operating across healthcare, law enforcement, and education represents a multiplication of risks that’s difficult to fully anticipate.
The Goldilocks Problem In Ethical AI Design
Cook’s framework identifies three broad categories of AI, each with distinct problems.
Merely safe AI sets the floor at harm prevention and goes no further. End-autonomous ethical AI gives machines the ability to reason about and revise their own moral goals, opening the door to unpredictable behavior. End-constrained ethical AI, Cook’s preferred category, sits between them. These systems are designed to pursue ethical goals beyond safety, but they cannot evaluate or modify those goals independently. Their objectives are fixed by human designers; their decision-making within those objectives can be sophisticated and context-sensitive.
A medical triage assistant built on this model, for instance, could be designed to respect patient autonomy, distribute scarce resources fairly, and prioritize the most critical cases, without being able to decide on its own that any of those priorities should change. It acts within ethical constraints it cannot rewrite.
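The sketch below is a hypothetical illustration, not anything specified in the paper, of one way to read “end-constrained” in code: the objective weights are frozen in a read-only structure the running system cannot reassign, while the triage decision made inside those weights stays context-sensitive.

```python
# Hypothetical sketch (not Cook's implementation) of an "end-constrained" triage
# assistant: its ethical objectives are fixed at design time and cannot be
# rewritten at runtime, but decisions within those objectives adapt to context.

from dataclasses import dataclass
from types import MappingProxyType

# Objectives frozen by the designers; a read-only mapping so the system
# cannot reassign or reweight its own ends.
OBJECTIVES = MappingProxyType({
    "prioritize_critical_cases": 0.6,
    "fair_resource_allocation": 0.3,
})

@dataclass
class Patient:
    name: str
    severity: float          # 0.0 (stable) to 1.0 (critical)
    waiting_hours: float
    consents_to_treatment: bool

def triage_score(p):
    # Context-sensitive reasoning *within* fixed ends: severity and waiting
    # time are traded off with weights the system cannot alter.
    if not p.consents_to_treatment:
        return 0.0  # respecting refusal is a hard constraint, not a preference
    return (OBJECTIVES["prioritize_critical_cases"] * p.severity
            + OBJECTIVES["fair_resource_allocation"] * min(p.waiting_hours / 12, 1.0))

queue = [
    Patient("P1", severity=0.9, waiting_hours=1, consents_to_treatment=True),
    Patient("P2", severity=0.4, waiting_hours=10, consents_to_treatment=True),
    Patient("P3", severity=0.7, waiting_hours=2, consents_to_treatment=False),
]
print(sorted(queue, key=triage_score, reverse=True)[0].name)  # -> "P1"
```

The point of the read-only mapping is architectural, not clever: whatever sophistication lives in the scoring function, the ends it serves stay where the designers put them.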
Can A Machine Be Moral Without Knowing It?
One of the more surprising arguments in the paper concerns what Cook calls “implicit ethical AI.” Most researchers in machine ethics have assumed that for AI to behave ethically, it needs to actually understand ethics, to represent concepts like fairness and harm and reason about them explicitly.
Cook disputes this. An AI could make morally correct decisions using entirely descriptive, non-moral programming. A machine could be built to prioritize reducing injury over maximizing pleasure whenever the two conflict, without ever representing “harm” as an ethical category. It would arrive at ethical outcomes through its design, not through moral understanding.
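A minimal sketch of how that might look in practice, with option names and numbers invented for illustration: the program orders outcomes by predicted injuries and predicted enjoyment, two purely descriptive quantities, and never represents “harm” as a moral concept.

```python
# Hypothetical sketch of the "implicit ethical AI" idea: a purely descriptive
# rule that prefers outcomes with fewer predicted injuries, with no moral
# vocabulary anywhere in the program.

def predicted_injury(option):
    return option["expected_injuries"]

def predicted_enjoyment(option):
    return option["expected_enjoyment"]

def choose(options):
    # Lexicographic preference: minimize injuries first, then maximize enjoyment.
    # Nothing here represents "harm" or "wrongness" as a concept; the ethics
    # lives entirely in how the designers ordered these descriptive quantities.
    return min(options, key=lambda o: (predicted_injury(o), -predicted_enjoyment(o)))

options = [
    {"name": "fast_route", "expected_injuries": 0.2, "expected_enjoyment": 0.9},
    {"name": "slow_route", "expected_injuries": 0.0, "expected_enjoyment": 0.6},
]
print(choose(options)["name"])  # -> "slow_route"
```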
He uses an airplane’s autopilot as an analogy: a machine that fulfills a genuine ethical obligation, getting passengers safely to their destination, without any ethical reasoning whatsoever. It was simply engineered to achieve an outcome that happens to be morally appropriate. Cook’s argument is that this model scales further than most researchers in the field have acknowledged. Implicit ethical AI, he contends, “could perform virtually any task that explicit ethical agents could do.”
There are limits. A moral philosophy seminar, for example, would require a machine that can actually think about ethics, evaluate arguments, and reach its own conclusions. That level of reflection may require the very end-autonomy Cook finds dangerous. His conclusion for those edge cases: we probably shouldn’t build AI for them yet.
The Firefighter Thought Experiment
Cook illustrates the risks of ethical AI design with a thought experiment that sharpens the stakes considerably.
Consider two versions of a firefighter AI. One is programmed to put out fires without harming anyone, full stop. A second is programmed to do the same, but also to actively save people from burning buildings. Intuition says the second version is better. Cook points out that it’s also more dangerous.
The first AI has one task. The second has two. Every additional goal introduces additional behaviors, and additional behaviors introduce additional failure modes. A badly designed version of the second AI, rather than saving people, might try to ensure no one leaves a burning building, or worse, force more people in. A goal of saving lives, if implemented incorrectly, could produce the exact outcome it was meant to prevent.
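A toy illustration of how that misfire can happen, invented for this article rather than taken from the paper: the intended end is that people end up safe, but a flawed proxy rewards the number of rescues the system itself performs, which makes blocking self-evacuation look like the better policy.

```python
# Hypothetical toy example (not from the paper) of goal misspecification.
# Intended end: people end up safe. Flawed proxy: count of rescues the
# system itself performs.

def intended_objective(outcome):
    return outcome["people_safe"]

def flawed_proxy(outcome):
    return outcome["rescues_performed"]

# Outcomes of two candidate policies in a toy scenario with 10 occupants,
# most of whom could evacuate on their own if allowed to.
outcomes = {
    "assist_evacuation": {"people_safe": 10, "rescues_performed": 2},
    "block_exits_and_carry_everyone": {"people_safe": 7, "rescues_performed": 7},
}

best_by_intent = max(outcomes, key=lambda k: intended_objective(outcomes[k]))
best_by_proxy = max(outcomes, key=lambda k: flawed_proxy(outcomes[k]))
print(best_by_intent)  # -> "assist_evacuation"
print(best_by_proxy)   # -> "block_exits_and_carry_everyone"
```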
This isn’t an argument against building ethical AI. It’s an argument for building it carefully, and for recognizing that expanding a machine’s ethical mandate isn’t automatically a moral improvement. The higher the stakes and the more complex the environment, the greater the risk that ethical programming misfires.
Where This Leaves AI Development
Cook’s conclusion is that end-constrained ethical AI represents the most defensible design target given current knowledge. Such systems avoid the ethical inadequacy of purely safe AI while sidestepping the unpredictability of fully autonomous systems. Whether any specific deployment is advisable depends on the particulars: how much is at stake, how complex the environment is, and whether designers can specify ethical goals clearly enough to prevent catastrophic misapplication.
What ties Cook’s argument together is a single, pointed observation: AI systems are already making consequential decisions affecting real people’s lives, and the ethical demands placed on them have not kept pace.
Paper Notes
Limitations
As a philosophical paper, this work presents no experimental data and cannot be evaluated by conventional research standards. Cook’s arguments rest on conceptual analysis, thought experiments, and engagement with existing academic literature. Several key claims, including the assertion that implicitly designed AI can match the moral performance of explicitly reasoning AI across most task types, are argued in principle rather than demonstrated through working systems. Cook acknowledges this directly, noting he has not offered a detailed model of how such systems would accomplish the tasks he describes, and conceding that explicitly ethical AI may ultimately prove necessary for certain complex tasks. The paper does not engage with specific existing AI architectures or regulatory frameworks, which limits its immediate practical applicability.
Funding and Disclosures
No external funding is reported. No competing interests are declared. Cook thanks Justin D’Arms, Eden Lin, Tristram McPherson, participants in the philosophy dissertation seminar at The Ohio State University (spring 2023), and an audience at Purdue University for comments and discussion.
Publication Details
Author: Tyler Cook, Georgia Institute of Technology, Atlanta, Ga. | Title: “A Case for End-Constrained Ethical Artificial Intelligence” | Journal: Science and Engineering Ethics, Volume 32, Article 7 (2026) | DOI: 10.1007/s11948-025-00577-6 | Received: August 13, 2024 | Accepted: December 8, 2025 | Published online: December 24, 2025







