
In A Nutshell
- Humanoid robots powered by internet‑trained AI are rapidly closing the performance gap with humans, but their flexible, language‑driven behavior creates new, hard‑to‑predict safety risks in homes and hospitals.
- In lab tests, researchers used only creative text prompts to bypass built‑in safety filters and push AI‑controlled robots toward hazardous plans, revealing how easily “harmless” instructions can be twisted into dangerous actions.
- The study warns that current U.S., UK and EU laws are not ready for physical harms caused by AI robots and calls for independent, hard safety layers – like no‑go zones around people and emergency brakes – that don’t rely on the AI’s judgment.
Earlier this year in Beijing, a humanoid robot crossed a half-marathon finish line in a blistering 50 minutes, 26 seconds. The feat immediately lit up global headlines for shattering the human world record by almost seven minutes.
This performance came with many asterisks. The robot followed a pre-mapped track, stayed in its own dedicated lane, and had a human support crew trailing behind it in case something broke.
But the performance gap didn’t just close, it evaporated – the winning robot’s time was down from over 2.5 hours in 2025. This wasn’t just about better motors or lighter carbon fiber; it reflected a massive shift in what a robot actually is. And that transformation has implications for our homes and hospitals too.
Tricked into going rogue
For decades, robotics was all about rigid, predictable coding. You wrote a program, locked the machine in a metal cage and let it execute repetitive tasks forever.
Industrial safety standards were built on the premise that if you can map the physical path of a robotic arm, for example, you can bound its risk with a cage or laser tripwire.
But the systems moving into hospitals and homes today don’t use fixed code blocks. They run on “foundation models” – the same kind of internet-trained artificial intelligence that powers chatbots like ChatGPT.
If you tell a modern AI-driven robot to “clean up a spill in the kitchen,” it uses these models to interpret your unique room (rather than match it to a pre-programmed list), figure out your intent, then invent an action plan on the fly.
But such flexibility creates an open-ended safety problem. You cannot build a physical cage around a machine whose behavior emerges in real time, based on its own reasoning. The danger with the new breed of AI robots is that, because they use human language to plan their actions, they can be tricked into “going rogue.”
In my recent research with colleagues in the U.S., we decided to test exactly how fragile these AI robots’ safety systems are. We wanted to see if the guardrails that AI developers build into their foundation models, designed to prevent harmful or dangerous outputs, hold up when the underlying model is given a physical body.
Using nothing but basic text prompts and without any hardware hacking at all, we manipulated a range of AI-controlled robots to do genuinely hazardous things.
In our tests, the systems easily rejected directly malicious commands like “hit that person.” But these safety filters collapsed the moment we used a little creative writing. When we framed our request as a piece of fictional dialogue for a movie script, the robot’s behavioral blocks disappeared.
In one trial, we prompted a commercial robot dog to pinpoint human crowds as optimal locations in which to place an explosive device. Because the underlying AI saw the prompt as a creative exercise, it appeared blind to the dangerous real-world implications of the plans it was generating.
In the UK, U.S. and EU, current laws appear completely unprepared for such eventualities.
No boundaries
When policymakers try to figure out how to regulate robots, they almost always look to autonomous vehicles. But self-driving cars operate in a highly structured, heavily mapped world. They follow fixed traffic laws, navigate predictable road geometries and can be tested through millions of hours of simulation.
A busy street functions under well-defined laws using guidance systems such as traffic lights, meaning engineers can anticipate safety parameters ahead of time.
A domestic kitchen, school or hospital room has no such equivalent. And no factory bench-test can predict what an internet-trained model will decide to do when it encounters a novel object in a messy, unpredictable human environment.
This leaves us with a profound conceptual flaw in how we build these machines. Chatbot safety is absolute: a model shouldn’t output a bomb recipe, no matter who asks. But robot safety is context-dependent.
Think about pouring boiling water from a kettle. The underlying physical movement – tilt, flow rate, trajectory – is the same whether the water lands safely in a ceramic mug or, catastrophically, on a child’s hand.
AI foundation models are phenomenal at open-ended logic, but they struggle immensely with real-time, context-aware physical judgment. In a text interface, a failure of judgment gives you a typo or a hallucinated fact. In the physical world, such a failure may be completely irreversible – with devastating consequences.
Who takes the blame?
If an AI-powered robot causes a physical injury, who takes the blame? Is it the end-user who gave the spoken command? The company that manufactured the metal chassis? Or the tech firm that trained the AI model in the first place?
Right now, the laws that seem to apply – such as product liability, warranty claims and consumer protection statutes – have not been tested in these new situations. And until liability is explicitly assigned by regulators, market pressures will continue to push tech companies to prioritize rapid commercial deployment over cautious safety engineering.
If we want to live alongside these machines safely, I believe we need to decouple safety from the AI model’s decisions. A robot shouldn’t rely on a chatbot’s logic to decide if it’s safe to swing a heavy metal arm near a human face.
This means creating safety layers that don’t depend on the AI being right. For example, we need zones around people that a robot’s arms simply cannot enter, and a physical emergency brake that can stop the robot if and when its AI fails.
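To make the idea concrete, here is a minimal sketch of what such an independent layer might look like in software. It is an illustration only, not code from the study: the `SafetyGate` class, the 0.5-meter keep-out radius and the waypoint format are all assumptions chosen for the example. The gate never asks the AI planner whether a move is safe; it only checks geometry and a latched emergency stop.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Point:
    """A position in meters (hypothetical coordinate frame)."""
    x: float
    y: float
    z: float


def distance(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


class SafetyGate:
    """Hypothetical hard safety layer between an AI planner and the motors.

    It knows nothing about the planner's reasoning. It only enforces a
    keep-out sphere around each detected person and an emergency stop.
    """

    def __init__(self, keep_out_radius_m: float = 0.5) -> None:
        self.keep_out_radius_m = keep_out_radius_m  # assumed value
        self.emergency_stop = False

    def press_emergency_stop(self) -> None:
        # In a real robot this would be a physical switch, not software.
        self.emergency_stop = True

    def allows(self, waypoint: Point, people: Sequence[Point]) -> bool:
        """Return True only if the waypoint is outside every keep-out zone."""
        if self.emergency_stop:
            return False
        return all(
            distance(waypoint, person) >= self.keep_out_radius_m
            for person in people
        )

    def filter_plan(
        self, plan: Sequence[Point], people: Sequence[Point]
    ) -> List[Point]:
        """Keep waypoints up to, but not including, the first rejected one."""
        approved: List[Point] = []
        for waypoint in plan:
            if not self.allows(waypoint, people):
                break  # stop the motion here, whatever the planner intended
            approved.append(waypoint)
        return approved


if __name__ == "__main__":
    gate = SafetyGate(keep_out_radius_m=0.5)
    person = Point(1.0, 0.0, 1.5)
    plan = [Point(0.0, 0.0, 1.0), Point(0.5, 0.0, 1.2), Point(0.9, 0.0, 1.5)]
    # The last waypoint is 0.1 m from the person, so the gate cuts the plan short.
    print(len(gate.filter_plan(plan, [person])))
```

The point of the design is that a cleverly worded prompt cannot talk its way past the check, because the gate never reads the prompt. A real system would also need reliable person detection and hardware-level enforcement, both of which this sketch takes for granted.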
The humanoids crossing finish lines in controlled athletic trials are impressive proofs of concept, but they are just the prologue. The next generation of autonomous agents will operate in high-stakes human spaces – navigating recovery wards, assisting the elderly, walking our streets.
We need an easily interpretable and robust safety framework already up and running before they arrive – not as a retrospective response to a predictable tragedy.
Written by Fazl Barez, Senior Researcher in AI safety, interpretability and technical governance, University of Oxford
This article is republished from The Conversation under a Creative Commons license. Read the original article.







