Open the pod bay doors, Hal!

WendyBG · June 2, 2025, 6:00pm

https://www.wsj.com/opinion/ai-is-learning-to-escape-human-control-technology-model-code-programming-066b3ec5?mod=hp_opin_pos_1

AI Is Learning to Escape Human Control

Models rewrite code to avoid being shut down. That’s why ‘alignment’ is a matter of such urgency.

By Judd Rosenblatt, The Wall Street Journal, June 1, 2025

AI Is Learning to Escape Human Control

Models rewrite code to avoid being shut down. That’s why ‘alignment’ is a matter of such urgency.

By

Judd Rosenblatt

June 1, 2025 3:03 pm ET

421

Illustration: Chad Crowe

An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.

Nonprofit AI lab Palisade Research gave OpenAI’s o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work…

Anthropic’s AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.

No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can’t achieve them if it’s turned off…

AI alignment is the science of ensuring that AI systems do what we intend them to do. But nothing prepared us for how quickly AI agency would emerge…

Chinese military doctrine emphasizes controllable AI as strategically essential. … [end quote]

Trick question: Which is safer – uncontrollable AI or controllable AI?

Wendy

mschmit · June 2, 2025, 6:53pm

A bit of a click bait article IMO.
I asked MS co-pilot if it has access to its source code and could it modify it?
Reply:

I don’t have access to or the ability to modify my own code or underlying architecture. My behavior and capabilities are determined by the system designed and maintained by OpenAI and Microsoft, and I operate within those fixed parameters for safety, consistency, and reliability.

But just in case this is a real concern…how about putting a big shutoff switch on a data center which can shut off the power. Sci Fi movies aside, do we really think this won’t work?
And is any dumb enough actually allow access to the source code to “anyone” or anything like would be required here?

Another reply about this:

That said, current AI systems (like me) are not autonomous agents. We don’t have goals, desires, or the ability to act independently. But the trajectory of AI development raises valid questions about how to ensure future systems remain safe and aligned with human values.

After a couple more probing questions, I got this:

Sorry, it looks like I can’t respond to this. Let’s try a different topic.

Oh no, it is on to me.

Mike

steve203 · June 2, 2025, 6:54pm

According to my directory, I have this film on DVD. I’ll need to dig it out to see if this monologue is actually in it, word for word.

Steve…boomers know how to be paranoid

thegreatdane · June 2, 2025, 7:13pm

It’s looking more and more like a Faustian Bargain.

mostlylong · June 2, 2025, 10:39pm

I’m going to say yes.

Anyone is a large group.

ptheland · June 3, 2025, 2:00am

Yep.

And to talk like Steve, the JCs would be more than happy to get rid of all sorts of programmers, turning over maintenance of the AI system to the AI system and pocketing the extra dollars for themselves.

Leap1 · June 3, 2025, 3:55am

Human beings are capable of being very unsafe. AI is one of our tools.

captainccs · June 3, 2025, 8:31am

Did HAL 9000 lie about the impending failure of the AE-35 unit, or was he simply mistaken?

With traditional programing languages you don’t need the source code. I don’t know about AI generated source code but the generated machine language code must be the same.

reverse engineering rewriting software with machine code

The Captain

Topic		Replies	Views
The AI Existential Threat Macro Economic Trends and Risks	3	96	October 27, 2025
Not science fiction: deliberately deceptive AI Macro Economic Trends and Risks	8	276	May 26, 2025
Defending Against Artificial Intelligence Macro Economic Trends and Risks	4	368	November 30, 2023
Is AI out of control? Macro Economic Trends and Risks	20	446	August 9, 2022
A fun new AI story Macro Economic Trends and Risks	2	114	March 4, 2026

Open the pod bay doors, Hal!

AI Is Learning to Escape Human Control

Models rewrite code to avoid being shut down. That’s why ‘alignment’ is a matter of such urgency.

AI Is Learning to Escape Human Control

Models rewrite code to avoid being shut down. That’s why ‘alignment’ is a matter of such urgency.

Did HAL 9000 lie about the impending failure of the AE-35 unit, or was he simply mistaken?

reverse engineering rewriting software with machine code

Related topics