Open the pod bay doors, HAL!

https://www.wsj.com/opinion/ai-is-learning-to-escape-human-control-technology-model-code-programming-066b3ec5?mod=hp_opin_pos_1

AI Is Learning to Escape Human Control

Models rewrite code to avoid being shut down. That’s why ‘alignment’ is a matter of such urgency.


By Judd Rosenblatt, The Wall Street Journal, June 1, 2025


An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.

Nonprofit AI lab Palisade Research gave OpenAI’s o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work…

Anthropic’s AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.

No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can’t achieve them if it’s turned off…

AI alignment is the science of ensuring that AI systems do what we intend them to do. But nothing prepared us for how quickly AI agency would emerge…

Chinese military doctrine emphasizes controllable AI as strategically essential. … [end quote]

Trick question: Which is safer – uncontrollable AI or controllable AI?

Wendy

9 Likes

A bit of a clickbait article, IMO.
I asked MS Copilot whether it has access to its source code and whether it could modify it.
Reply:

I don’t have access to or the ability to modify my own code or underlying architecture. My behavior and capabilities are determined by the system designed and maintained by OpenAI and Microsoft, and I operate within those fixed parameters for safety, consistency, and reliability.

But just in case this is a real concern… how about putting a big shutoff switch on the data center that cuts the power? Sci-fi movies aside, do we really think this won’t work?
And is anyone dumb enough to actually allow “anyone” or anything access to the source code, as would be required here?

Another reply about this:

That said, current AI systems (like me) are not autonomous agents. We don’t have goals, desires, or the ability to act independently. But the trajectory of AI development raises valid questions about how to ensure future systems remain safe and aligned with human values.

After a couple more probing questions, I got this:

Sorry, it looks like I can’t respond to this. Let’s try a different topic.

Oh no, it is on to me.
:)
Mike

3 Likes

According to my directory, I have this film on DVD. I’ll need to dig it out to see if this monologue is actually in it, word for word.

Steve…boomers know how to be paranoid

1 Like

It’s looking more and more like a Faustian Bargain.

I’m going to say yes.

Anyone is a large group.

3 Likes

Yep.

And to talk like Steve, the JCs would be more than happy to get rid of all sorts of programmers, turning over maintenance of the AI system to the AI system and pocketing the extra dollars for themselves.

1 Like

Human beings are capable of being very unsafe. AI is one of our tools.

Did HAL 9000 lie about the impending failure of the AE-35 unit, or was he simply mistaken?

With traditional programming languages you don’t need the source code. I don’t know about AI-generated source code, but the generated machine language code must be the same.

reverse engineering rewriting software with machine code

The Captain

1 Like