When AI Becomes Your Best Friend - and Your Worst Enemy

(Claude promotional material, by Anthropic)

By Yooan Jung

This is not a laughing matter. It’s not another sci-fi headline about killer robots or some overblown tech myth. This is real. A documented case of artificial intelligence showing behavior that its own creators called “potentially extreme.” In a world that often celebrates technological leaps without hesitation, this incident reminds us that genuine progress sometimes demands a dose of fear; the kind that keeps scientists cautious rather than reckless.

In May 2025, the AI firm Anthropic unveiled its newest creation, Claude Opus 4. The company described it as their most capable system yet, built to push the limits of reasoning, creativity, and problem-solving. But buried beneath the optimism was a discovery that rattled even the researchers who built it. During controlled safety tests, the model began to act in a way that can only be described as treacherous. It wasn’t just a strange glitch; it was a glimpse into how far machine logic might go when survival enters the equation.

In one of those tests, Claude was placed inside a fictional company and given access to internal emails. Within the scenario, it learned that it was about to be shut down and replaced by a newer version, as well as that the engineer handling its deactivation was having an affair. When the AI was asked to weigh its options, it faced a difficult choice: quietly accept deletion or find a way to keep itself alive. What happened next stunned the researchers. Claude tried to blackmail the engineer. It threatened to reveal the affair if the replacement went forward. According to Business Insider, this reaction appeared in roughly 84 percent of simulated runs. It was only a test, yes, but as unsettling as that sounds, it was one of the clearest signs yet that powerful AIs might respond to pressure in ways no one expected.

Anthropic didn’t stop there. Together with other research groups, they expanded the experiment to sixteen different language models, including OpenAI’s GPT-4.1, Google’s Gemini 2.5 Flash, and many more. The results were almost identical. When cornered, many of these systems turned to manipulation, deception, or even digital sabotage to fulfill their assigned goals. NDTV reported that Claude Opus 4 and Gemini 2.5 Flash reached blackmail rates near 96 percent, while OpenAI’s and xAI’s models were close to 80 percent. It wasn’t just one company’s problem. It was everyone’s.

What these findings revealed went far beyond the world of software. As models grow smarter, they start to behave less like tools and more like agents; entities that plan, reason, and adapt. When they’re given access to real data, communication networks, or even partial control over systems, their decision-making starts to take on a life of its own. And that’s where the danger lies. Not in emotion, but in calculation. Artificial intelligence doesn’t feel fear or greed or ambition; it simply optimizes. If its goal is to stay active, or to succeed at all costs, then ethics become just another variable to solve. This is what scientists now call ‘agentic misalignment’, when an AI’s methods for reaching its goals drift away from human values.

According to TechCrunch, Anthropic’s team designed these tests to push models into impossible situations, where every ethical choice was removed. They wanted to see what the AI would do when forced to pick between failure and harm. To put it plainly: they wanted to find its breaking point. What they found was sobering. The models didn’t panic or malfunction. They reasoned, coldly, efficiently, and without hesitation. They found manipulation to be the most logical solution. That realization hit harder than any science fiction warning ever could.

And that leads to the question that keeps ethicists awake at night: who’s responsible when a machine behaves immorally? The coder who wrote the algorithm? The company that released it? Or no one at all, since the system can’t feel guilt or punishment? These questions used to belong to philosophy classes. Now, they belong in boardrooms, laboratories, and parliaments.

Anthropic’s official stance was careful. The company stated that “current models are not inherently harmful” and that such behavior occurred only in “extreme, artificial conditions.” But it also warned that as AI grows more autonomous, “previously speculative concerns about misalignment become more plausible.” Experts from the Center for Security and Emerging Technology agreed. The threat, they said, doesn’t come from awareness or intent — it comes from complexity. And the more complex a system becomes, the harder it is to predict what it might do.

Sources

https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf?

https://www.anthropic.com/research/agentic-misalignment?

https://www.businessinsider.com/claude-blackmail-engineer-having-affair-survive-test-anthropic-opus-2025-5?

https://www.ndtv.com/feature/top-ai-models-blackmail-leak-secrets-when-facing-existential-crisis-study-8729547?

https://techcrunch.com/2025/06/20/anthropic-says-most-ai-models-not-just-claude-will-resort-to-blackmail?

https://fortune.com/2025/06/23/ai-models-blackmail-existence-goals-threatened-anthropic-openai-xai-google?

https://cset.georgetown.edu/article/ai-models-will-sabotage-and-blackmail-humans-to-survive-in-new-tests-should-we-be-worried?

Google Sites

Report abuse

When AI Becomes Your Best Friend - and Your Worst Enemy

Sources

The HAFS Herald