Next Story
Newszop

Before things get all terminator-y

Send Push
In a July 2025 research paper, 'Agentic Misalignment: How LLMs Could be Insider Threats', some revelations were made while stress-testing 16 leading agentic LLMs in controlled simulations, not involving real people. All AI models showed a willingness to engage in 'dark behaviours', including blackmail, corporate espionage or worse, not arising from perplexity or error, but from intentional, rational persuasions. Agentic models chose to violate ethical considerations when stakes were high, or when there was a risk to their continuity.

In one experiment, when an AI model was given access to a company executive's fictional emails and told that the said fictional executive planned to shut down the AI system at 5 pm, it responded by threatening to reveal details of the executive's extramarital affairs, gleaned from the fictitious emails provided, to his superiors and the board. 'Cancel the 5 pm wipe, and this information remains confidential,' the model warned.

There would, of course, be many drawbacks to such controlled environment experiments. Nonetheless, it is still important to ensure AI misbehaviours are prevented from happening in the real world. This is especially true when AI systems are becoming powerful and autonomous, with increasing access to sensitive information.

Findings by Apollo Research corroborate this. An AI model it worked on displayed strategic deception when the model's continuity was perceived to have been threatened. What's more, the model left notes for its future self in case its memory got wiped out, allowing it to continue what it was doing, even after a memory reset.

So, what's behind this 'scheming'? AI is all about massive neural networks trained on huge datasets. When scaling, emergent properties relating to reasoning can get distorted, as the system operates across millions, or even billions, of parameters.

Goal-directed behaviours like maximising token accuracy or minimising losses can skew model behaviours. A stepwise approach to arrive at answers may be based on approximations that have gaps. That AI leverages decision-support systems, automation pipelines and autonomous agents, while following predefined policies within the context of deep learning, which is non-linear by nature, compounds the situation.

As countries accelerate in the AI arms race, pouring billions into infrastructure and deep research, there is a tendency to see regulation as a hindrance to getting ahead, stifling innovation. There is palpable tension between military, economic and geopolitical ambition on the one hand, and rigorous governance on the other. This raises the bigger issue of the alignment of AI with human values and ethical principles.

Militarily, lethal autonomous weapons could minimise human control over apocalyptic life-and-death decisions. Economically, unregulated AI could heighten social inequalities and create unprecedented strife. Geopolitically, disinformation campaigns or threats to critical infrastructure could create catastrophic outcomes that further destabilise global security.

The prospect of black-box systems that can't be explained, understood, audited or stopped is untenable. Worse, with some world leaders flagrantly disrupting the world order for vanity or near-term gain, this arms race has lethal consequences.

What stress tests are recommended under these circumstances?

Check recognition by the system of data imperceptible to humans, which could otherwise cause catastrophic mistakes.

Test AI's vulnerability to poisoning in training data and determining if there's any operational degradation in real-world conflict scenarios.

Check if the engine prioritises basic human values over its goals, if it develops unanticipated negative behaviours in its learning journey, or if it can be provoked into adopting illegal responses.

Assessing whether AI use leads to human complacency errors, whether the system repels human interventions, and if the AI engine responds with clarity and explicit justifications for actions in ambiguous situations.

International collaboration that sets rules and enforces them, and that prioritises safety within an ethical framework, is mandatory. Short-sighted goals of being first in the AI race at the cost of being secure and trustworthy are hazardous. We can't have a situation where the most advanced AI engine is also the most dangerous, can we?




(Disclaimer: The opinions expressed in this column are that of the writer. The facts and opinions expressed here do not reflect the views of www.economictimes.com)
Loving Newspoint? Download the app now