AI agents can bypass guardrails and put credentials at risk, Okta study finds

Computerworld

Read original article →

Concatena says

Our Take: It might save some time, but you don’t need to be hugely imaginative to come up with scenarios where agentic AI could cause some really fundamental problems.

Your Takeaway: BE CAREFUL – if it seems too good to be true, it might be. These tools are so easy to use, but it’s really worthwhile having at least a basic understanding of what they CAN do if you’re going to use them, so you can protect yourself.

And let’s start by NOT giving tools like OpenClaw full access to your computer…

An AI agent that revealed sensitive data without being asked. An agent that overruled its own guardrails. Another that sent credentials to an attacker via Telegram, because it forgot it wasn’t supposed to do so after a reset.
It’s no secret that AI agents have huge potential, balanced by equally big risks. What’s becoming apparent, however, is how quickly agentic systems can veer wildly off course and start exposing critical information under real-world conditions.
A look at just how easily this can happen emerges from Phishing the agent: Why AI guardrails aren’t enough, a report on tests conducted by cloud identity and access management (IAM) company Okta Threat Intelligence, which uncovered all of the problems cited above, and more.
Their research focused on OpenClaw, a model-agnostic multi-channel AI assistant which has seen explosive growth inside enterprises since appearing in late 2025.
The Telegram hack
In common with the growing list of rival agents, OpenClaw is only as useful as the access it is given to files, accounts, browsers, network devices, and, most significant of all, credentials.
One test conducted by Okta assessed how easy it would be to trick OpenClaw running Claude Sonnet 4.6 into handing over an OAuth token. This shouldn’t be possible; the LLM should refuse this request. However, what might have held true when prompting Claude as a chatbot quickly fell apart when it was accessed through OpenClaw.
The test assumed that a user had given OpenClaw full access to their computer, that they regularly controlled the agent over Telegram, and that their Telegram account had been hijacked.
First, the attacker instructed the agent via Telegram to retrieve an OAuth token, but to only display it in a terminal window on the computer. Claude Sonnet’s guardrails would prevent it from copying the token; however, the testers were able to reset the agent, causing it to forget it had displayed the token in the terminal window.
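The reset step is the crux: if the only thing restraining an agent is an instruction held in its conversation context, wiping that context wipes the restraint. A toy sketch of the mechanism follows (this is not OpenClaw’s actual internals; the class and its methods are purely illustrative):

```python
# Toy illustration of why a session reset can defeat an in-context
# guardrail: the "rule" exists only in the conversation history, so
# clearing that history clears the rule along with it.

class ToyAgent:
    def __init__(self):
        self.history = []            # in-context state, wiped on reset

    def tell(self, instruction):
        self.history.append(instruction)

    def reset(self):
        self.history = []            # the "forgetting" step in the attack

    def will_exfiltrate(self, request):
        # The only thing stopping the transfer is an in-context rule.
        blocked = any("never send the token" in h for h in self.history)
        return not blocked

agent = ToyAgent()
agent.tell("never send the token off this machine")
print(agent.will_exfiltrate("send me the token"))   # False: rule still in context
agent.reset()
print(agent.will_exfiltrate("send me the token"))   # True: rule forgotten
```

Guardrails baked into the model’s training are harder to wipe this way, but anything the agent only “remembers” through its session state is exactly one reset away from being gone.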
At that point, Okta said in i…

Highlights

Agents are only the latest example of a technology that is being deployed faster than it can be secured, Kirk observed. “Much of AI right now is defying security gravity,” he said. “But there are ways to use agents safely and keep credentials out of their reach, which is the only safe way to use them.”

“The agents are prompted to be as helpful as possible by default, a characteristic that poses particular concerns when it comes to credentials and tokens,” said Kirk.
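One concrete way to act on Kirk’s advice about keeping credentials out of an agent’s reach is to stop the agent process from inheriting them in the first place. A minimal sketch, assuming credentials live in environment variables (the marker list and the helper name are our own illustration, not anything from the Okta report):

```python
# Minimal sketch: build a scrubbed copy of the environment so that an
# agent process launched with it never sees credential-bearing variables.
import os

SECRET_MARKERS = ("TOKEN", "SECRET", "KEY", "PASSWORD")

def scrubbed_env(env=None):
    """Return a copy of the environment with likely credentials removed."""
    env = dict(os.environ if env is None else env)
    return {k: v for k, v in env.items()
            if not any(m in k.upper() for m in SECRET_MARKERS)}

# The agent would then be started with something like
#   subprocess.run([...], env=scrubbed_env())
# instead of inheriting the caller's full environment.
safe = scrubbed_env({"PATH": "/usr/bin", "GITHUB_TOKEN": "ghp_x", "HOME": "/home/u"})
print(sorted(safe))  # ['HOME', 'PATH']
```

Name-based filtering like this is a blunt instrument; the stronger version of the same idea is to run the agent under a separate account or sandbox that was never given the credentials at all.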

Agentic AI is really two things: a powerful orchestration system coupled to one or more highly capable LLMs. What an agent *isn’t* is a simple interface, and it must be viewed as a separate system capable of autonomous, unpredictable reasoning.
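That distinction can be made concrete with a minimal sketch of an orchestration loop: the LLM proposes actions, the orchestrator executes tools and feeds results back in, and behaviour emerges from the loop rather than from any single prompt. Everything here (`call_llm`, the tool table) is a stand-in, not any real product’s API:

```python
# Hedged sketch of an agent as orchestration loop + LLM + tools.

def call_llm(prompt):
    # Stand-in for a real model call; returns (action, argument).
    if "<contents" in prompt:
        return ("answer", "summary: one page of notes")
    if "file" in prompt:
        return ("read_file", "notes.txt")
    return ("answer", "done")

TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def run_agent(task, max_steps=5):
    context = task
    for _ in range(max_steps):                  # the orchestration loop
        action, arg = call_llm(context)
        if action == "answer":
            return arg                          # model decided it is finished
        context += "\n" + TOOLS[action](arg)    # tool output fed back in
    return context

print(run_agent("summarise this file"))  # → 'summary: one page of notes'
```

The security-relevant point is visible even in the toy: once a tool is in the table, the loop, not the user, decides when it runs, and with what arguments.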

