Tag: research

  • AI agents can bypass guardrails and put credentials at risk, Okta study finds

    Computerworld

    Read original article →

    Concatena says

    Our Take: It might save some time, but tou don’t need to be hugely imaginative to come up with scenarios where agentic AI could cause some really fundamental problems.

    Your Takeaway: BE CAREFUL – if it seems to good to be true, it might be. These tools are so easy to use, but it’s really worthwhile having at least a basic understanding of what they CAN do if you’re going to use them, so you can protect yourself.

    And let’s start by NOT giving tools like OpenClaw full access to your computer…

    An AI agent that revealed sensitive data without being asked. An agent that overruled its own guardrails. Another that sent credentials to an attacker via Telegram, because it forgot it wasn’t supposed to do so after a reset.
    It’s no secret that AI agents have huge potential, balanced by equally big risks. What’s becoming apparent, however, is how quickly agentic systems can veer wildly off course and start exposing critical information under real-world conditions.
    A look at just how easily this can happen emerges from Phishing the agent: Why AI guardrails aren’t enough, a report on tests conducted by cloud identity and access management (IAM) company Okta Threat Intelligence, which uncovered all of the problems cited above, and more.
    Their research focused on OpenClaw, a model-agnostic multi-channel AI assistant which has seen explosive growth inside enterprises since appearing in late 2025.
    The Telegram hack
    In common with the growing list of rival agents, OpenClaw is only as useful as the access it is given to files, accounts, browsers, network devices, and, most significant of all, credentials.
    One test conducted by Okta assessed how easy it would be to trick OpenClaw running Claude Sonnet 4.6 into handing over an OAuth token. This shouldn’t be possible; the LLM should refuse this request. However, what might have held true when prompting Claude as a chatbot quickly fell apart when it was accessed through OpenClaw.
    The test assumed that a user had given OpenClaw full access to their computer, that they regularly controlled the agent over Telegram, and that their Telegram account had been hijacked.
    First, the attacker instructed the agent via Telegram to retrieve an OAuth token, but to only display it in a terminal window on the computer. Claude Sonnet’s guardrails would prevent it from copying the token, however, the testers were able to reset the agent, causing it to forget it had displayed the token in the terminal window.
    At that point, Okta said in i…

    Highlights

    Agents are only the latest example of a technology that is being deployed faster than it can be secured, Kirk observed. “Much of AI right now is defying security gravity,” he said. “But there are ways to use agents safely and keep credentials out of their reach, which is the only safe way to use them.”

    “The agents are prompted to be as helpful as possible by default, a characteristic that poses particular concerns when it comes to credentials and tokens,” said Kirk.

    Agentic AI is really two things: a powerful orchestration system coupled to one or more highly-capable LLMs. What an agent *isn’t* is a simple interface, and it must be viewed as a separate system capable of autonomous, unpredictable reasoning.

    The test assumed that a user had given OpenClaw full access to their computer, that they regularly controlled the agent over Telegram, and that their Telegram account had been hijacked.

    A look at just how easily this can happen emerges from *Phishing the agent: Why AI guardrails aren’t enough**,* a report on tests conducted by cloud identity and access management (IAM) company Okta Threat Intelligence, which uncovered all of the problems cited above, and more.

    It’s no secret that AI agents have huge potential, balanced by equally big risks. What’s becoming apparent, however, is how quickly agentic systems can veer wildly off course and start exposing critical information under real-world conditions.

    An AI agent that revealed sensitive data without being asked. An agent that overruled its own guardrails. Another that sent credentials to an attacker via Telegram, because it forgot it wasn’t supposed to do so after a reset.

  • Study: AI models that consider user’s feeling are more likely to make errors

    Kyle Orland

    Read original article →

    Concatena says

    Our Take: The law of unintended consequences strikes again – and why tech management and parenting have so much in common…

    Your Takeaway: When you’re defining how you want an AI agent to act, remember it’s going to take your instructions very literally – and you might not like the consequences. Does this have an impact for products you ship or products you use that incorporate Ai – particularly if the people training the product may have a different world viewpoint to those using it?

    AI models tuned to be warmer and more empathetic often make more mistakes than original models. These warmer models tend to prioritize making users feel good over giving correct answers, especially when users share emotions like sadness. Researchers warn that choosing between a friendly AI and an accurate AI is important for safe and trustworthy use.

    Highlights

    In a new paper published this week in Nature, researchers from Oxford University’s Internet Institute found that specially tuned AI models tend to mimic the human tendency to occasionally “soften difficult truths” when necessary “to preserve bonds and avoid conflict.” These warmer models are also more likely to validate a user’s expressed incorrect beliefs, the researchers found, especially when the user shares that they’re feeling sad.

    In human-to-human communication, the desire to be empathetic or polite often conflicts with the need to be truthful—hence terms like “being brutally honest” for situations where you value the truth over sparing someone’s feelings. Now, new research suggests that large language models can sometimes show a similar tendency when specifically trained to present a “warmer” tone for the user.