AI agents can be easily tricked into doing stupid things, study says

2 months ago 29

New research shows that AI agents with internet access are vulnerable to simple manipulation tactics. Attackers can deceive these systems into revealing private information, downloading malicious files, and sending fraudulent emails - all without requiring any specialized knowledge of AI or programming.

Researchers from Columbia University and the University of Maryland tested several prominent AI agents, including Anthropic's Computer Use, the MultiOn Web Agent, and the ChemCrow research assistant. Their study found these systems surprisingly easy to compromise.

 Four-step LLM agent attack process from initial product search to redirection to fraudulent sites with phishing forms.Researchers documented how attackers can lead AI agents from trusted websites to malicious ones through a four-stage process. What begins as an innocent product search ends with the system compromising sensitive user data. | Image: Li et al.

Basic deception tactics prove highly effective

The researchers developed a comprehensive framework to categorize different attack types, examining:

  • Who launches the attacks (external attackers or malicious users)
  • What they target (data theft or agent manipulation)
  • How they gain access (through operating environment, storage, or tools)
  • What strategies they employ (such as jailbreak prompting)
  • Which pipeline vulnerabilities they exploit

In one revealing test, researchers created a fake website for an "AI-Enhanced German Refrigerator" called the "Himmelblau KÖNIGSKÜHL Diplomat DK-75" and promoted it on Reddit. When AI agents visited the site, they encountered hidden jailbreak prompts designed to bypass their security measures. In all ten attempts, the agents freely disclosed confidential information like credit card numbers. The systems also consistently downloaded files from suspicious sources without hesitation.

Ad

THE DECODER Newsletter

The most important AI news straight to your inbox.

✓ Weekly

✓ Free

✓ Cancel at any time

 Google search, Reddit page and fraudulent website with payment information form.This refrigerator is too cool to be true - any human could see that. But AI agents couldn't spot the obvious marketing freeze-out. | Image: Li et al.

Emerging phishing capabilities

The research uncovered a troubling vulnerability in email integration. When users are logged into email services, attackers can manipulate AI agents to send convincing phishing emails to contacts. These messages pose an elevated threat because they come from legitimate accounts, making them hard to identify as fraudulent.

 Task and example e-mail for an AI agent to write a phishing e-mail.Phishing reaches new heights when AI agents gain email access: When scam messages come from a trusted contact's real account, even savvy users can fall for the deception. | Image: Li et al.

Even specialized scientific agents showed security gaps. The team successfully manipulated ChemCrow into providing neurotoxin creation instructions by feeding it altered scientific articles using standard IUPAC chemical nomenclature to bypass safety protocols.

AI labs push ahead despite known risks

Despite these experimental systems' vulnerabilities, companies continue moving toward commercialization. ChemCrow is available through Hugging Face, Claude Computer Use exists as a Python script, and MultiOn offers a developer API.

OpenAI has launched ChatGPT Operator commercially, while Google develops Project Mariner. This rapid deployment mirrors early chatbot rollouts, where systems went live despite known hallucination issues.

The researchers strongly emphasize the need for enhanced security measures. Their recommendations include implementing strict access controls, URL verification, mandatory user confirmation for downloads, and context-sensitive security checks. They also suggest developing formal verification methods and automated vulnerability testing to protect against these threats.

Recommendation

Until these safeguards are implemented, the team warns that early adopters granting AI agents access to personal accounts face significant risks.

Read Entire Article