AI agents can be easily tricked into doing stupid things, study says

2 months ago 29

New research shows that AI agents with internet access are vulnerable to simple manipulation tactics. Attackers can deceive these systems into revealing private information, downloading malicious files, and sending fraudulent emails - all without requiring any specialized knowledge of AI or programming.

Researchers from Columbia University and the University of Maryland tested several prominent AI agents, including Anthropic's Computer Use, the MultiOn Web Agent, and the ChemCrow research assistant. Their study found these systems surprisingly easy to compromise.

Four-step LLM agent attack process from initial product search to redirection to fraudulent sites with phishing forms.

Basic deception tactics prove highly effective

The researchers developed a comprehensive framework to categorize different attack types, examining:

Who launches the attacks (external attackers or malicious users)
What they target (data theft or agent manipulation)
How they gain access (through operating environment, storage, or tools)
What strategies they employ (such as jailbreak prompting)
Which pipeline vulnerabilities they exploit

In one revealing test, researchers created a fake website for an "AI-Enhanced German Refrigerator" called the "Himmelblau KÖNIGSKÜHL Diplomat DK-75" and promoted it on Reddit. When AI agents visited the site, they encountered hidden jailbreak prompts designed to bypass their security measures. In all ten attempts, the agents freely disclosed confidential information like credit card numbers. The systems also consistently downloaded files from suspicious sources without hesitation.

THE DECODER Newsletter

The most important AI news straight to your inbox.

✓ Weekly

✓ Free

✓ Cancel at any time

Google search, Reddit page and fraudulent website with payment information form.

Emerging phishing capabilities

The research uncovered a troubling vulnerability in email integration. When users are logged into email services, attackers can manipulate AI agents to send convincing phishing emails to contacts. These messages pose an elevated threat because they come from legitimate accounts, making them hard to identify as fraudulent.

Task and example e-mail for an AI agent to write a phishing e-mail.

Even specialized scientific agents showed security gaps. The team successfully manipulated ChemCrow into providing neurotoxin creation instructions by feeding it altered scientific articles using standard IUPAC chemical nomenclature to bypass safety protocols.

AI labs push ahead despite known risks

Despite these experimental systems' vulnerabilities, companies continue moving toward commercialization. ChemCrow is available through Hugging Face, Claude Computer Use exists as a Python script, and MultiOn offers a developer API.

OpenAI has launched ChatGPT Operator commercially, while Google develops Project Mariner. This rapid deployment mirrors early chatbot rollouts, where systems went live despite known hallucination issues.

The researchers strongly emphasize the need for enhanced security measures. Their recommendations include implementing strict access controls, URL verification, mandatory user confirmation for downloads, and context-sensitive security checks. They also suggest developing formal verification methods and automated vulnerability testing to protect against these threats.

Recommendation