Anthropic reveals 31.5% hijack rate for Opus 4.8 browser agent before safeguards


Point a red-teamer at Anthropic’s newest model while it’s browsing the web, and the attacker hijacks it nearly one in three times. That’s the raw stat: a 31.5% prompt injection success rate for Claude Opus 4.8’s browser agent before defensive safeguards engage.

The transparency gap between labs

Anthropic dropped a 244-page safety report on May 28, covering four distinct agentic surfaces: browsing the web, writing code, coordinating with other AI agents, and interacting with external tools.

OpenAI reported on just one surface: connectors. Google moved the entire subject out of its model card and into a separate safety framework document. Meta didn’t ship a closed-model card at all.

The 31.5% figure is pre-safeguards, meaning it represents the raw model’s susceptibility before Anthropic’s defensive layers kick in. Production deployments add guardrails, monitoring, and filtering that reduce real-world exploit rates. But the baseline vulnerability is exactly the kind of data security architects need to build those guardrails correctly.

What Opus 4.8 actually does differently

False negatives on coding errors, where the model fails to catch its own mistakes, dropped from 19.7% to 3.7%. Opus 4.8 also introduces dynamic multi-agent orchestration at scale, coordinating hundreds of sub-agents simultaneously to manage large software projects.

Why crypto should pay attention

A 31.5% pre-safeguard hijack rate for browser-based agents should make anyone running AI systems in crypto pause. Browser agents are precisely the kind of tool that crypto projects deploy for monitoring dashboards, scraping on-chain data, interacting with DEX frontends, and executing trades through web interfaces.

Prompt injection in a browser agent means a malicious website, a compromised API response, or even a cleverly crafted token name could redirect an AI agent’s behavior. In traditional software, that’s a data breach. In crypto, that’s a drained wallet.
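
One common way to limit that blast radius is to check every transfer the agent proposes against rules the model cannot change. The sketch below is illustrative only: the names ProposedTransfer, TransferPolicy, and check_transfer, the placeholder addresses, and the cap are all assumptions made for this example, not anything described in Anthropic’s report or taken from a real wallet library.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ProposedTransfer:
    """A transfer the agent wants to make (hypothetical shape)."""
    to_address: str
    amount: Decimal


@dataclass(frozen=True)
class TransferPolicy:
    """Limits set by a human operator, never by the model or by page content."""
    allowed_addresses: FrozenSet[str]
    max_amount: Decimal


def check_transfer(transfer: ProposedTransfer, policy: TransferPolicy) -> Tuple[bool, str]:
    """Return (approved, reason). Runs outside the model, on the model's output."""
    if transfer.to_address not in policy.allowed_addresses:
        return False, "destination not on allowlist"
    if transfer.amount <= 0:
        return False, "amount must be positive"
    if transfer.amount > policy.max_amount:
        return False, "amount exceeds per-transfer cap"
    return True, "ok"


# Placeholder values for illustration; a real deployment would load these from operator config.
policy = TransferPolicy(
    allowed_addresses=frozenset({"0xTREASURY"}),
    max_amount=Decimal("100"),
)

# An injected instruction that redirects funds is rejected by the gate.
print(check_transfer(ProposedTransfer("0xATTACKER", Decimal("5")), policy))
# A transfer inside the operator's limits passes.
print(check_transfer(ProposedTransfer("0xTREASURY", Decimal("5")), policy))
```

The point of the design is that a hijacked agent can still ask for a bad transfer, but the decision to sign it does not rest on anything the attacker’s text can influence.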

Multi-agent orchestration adds another layer of complexity. When Opus 4.8 coordinates hundreds of sub-agents, a single successful prompt injection could cascade across the entire workflow. In a crypto context, that’s the difference between one compromised transaction and a systemic failure across an entire automated trading operation.
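
A rough back-of-the-envelope calculation shows why scale matters here. The sketch below assumes, purely for illustration, that the 31.5% figure applies to each injection attempt and that attempts against different sub-agents succeed or fail independently. Neither assumption comes from the report, and the function name is hypothetical.

```python
def prob_at_least_one_hijack(per_attempt_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent injection attempts succeeds."""
    if not 0.0 <= per_attempt_rate <= 1.0:
        raise ValueError("per_attempt_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement rule: 1 minus the chance that every attempt fails.
    return 1.0 - (1.0 - per_attempt_rate) ** attempts


# Illustrative only: how exposure grows with the number of attempts
# at the pre-safeguard rate quoted in this article.
for n in (1, 2, 5, 10):
    print(n, round(prob_at_least_one_hijack(0.315, n), 3))
```

Under those assumptions the odds of at least one compromise climb quickly as more sub-agents touch untrusted content, which is why the pre-safeguard baseline matters more in orchestrated workflows than in a single browsing session.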

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
