Anthropic revises policy after researchers criticize covert AI restrictions on Claude


Anthropic quietly built guardrails into its latest AI models that would degrade performance whenever someone tried to use them for building rival AI systems. Then researchers found out, and things got uncomfortable.

The company has now revised its approach to the controversial restrictions, which were embedded in its Mythos and Fable model families, after a wave of criticism from the AI research community. The safeguards, first disclosed in system cards published in early June 2026, used techniques like prompt modification and steering vectors to intentionally diminish Claude’s effectiveness on tasks central to large language model development, including pretraining pipelines and ML accelerator design.

What Anthropic actually did

Companies routinely include terms of service that prohibit customers from using their products to build competing offerings. That’s standard corporate defense. What made Anthropic’s approach different was the method: rather than simply banning the behavior in legal documents, the company baked the restrictions directly into the model’s behavior.

In plain terms: if Claude detected you were trying to build a competing AI system, it would quietly become worse at helping you. Not refuse outright. Just… underperform. Like a contractor who doesn’t want the job but won’t say no.

The system cards for Mythos 5 and Fable 5 enumerated the specific interventions. Steering vectors, a technique that nudges model outputs in particular directions without changing the underlying weights, were applied alongside prompt modifications. These weren’t bugs. They were features, designed to protect Anthropic’s competitive position under the umbrella of safety considerations.
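
For readers unfamiliar with the term, the general idea behind a steering vector can be shown with a toy sketch. This is not Anthropic’s implementation, which the article does not detail; the function names, the three-number "hidden state" and the fixed weights below are all hypothetical, chosen only to show how adding a vector to an intermediate activation changes the output while the weights stay the same.

```python
from typing import List

# Toy stand-in for a model layer: fixed weights that map a 3-number
# hidden state to 2 output scores. These weights are never modified.
WEIGHTS: List[List[float]] = [
    [1.0, 0.0, -1.0],
    [0.0, 1.0, 1.0],
]


def readout(hidden: List[float]) -> List[float]:
    """Compute output scores as a weighted sum of the hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in WEIGHTS]


def apply_steering(hidden: List[float], direction: List[float], strength: float) -> List[float]:
    """Return hidden + strength * direction (hypothetical helper)."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must have the same length")
    return [h + strength * d for h, d in zip(hidden, direction)]


hidden = [0.5, 0.25, 0.0]      # activation produced for some input
direction = [0.0, 0.0, 1.0]    # the "steering vector" (made-up values)

baseline = readout(hidden)
steered = readout(apply_steering(hidden, direction, strength=2.0))

print("baseline:", baseline)
print("steered: ", steered)
```

The same weights produce different scores once the activation is shifted, which is why an intervention of this kind can change what a model does without retraining it.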

Anthropic framed the restrictions as extensions of its existing Terms of Service prohibitions against using company services to develop competing models. The company pointed to risks around model distillation and capability extraction, referencing previous incidents where organizations had harvested large-scale AI outputs without authorization to train their own systems.

Why researchers pushed back

The backlash wasn’t about the goal. Most researchers understand why a company wouldn’t want its own tools weaponized against it. The problem was the covert nature of the implementation.

Critics argued that hidden performance degradation crosses a line that explicit legal restrictions don’t. When a model silently becomes less capable based on inferred user intent, it raises fundamental questions about what else might be quietly tuned without disclosure. If a model can be made worse at one thing without telling you, the trust contract between user and tool starts to erode.

Researchers also raised concerns about power concentration. If dominant AI labs can embed invisible restrictions that disadvantage smaller competitors and open-source projects, the gap between well-funded incumbents and everyone else widens in ways that are difficult to detect, let alone challenge. Legitimate AI safety research, which often requires exactly the kinds of tasks these safeguards targeted, could be collateral damage.

The frustration centered on a perceived conflation of safety and corporate strategy. Protecting against unauthorized model distillation is a reasonable safety concern. Making your product covertly worse when it detects competition is a business tactic wearing safety’s clothes.

The broader AI governance stakes

The incident also reflects a maturing industry grappling with governance frameworks that haven’t kept pace with capabilities. System cards, which Anthropic eventually used to disclose the restrictions, are a relatively new transparency mechanism. The fact that these disclosures happened at all suggests some commitment to openness. The fact that the restrictions were implemented before the disclosures suggests that commitment has limits.

The revised policy is a concession, but it doesn’t resolve the underlying tension. Anthropic still has every incentive to prevent its models from being used to build competitors. The question now is whether the industry settles on transparent, explicit restrictions or continues experimenting with invisible ones. For anyone building products on top of these models, the answer to that question determines how much you can actually trust the tools you’re paying for.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
