Anthropic's Claude Model Drops in Ranking Amid Controversy

Introduction

Recent reports point to a significant drop in the performance of Anthropic’s Claude model, with AMD’s AI director stating that Claude Code is now unsuitable for complex tasks. The latest BridgeBench evaluation appears to confirm the decline.

Performance Decline

Claude Opus 4.6’s global ranking has fallen from second to tenth. Its accuracy dropped from 83.3% to 68.3%, a loss of 15 percentage points, and its hallucination rate rose by 98%, nearly doubling.

The decline has left users feeling deceived: they relied on the model for critical tasks, only to find that it had been replaced by a significantly inferior version without notice.

User Concerns

Users are questioning whether such unannounced changes are even legal, and trust in Anthropic is eroding. Even the most loyal supporters are beginning to waver. Amid the criticism, a leaked screenshot of an internal tool interface has surfaced.

The leak suggests that Anthropic is testing a full-stack application building system inside Claude Projects, shifting the focus from merely writing code to creating finished products.

What the Leak Reveals

The leaked screenshot displays a one-click development kit with pre-set templates for AI chatbots, interactive games, business landing pages, and SaaS dashboards, covering the most common needs of independent developers.

However, the real shock lies in the full-stack capabilities behind these templates:

Authentication? Just check a box.
Database? Choose and build.
Frontend interface? Describe and generate.
Deployment? One-click completion.

A Shift in Strategy

This is not just AI-assisted programming; it’s AI replacing programming altogether. To see why, consider the current landscape of AI programming tools:

Cursor aims to make programmers faster in their IDEs.
Replit enables non-coders to write code, lowering the entry barrier.
Vercel simplifies deployment but requires users to navigate the development process themselves.

Claude’s ambition is to make the act of writing code itself redundant, which would be a paradigm shift.

Reevaluating Model Performance

The underlying engine powering this system is Opus 4.6, the same model criticized for its decline. The key question is whether Anthropic even cares about its benchmark ranking. If the ultimate goal is to become a full-stack application platform, the model’s raw intelligence becomes less critical; it just needs to be good enough to do the job.

In platform competition, success is determined by the stickiness of the ecosystem rather than the horsepower of the underlying engine. Users are more concerned with whether their applications run smoothly than with minor differences in model performance.

Revenue and Market Position

Anthropic’s annual revenue recently surpassed $30 billion, exceeding OpenAI’s. That success comes with anxiety, however: most of the revenue is derived from API calls, which is a precarious business model. Clients who build their products on Claude’s API could easily switch to a competitor offering a similar service at a lower price.

The Future of AI Models

The nightmare of model commoditization looms large: as the differences between models diminish, competition on API pricing could turn into a price war with no winners. Companies like OpenAI and Google are responding by developing consumer-facing products, hoping to create indispensable platforms before models become cheap commodities.

Anthropic’s full-stack builder is a radical version of this logic: rather than allowing others to build platforms on its API, Anthropic would build the platform itself.

Conclusion

In the long run, the most crucial factor in AI’s future will not be which model scores higher on benchmarks, but which company becomes indispensable infrastructure that users rely on daily. Anthropic’s shift toward a full-stack solution may reflect a survival instinct in a rapidly evolving market.