Anthropic has just officially released Claude Sonnet 4.5, and the results are jaw-dropping.
Claude Sonnet 4.5 tops the SWE-bench Verified leaderboard
On the SWE-bench Verified test — which evaluates real-world programming ability — Claude Sonnet 4.5 took the #1 spot in the industry.
Even crazier? It can stay focused and work continuously for over 30 hours straight.
Yes, AI just added another advantage over humans.
Unmatched Coding Power: Building Apps Like Breathing
For example, when tasked with building a Slack- or Teams-like chat app, it pumped out 11,000 lines of code in one sitting. By comparison, the older Claude Opus 4 and Codex could only manage about 7 hours of sustained work.
According to Anthropic, Claude Sonnet 4.5 is now the world’s most powerful coding model — with massive improvements in building complex agents, computer operations, reasoning, and math.
On OSWorld, a benchmark designed to test real computer tasks, it scored 61.4%, again taking first place. Just four months ago, Sonnet 4 led with 42.2% — so the performance jump is staggering.
New Features and Tool Upgrades of Claude Sonnet 4.5
Anthropic also rolled out several major upgrades alongside the new model:
- Checkpoint support is finally available, allowing progress to be saved and rolled back to previous states at any time.
- Usage information can now be queried directly within Claude Code using
/usage
.
- Native VS Code plugin is available, similar to OpenAI’s Codex plugin.
- Claude Code SDK has been officially renamed to Claude Agent SDK, enhancing Agent-building capabilities.
- Terminal interface has been significantly redesigned, allowing users to see previous session history and new feature lists at a glance upon startup.
Anthropic has even opened up the underlying infrastructure they use to build Claude Code, called the Claude Agent SDK.
Managing memory for agents during long-running tasks, designing permission systems that balance autonomy and user control, and coordinating multiple sub-agents to achieve goals are all challenging aspects of building and designing AI agents.
With the Claude Agent SDK, you can now leverage this infrastructure to build your own products.
Early User Reactions: A Love-Hate Relationship
Starting today, developers can call claude-sonnet-4-5
via the Claude API. Pricing is unchanged from Sonnet 4: $3/$15 per million tokens.
Early testers are already impressed:
One developer shared their experience immediately after trying it:
“Claude 4.5 Sonnet just refactored my entire codebase in a single run—25 tool invocations, over 3,000 new lines of code, and 12 brand-new files. It modularized everything, broke down the monolithic structure, and cleaned up spaghetti code. The result didn’t actually run, but wow, it was really elegant.” This review feels like a mix of love and frustration.
Cursor stated that they observed cutting-edge programming performance with Claude Sonnet 4.5, especially with improvements in handling long-duration tasks. This further explains why many Cursor users choose Claude to tackle the most complex problems.
Well-known reviewer tech blogger Dan Shipper noted that Claude 4.5 feels faster, more controllable, and more stable.
Enhanced Safety: Highest Alignment Yet
Performance is one thing, but safety must keep pace.
Claude Sonnet 4.5 is, according to Anthropic, their most aligned cutting-edge model to date.
Thanks to Claude’s enhanced capabilities combined with rigorous safety training, Anthropic has made significant improvements in model behavior, reducing tendencies such as sycophancy, deception, power-seeking, and encouraging delusions. Additionally, Anthropic has achieved major breakthroughs in defending against prompt injection attacks and minimizing content misclassification.
Claude Sonnet 4.5 Experimental Feature: Imagine with Claude
At the same time, Anthropic launched a temporary research preview called Imagine with Claude. In this mode, Claude generates software in real time — none of the functions or code are pre-written. Everything is created and adjusted interactively on the spot.
This preview is only available for Claude Max subscribers over the next 5 days.
Access it here:https://claude.ai/imagine/
Market Competition and Strategic Significance of Claude Sonnet 4.5
Anthropic is currently valued at $183 billion, with an annualized revenue run rate of $5 billion as of August — much of it fueled by coding tools. But the competition is fierce: OpenAI and Google Gemini are also racing to dominate the developer market.
Notably, OpenAI’s annual developer conference is just a week away. Anthropic dropping Claude 4.5 now is a clear move to apply pressure.
Anthropic’s co-founder and chief scientist Jared Kaplan has already teased that an even more advanced Claude Opus model will launch later this year.
Past Issues and Restoring User Confidence
It hasn’t all been smooth sailing. Over the past two months, users accused the Claude series of being “dumbed down.” Many reported sharp declines in reasoning, coding, formatting, and tool use quality — even paid Max subscribers.
Anthropic admitted to two independent bugs and rolled back the Opus 4.1 update, denying cost-cutting motives. But without compensation or refunds, backlash spread on GitHub and X, with some users switching to competitors like Codex.
The release of Claude Sonnet 4.5 is Anthropic’s chance to win them back. Whether it succeeds will depend on how the model performs in real-world use over the coming weeks.