Anthropic has released Claude Opus 4 and Claude Sonnet 4, setting new standards for coding, advanced reasoning, and AI agents.
Anthropic bills Claude Opus 4 as the world’s best coding model, with sustained performance on complex, long-running tasks and agent workflows. Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.
- Extended thinking with tool use (beta): Both models can use tools, like web search, during extended thinking, allowing Claude to alternate between reasoning and tool use to improve responses (see the sketch after this list).
- New model capabilities: Both models can use tools in parallel, follow instructions more precisely, and — when given access to local files by developers — demonstrate significantly improved memory capabilities, extracting and saving key facts to maintain continuity and build tacit knowledge over time.
- Claude Code is now generally available: After receiving extensive positive feedback during our research preview, we’re expanding how developers can collaborate with Claude. Claude Code now supports background tasks via GitHub Actions and native integrations with VS Code and JetBrains, displaying edits directly in your files for seamless pair programming.
- New API capabilities: We’re releasing four new capabilities on the Anthropic API that enable developers to build more powerful AI agents: the code execution tool, MCP connector, Files API, and the ability to cache prompts for up to one hour.
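To make the extended-thinking-plus-tools flow concrete, here is a minimal sketch using the Anthropic Python SDK. The model id, thinking budget, and beta header reflect the launch-day documentation as I understand it, so treat them as assumptions and verify against Anthropic's current docs before running anything:

```python
# Minimal sketch: extended thinking with the server-side web search tool.
# Assumes the `anthropic` Python SDK and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",  # launch-day model id; verify in the docs
    max_tokens=4096,
    # Extended thinking: reserve a token budget for Claude's reasoning.
    thinking={"type": "enabled", "budget_tokens": 2048},
    # Server-side web search tool; Claude can call it between reasoning steps.
    tools=[{"type": "web_search_20250305", "name": "web_search", "max_uses": 3}],
    messages=[
        {"role": "user",
         "content": "What changed between Claude Sonnet 3.7 and Sonnet 4?"}
    ],
    # Beta header that lets Claude interleave thinking with tool calls,
    # per the announcement; the exact header value may change.
    extra_headers={"anthropic-beta": "interleaved-thinking-2025-05-14"},
)

# The response content mixes "thinking" blocks with tool-use and text blocks.
for block in response.content:
    print(block.type)
```

The `budget_tokens` value caps how much of `max_tokens` Claude may spend on reasoning; the numbers above are only illustrative.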
Claude Opus 4 and Sonnet 4 are hybrid models offering two modes: near-instant responses and extended thinking for deeper reasoning. The Pro, Max, Team, and Enterprise Claude plans include both models and extended thinking, with Sonnet 4 also available to free users. Both models are available on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Pricing remains consistent with previous Opus and Sonnet models: Opus 4 at $15/$75 per million tokens (input/output) and Sonnet 4 at $3/$15.
Claude 4
The new models are now accessible in the Claude chat app, both on desktop and in the browser.

Claude 4 in browser chat application
Extended Thinking mode can be enabled from the settings menu.

Extended thinking in Claude 4 chat application
Also worth noting: In addition to extended thinking with tool use, parallel tool execution, and memory improvements, Anthropic significantly reduced behavior where the models use shortcuts or loopholes to complete tasks. Both models are 65% less likely to engage in this behavior than Sonnet 3.7 on agentic tasks that are particularly susceptible to shortcuts and loopholes.
Claude 4 Performance
Claude Opus 4 is Anthropic’s most powerful model to date and one of the best coding models in the world. It leads on SWE-bench with a score of 72.5 percent and on Terminal-bench with 43.2 percent.
It can handle complex, long-running tasks for several hours without losing focus, and it performs far better than any Sonnet model, meaningfully expanding what AI agents can accomplish.
These models support a wide range of AI use cases. Opus 4 pushes progress in coding, research, writing, and scientific discovery. Sonnet 4, on the other hand, offers strong performance for everyday tasks and serves as a clear upgrade from Sonnet 3.7.
Claude 4 software engineering benchmark
Claude 4 models also lead on SWE-bench Verified, a benchmark that tests how well models perform on real software engineering tasks. Both models deliver strong performance across coding, reasoning, multimodal capabilities, and agentic tasks.
Claude 4 performance benchmark
You can find the sources for the benchmark data below:
- OpenAI: o3 launch post, o3 system card, GPT-4.1 launch post, GPT-4.1 hosted evals
- Gemini: Gemini 2.5 Pro Preview model card
- Claude: Claude 3.7 Sonnet launch post
Claude 4 in Cursor IDE
As a dev, this is the part I actually care about. Claude 4 is available in Cursor now.
If you’ve been following my solopreneur journey, you know I’ve been building web apps with AI for a while. Cursor is where I do most of my work, so having Claude 4 inside it means I can test things right away.
Starting today, you can access `claude-4-sonnet` and `claude-4-opus` in the model list. Just make sure your Cursor app is updated to the latest version.

Claude 4 in Cursor IDE
Also, both have a 120K context window in Cursor. That's way up from the 75K for Claude 3.5 Sonnet. More tokens means more room to throw in big files or bigger projects without losing context.
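If you want to know ahead of time whether a big file will blow that budget, the Anthropic API exposes a token-counting endpoint. A rough sketch, assuming the `anthropic` Python SDK; the file name and model id here are just illustrative:

```python
# Count tokens before sending, to check a large file against the context window.
import anthropic

client = anthropic.Anthropic()
source = open("big_module.py").read()  # hypothetical large file

count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",  # launch-day model id; verify in the docs
    messages=[{"role": "user", "content": f"Review this file:\n\n{source}"}],
)
print(f"{count.input_tokens} tokens against a 120K budget in Cursor")
```

Counting is free of charge and much cheaper than discovering mid-conversation that your prompt no longer fits.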
Claude 4 Pricing
Claude Sonnet 4, which is faster but somewhat less capable in thinking, coding, and memory, is available now to users on the free plan.
The more premium Claude Opus 4, which also comes with extra tools and integrations, requires a paid plan at $20 + tax per month or $200 + tax per year.

Claude Opus 4 pricing
If you are accessing the models via the API, Opus 4 is priced at $15 per million input tokens and $75 per million output tokens, while Sonnet 4 costs $3/$15. However, Anthropic says users can cut costs by up to 90% with prompt caching and by 50% with batch processing.
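Here is roughly what prompt caching looks like with the Anthropic Python SDK. This is a sketch, assuming a large reusable system prompt; the savings figure and the new one-hour cache option come from the announcement, so check the current docs for exact details:

```python
# Sketch of prompt caching: mark a large, reusable prefix as cacheable.
# Assumes the `anthropic` Python SDK; file name and prompts are illustrative.
import anthropic

client = anthropic.Anthropic()
big_context = open("project_docs.md").read()  # large, reusable context

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # launch-day model id; verify in the docs
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": big_context,
            # Mark this prefix as cacheable. Cached reads are billed at a
            # fraction of the normal input rate (up to ~90% savings, per
            # Anthropic). The default cache lives about 5 minutes; the
            # newly announced one-hour lifetime is set via a TTL option
            # described in the docs.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key API changes."}],
)

# usage reports cache writes and reads, so you can confirm the cache is hit.
print(response.usage)
```

On repeated calls with the same prefix, `usage` should show `cache_read_input_tokens` instead of full-price input tokens, which is where the cost reduction comes from.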
Final Thoughts
Claude 4 is powerful. No question about that. But I still have mixed feelings.
Competitors like Google's Gemini models offer a one-million-token context window, so Claude's 200K feels a bit disappointing. I haven't hit the limit in my few minutes of testing, but many users report reaching it within just a few prompts.