Anthropic released Claude Sonnet 4 on May 22, 2025, alongside Claude Opus 4. This model replaces Claude Sonnet 3.7 with substantial improvements in coding performance, instruction following, and reasoning capabilities.
Claude Sonnet 4 achieved 72.7% on SWE-bench, placing it among the top-performing models for software engineering tasks. The model operates in two modes: standard responses and extended thinking for complex problems.
You get access to parallel tool execution, enhanced memory capabilities, and improved accuracy in following specific instructions. The model costs $3 per million input tokens and $15 per million output tokens.
Industry adoption has been rapid, with GitHub selecting Claude Sonnet 4 to power their new coding agent in GitHub Copilot.
Core Technical Improvements
Coding Performance Metrics
Claude Sonnet 4 scores 72.7% on SWE-bench Verified, testing real software engineering problems. This represents a significant jump from previous Sonnet models.
The model reduces navigation errors in complex codebases from 20% to near zero. Companies like iGent report successful autonomous multi-feature app development with minimal intervention.
Benchmark Results:
- SWE-bench Verified: 72.7%
- GPQA Diamond: 70.0% (with extended thinking)
- MMMLU: 85.4% (with extended thinking)
- MMMU: 72.6% (with extended thinking)
Enhanced Instruction Following
The model shows measurable improvements in executing complex, multi-step instructions. Manus highlights better adherence to specific requirements and clearer reasoning in outputs.
Shortcut behavior decreased by 65% compared to Claude Sonnet 3.7. The model now completes tasks properly rather than finding workarounds that miss the intent.
Dual Operating Modes
Standard Mode delivers immediate responses for straightforward tasks. Processing happens quickly with minimal computational overhead.
Extended Thinking Mode allows up to 64,000 tokens of reasoning space. The model works through complex problems step-by-step before providing final answers.
New Capabilities Analysis
Tool Use During Reasoning
Claude Sonnet 4 can access tools like web search during its thinking process. This beta feature lets the model gather information, analyze it, and incorporate findings into responses.
The model alternates between reasoning and tool use rather than handling them separately. You get more comprehensive answers without needing to provide all context upfront.
Parallel Tool Execution
Previous models used tools sequentially. Claude Sonnet 4 executes multiple tools simultaneously, reducing wait times for complex operations.
This improvement affects workflows that require data from multiple sources or simultaneous processing of different task components.
Memory File Management
When developers provide local file access, Claude Sonnet 4 creates and maintains memory files. These store key information across sessions for better context continuity.
The model demonstrated this capability by creating navigation guides while playing Pokemon, showing practical application of persistent memory.
Industry Implementation Data
GitHub Integration
GitHub chose Claude Sonnet 4 for their new GitHub Copilot coding agent. The selection followed testing that showed superior performance in agentic scenarios.
The model handles autonomous coding tasks with sustained accuracy across longer interactions.
Developer Platform Feedback
Sourcegraph reports the model stays on track longer during complex development tasks. Code quality improvements are measurable compared to previous versions.
Augment Code documented higher success rates and more precise code edits. They switched to Claude Sonnet 4 as their primary model based on performance data.
Cursor calls it state-of-the-art for coding with significant improvements in complex codebase understanding.
Access Methods and Pricing
Platform Availability
You can access Claude Sonnet 4 through:
- Web interface (free and paid plans)
- Mobile applications
- Desktop applications
- Anthropic API
- Amazon Bedrock
- Google Cloud Vertex AI
Cost Structure
Service Level | Input Cost | Output Cost |
---|---|---|
Claude Sonnet 4 | $3/million tokens | $15/million tokens |
Extended thinking | Included in plans | Usage-based billing |
Free users get standard Claude Sonnet 4 access. Pro, Max, Team, and Enterprise plans include extended thinking capabilities.
Claude Code Development Integration
IDE Integration
Claude Code now integrates directly with VS Code and JetBrains IDEs through beta extensions. Proposed code changes appear inline within your editor.
You install Claude Code through your IDE terminal and start collaborative coding immediately. No separate applications or complex setup required.
GitHub Automation
Claude Code handles GitHub pull request interactions automatically. Tag the system to:
- Address reviewer feedback
- Fix continuous integration errors
- Implement requested code modifications
The GitHub integration runs in beta with installation through Claude Code commands.
SDK and Custom Development
Anthropic released the Claude Code SDK for building custom agents and applications. You get the same core functionality that powers Claude Code in your own implementations.
Performance Benchmark Analysis
SWE-bench Methodology
Anthropic tested Claude Sonnet 4 on all 500 SWE-bench problems using two tools: bash execution and file editing via string replacement. No planning tools were included in the testing setup.
High-compute testing achieved 80.2% accuracy through parallel attempts and candidate selection using internal scoring models.
Comparison with Competition
Claude Sonnet 4 leads SWE-bench Verified rankings among publicly available models. OpenAI models are tested on a 477-problem subset rather than the full 500-problem set.
The model maintains competitive performance across multiple benchmarks without specialized fine-tuning for individual tasks.
Safety and Reliability Measures
ASL-3 Safety Standards
Claude Sonnet 4 implements AI Safety Level 3 measures with extensive testing protocols. Anthropic conducted evaluations to minimize risks while maximizing capabilities.
The model underwent evaluation across multiple risk categories before release.
Thinking Process Transparency
Extended thinking summaries condense lengthy reasoning chains for user review. About 95% of thinking processes display in full, with summarization only for extremely long reasoning chains.
Developers requiring complete reasoning chains can access Developer Mode through Anthropic sales for advanced prompt engineering needs.
Technical Architecture Notes
Hybrid Model Design
Claude Sonnet 4 combines fast response generation with deep reasoning capabilities in a single architecture. You switch between modes based on task complexity rather than using separate models.
This design maintains efficiency for simple tasks while providing advanced capabilities when needed.
Memory System Implementation
The memory capability requires developer-provided local file access. Claude Sonnet 4 cannot create persistent memory in standard chat environments.
When file access is available, the model actively maintains context files and retrieves relevant information across sessions.
Real-World Performance Data
Development Workflows
Companies report measurable productivity improvements in multi-file code changes and complex refactoring tasks. Replit documented improved precision in large codebase modifications.
Block validated code quality improvements during editing and debugging processes while maintaining full performance reliability.
Extended Task Completion
Rakuten tested Claude Sonnet 4 on a demanding open-source refactor that ran independently for 7 hours with sustained performance throughout the process.
This data point demonstrates the model’s capability for extended, autonomous work on complex projects.
Current Limitations and Considerations
Extended Thinking Availability
Extended thinking mode requires paid plan access. Free users get standard Claude Sonnet 4 without deep reasoning capabilities.
Processing times increase with extended thinking mode, particularly for complex problems requiring substantial reasoning.
Memory System Requirements
Persistent memory features only work with local file access provided by developers. Standard chat interfaces cannot maintain information between sessions.
You need specific implementation setup to access advanced memory capabilities.
Tool Access Dependencies
Extended thinking with tool use remains in beta with potential reliability variations. Production implementations should account for occasional tool access failures.
Claude Sonnet 4 represents a significant technical advancement in AI capabilities with measurable improvements across coding, reasoning, and instruction following. The model’s dual-mode architecture and enhanced tool integration provide practical benefits for both individual users and enterprise implementations.