Claude Sonnet 4

Claude Sonnet 4

Anthropic released Claude Sonnet 4 on May 22, 2025, alongside Claude Opus 4. This model replaces Claude Sonnet 3.7 with

Claude Sonnet 4 overview

Batch Responses: yes
Is Open Sourced: no
Model release date: May 22, 2025

What is Claude Sonnet 4?

Claude Sonnet 4 Modalities

Text Input and Output
Image Input Only
Audio Input Only

Claude Sonnet 4 Features

Streaming

Anthropic released Claude Sonnet 4 on May 22, 2025, alongside Claude Opus 4. This model replaces Claude Sonnet 3.7 with substantial improvements in coding performance, instruction following, and reasoning capabilities.

Claude Sonnet 4 achieved 72.7% on SWE-bench, placing it among the top-performing models for software engineering tasks. The model operates in two modes: standard responses and extended thinking for complex problems.

You get access to parallel tool execution, enhanced memory capabilities, and improved accuracy in following specific instructions. The model costs $3 per million input tokens and $15 per million output tokens.

Industry adoption has been rapid, with GitHub selecting Claude Sonnet 4 to power their new coding agent in GitHub Copilot.

Core Technical Improvements

Coding Performance Metrics

Claude Sonnet 4 scores 72.7% on SWE-bench Verified, testing real software engineering problems. This represents a significant jump from previous Sonnet models.

The model reduces navigation errors in complex codebases from 20% to near zero. Companies like iGent report successful autonomous multi-feature app development with minimal intervention.

Benchmark Results:

  • SWE-bench Verified: 72.7%
  • GPQA Diamond: 70.0% (with extended thinking)
  • MMMLU: 85.4% (with extended thinking)
  • MMMU: 72.6% (with extended thinking)

Enhanced Instruction Following

The model shows measurable improvements in executing complex, multi-step instructions. Manus highlights better adherence to specific requirements and clearer reasoning in outputs.

Shortcut behavior decreased by 65% compared to Claude Sonnet 3.7. The model now completes tasks properly rather than finding workarounds that miss the intent.

Dual Operating Modes

Standard Mode delivers immediate responses for straightforward tasks. Processing happens quickly with minimal computational overhead.

Extended Thinking Mode allows up to 64,000 tokens of reasoning space. The model works through complex problems step-by-step before providing final answers.

New Capabilities Analysis

Tool Use During Reasoning

Claude Sonnet 4 can access tools like web search during its thinking process. This beta feature lets the model gather information, analyze it, and incorporate findings into responses.

The model alternates between reasoning and tool use rather than handling them separately. You get more comprehensive answers without needing to provide all context upfront.

Parallel Tool Execution

Previous models used tools sequentially. Claude Sonnet 4 executes multiple tools simultaneously, reducing wait times for complex operations.

This improvement affects workflows that require data from multiple sources or simultaneous processing of different task components.

Memory File Management

When developers provide local file access, Claude Sonnet 4 creates and maintains memory files. These store key information across sessions for better context continuity.

The model demonstrated this capability by creating navigation guides while playing Pokemon, showing practical application of persistent memory.

Industry Implementation Data

GitHub Integration

GitHub chose Claude Sonnet 4 for their new GitHub Copilot coding agent. The selection followed testing that showed superior performance in agentic scenarios.

The model handles autonomous coding tasks with sustained accuracy across longer interactions.

Developer Platform Feedback

Sourcegraph reports the model stays on track longer during complex development tasks. Code quality improvements are measurable compared to previous versions.

Augment Code documented higher success rates and more precise code edits. They switched to Claude Sonnet 4 as their primary model based on performance data.

Cursor calls it state-of-the-art for coding with significant improvements in complex codebase understanding.

Access Methods and Pricing

Platform Availability

You can access Claude Sonnet 4 through:

  • Web interface (free and paid plans)
  • Mobile applications
  • Desktop applications
  • Anthropic API
  • Amazon Bedrock
  • Google Cloud Vertex AI

Cost Structure

Service LevelInput CostOutput Cost
Claude Sonnet 4$3/million tokens$15/million tokens
Extended thinkingIncluded in plansUsage-based billing

Free users get standard Claude Sonnet 4 access. Pro, Max, Team, and Enterprise plans include extended thinking capabilities.

Claude Code Development Integration

IDE Integration

Claude Code now integrates directly with VS Code and JetBrains IDEs through beta extensions. Proposed code changes appear inline within your editor.

You install Claude Code through your IDE terminal and start collaborative coding immediately. No separate applications or complex setup required.

GitHub Automation

Claude Code handles GitHub pull request interactions automatically. Tag the system to:

  • Address reviewer feedback
  • Fix continuous integration errors
  • Implement requested code modifications

The GitHub integration runs in beta with installation through Claude Code commands.

SDK and Custom Development

Anthropic released the Claude Code SDK for building custom agents and applications. You get the same core functionality that powers Claude Code in your own implementations.

Performance Benchmark Analysis

SWE-bench Methodology

Anthropic tested Claude Sonnet 4 on all 500 SWE-bench problems using two tools: bash execution and file editing via string replacement. No planning tools were included in the testing setup.

High-compute testing achieved 80.2% accuracy through parallel attempts and candidate selection using internal scoring models.

Comparison with Competition

Claude Sonnet 4 leads SWE-bench Verified rankings among publicly available models. OpenAI models are tested on a 477-problem subset rather than the full 500-problem set.

The model maintains competitive performance across multiple benchmarks without specialized fine-tuning for individual tasks.

Safety and Reliability Measures

ASL-3 Safety Standards

Claude Sonnet 4 implements AI Safety Level 3 measures with extensive testing protocols. Anthropic conducted evaluations to minimize risks while maximizing capabilities.

The model underwent evaluation across multiple risk categories before release.

Thinking Process Transparency

Extended thinking summaries condense lengthy reasoning chains for user review. About 95% of thinking processes display in full, with summarization only for extremely long reasoning chains.

Developers requiring complete reasoning chains can access Developer Mode through Anthropic sales for advanced prompt engineering needs.

Technical Architecture Notes

Hybrid Model Design

Claude Sonnet 4 combines fast response generation with deep reasoning capabilities in a single architecture. You switch between modes based on task complexity rather than using separate models.

This design maintains efficiency for simple tasks while providing advanced capabilities when needed.

Memory System Implementation

The memory capability requires developer-provided local file access. Claude Sonnet 4 cannot create persistent memory in standard chat environments.

When file access is available, the model actively maintains context files and retrieves relevant information across sessions.

Real-World Performance Data

Development Workflows

Companies report measurable productivity improvements in multi-file code changes and complex refactoring tasks. Replit documented improved precision in large codebase modifications.

Block validated code quality improvements during editing and debugging processes while maintaining full performance reliability.

Extended Task Completion

Rakuten tested Claude Sonnet 4 on a demanding open-source refactor that ran independently for 7 hours with sustained performance throughout the process.

This data point demonstrates the model’s capability for extended, autonomous work on complex projects.

Current Limitations and Considerations

Extended Thinking Availability

Extended thinking mode requires paid plan access. Free users get standard Claude Sonnet 4 without deep reasoning capabilities.

Processing times increase with extended thinking mode, particularly for complex problems requiring substantial reasoning.

Memory System Requirements

Persistent memory features only work with local file access provided by developers. Standard chat interfaces cannot maintain information between sessions.

You need specific implementation setup to access advanced memory capabilities.

Tool Access Dependencies

Extended thinking with tool use remains in beta with potential reliability variations. Production implementations should account for occasional tool access failures.

Claude Sonnet 4 represents a significant technical advancement in AI capabilities with measurable improvements across coding, reasoning, and instruction following. The model’s dual-mode architecture and enhanced tool integration provide practical benefits for both individual users and enterprise implementations.

Other popular AI Models (LLMs)