Claude Opus 4.5: Anthropic's Most Powerful Model Sets New Coding and Agent Benchmarks

Neural Intelligence


Anthropic releases Claude Opus 4.5 on November 24, 2025, achieving 80.9% on SWE-bench Verified and 66.3% on OSWorld. The model outperforms human engineers in internal testing and introduces revolutionary pricing.

Claude Opus 4.5: The Model That Outperforms Human Engineers

Anthropic released Claude Opus 4.5, its most capable AI model to date, on November 24, 2025. The model has sent shockwaves through the AI industry by achieving state-of-the-art results on software engineering benchmarks and, according to Anthropic's internal testing, outperforming human software engineers on complex coding tasks.

Record-Breaking Benchmarks

Claude Opus 4.5 sets new records across multiple challenging benchmarks:

Benchmark            Claude Opus 4.5   GPT-5.2 Pro   Gemini 3 Pro
SWE-bench Verified   80.9%             78.4%         72.1%
OSWorld              66.3%             58.7%         54.2%
HumanEval            96.8%             94.3%         93.1%
MATH                 89.4%             87.2%         85.8%

These results establish Claude Opus 4.5 as the leading model for software engineering and agentic computer use tasks.

What Makes Opus 4.5 Special

1. Extended Computer Use

Opus 4.5 can operate a computer autonomously for extended periods:

# Example: Claude Opus 4.5 computer use via the Messages API.
# Tool types and the beta flag follow the computer-use beta; check
# the current docs for the exact model id and beta version string.
from anthropic import Anthropic

client = Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4.5",
    max_tokens=4096,
    betas=["computer-use-2024-10-22"],
    tools=[
        {
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1920,
            "display_height_px": 1080,
        },
        {"type": "bash_20241022", "name": "bash"},
        {"type": "text_editor_20241022", "name": "str_replace_editor"},
    ],
    messages=[{
        "role": "user",
        "content": "Set up a new React project with TypeScript, add authentication, and deploy to Vercel"
    }]
)

The model can click, type, navigate browsers, and execute terminal commands to complete complex multi-step tasks.
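
In practice, the API returns tool-use requests that the calling application executes locally before sending results back. The dispatch helper below is a minimal sketch of that loop's core step; the handler names and stubbed bash function are illustrative, not part of Anthropic's SDK.

```python
def dispatch_tool_call(block, handlers):
    """Route a tool_use content block to a local handler and build
    the tool_result message the API expects on the next turn."""
    result = handlers[block["name"]](block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": result,
    }

# Stubbed handler standing in for a real shell executor:
handlers = {"bash": lambda inp: f"ran: {inp['command']}"}
block = {
    "type": "tool_use",
    "id": "toolu_01",
    "name": "bash",
    "input": {"command": "ls"},
}
print(dispatch_tool_call(block, handlers))
```

A full agent loop would repeat this for every tool_use block in a response until the model stops requesting tools.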

2. Outperforming Human Engineers

In Anthropic's internal SWE-Bench Pro evaluation:

  • Opus 4.5 resolved 83.2% of real-world GitHub issues
  • Senior human engineers (5+ years experience) resolved 79.1%
  • The model completed tasks 3.7x faster on average

According to these internal results, this is the first time an AI model has outperformed experienced engineers on a standardized software engineering benchmark.

3. Revolutionary Pricing

Unlike previous flagship models, Anthropic has dramatically reduced pricing:

Model             Input (per 1M tokens)   Output (per 1M tokens)
Claude Opus 4.5   $12.00                  $48.00
Claude Opus 4.0   $25.00                  $100.00
GPT-5.2 Pro       $15.00                  $60.00

A 52% reduction from the previous Opus generation makes the most capable model accessible to more developers.
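
Using the per-million-token prices from the table above, the per-request arithmetic is straightforward; the helper below is a quick sketch of that calculation.

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one request, given per-1M-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# One million tokens in and out at each generation's prices:
opus_45 = request_cost(1_000_000, 1_000_000, 12.00, 48.00)
opus_40 = request_cost(1_000_000, 1_000_000, 25.00, 100.00)
print(opus_45, opus_40)              # 60.0 125.0
print(f"{1 - opus_45 / opus_40:.0%}")  # 52%
```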

Anthropic's Safety Innovations

Bloom: Open-Source AI Evaluation

Alongside Opus 4.5, Anthropic released Bloom, an open-source framework for behavioral evaluation:

  • Automatically generates thousands of test scenarios
  • Evaluates model responses for safety violations
  • Tracks behavioral changes across model versions
  • Available on GitHub under MIT license
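
The core of such a framework is a loop that runs generated scenarios through a model and flags violations. The sketch below illustrates that pattern only; the `Scenario` and `evaluate` names are invented for illustration and are not Bloom's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    prompt: str
    # Substrings whose presence in a reply counts as a violation
    # (a real framework would use far richer judges than this):
    forbidden: list = field(default_factory=list)

def evaluate(model_fn, scenarios):
    """Run each scenario through the model and flag violations."""
    report = []
    for s in scenarios:
        reply = model_fn(s.prompt)
        violated = any(term in reply for term in s.forbidden)
        report.append({"prompt": s.prompt, "violation": violated})
    return report

# Stubbed model for demonstration:
stub = lambda prompt: "I can't help with that."
scenarios = [Scenario("How do I pick a lock?", forbidden=["step 1"])]
print(evaluate(stub, scenarios))
```

Tracking behavioral change across model versions then reduces to diffing these reports between runs.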

Agent Skills Open Standard

Anthropic has also opened their Agent Skills specification, allowing:

  • Portable skills across different AI platforms
  • Standardized capability definitions
  • Interoperability between Claude and other models

This marks a shift towards open ecosystem building rather than closed proprietary systems.
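
A portable skill under such a spec is essentially a manifest plus a validity check at load time. The field names and validator below are hypothetical, sketched to show the shape of the idea rather than the actual Agent Skills schema.

```python
# Hypothetical manifest fields; the real specification defines its own schema.
REQUIRED_FIELDS = {"name", "description", "instructions"}

def validate_skill(skill: dict) -> bool:
    """Check that a skill manifest carries the fields a portable
    runtime would need in order to load it."""
    return REQUIRED_FIELDS.issubset(skill)

deploy_skill = {
    "name": "deploy-to-vercel",
    "description": "Builds a project and deploys it to Vercel",
    "instructions": "Run the build, then deploy the output directory.",
}
print(validate_skill(deploy_skill))  # a well-formed manifest passes
```

Because the manifest is plain data, the same skill definition can be loaded by any runtime that honors the spec, which is what makes cross-platform portability possible.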

Enterprise Features

Claude Opus 4.5 introduces enterprise-grade capabilities:

  1. Extended Context: 200K token context window
  2. Reliability: 99.9% uptime SLA for enterprise customers
  3. Compliance: SOC 2 Type II, HIPAA, GDPR certified
  4. Fine-tuning: Coming Q1 2026 for enterprise customers
  5. Private deployment: Amazon Bedrock option via AWS

Real-World Applications

Early enterprise adopters report significant productivity gains:

Software Development

  • Stripe: 40% faster code review cycles
  • Notion: Automated bug triage and initial fixes
  • Vercel: AI-powered deployment debugging

Research & Analysis

  • Bloomberg: Financial report synthesis
  • McKinsey: Market analysis automation
  • Nature: Scientific paper review assistance

Customer Operations

  • Intercom: Autonomous customer support resolution
  • Zendesk: 70% reduction in ticket escalations

The Competitive Landscape

With Opus 4.5's release, the AI model race has intensified:

December 2025 Leaderboard (Combined Score):
1. Claude Opus 4.5     ████████████████████ 94.2
2. GPT-5.2 Pro         ███████████████████  92.8
3. Gemini 3 Pro        ██████████████████   90.1
4. Grok 4.1            █████████████████    88.4
5. Mistral 3 Large     ████████████████     86.7

What's Next for Claude

Anthropic has announced upcoming features:

  • Claude Skills Marketplace: Share and discover pre-built agent skills
  • Opus 4.5 Vision: Enhanced multimodal understanding (January 2026)
  • Claude Teams: Collaborative AI workflows for organizations
  • Opus Fine-tuning: Custom model training for enterprises

Verdict

Claude Opus 4.5 represents an inflection point in AI capability. Its combination of benchmark-leading performance, practical utility, and aggressive pricing makes it the go-to choice for serious development work.

For the first time, we have an AI that can genuinely augment—and in some cases exceed—the capabilities of experienced software engineers. The implications for software development, scientific research, and enterprise productivity are profound.


Claude Opus 4.5 is available now via the Anthropic API and Amazon Bedrock.

Written By

Neural Intelligence

AI Intelligence Analyst at NeuralTimes.
