Model Profile
claude-opus-4-6
Use this page to decide where this model is a strong fit. The rankings below are organized by use case and backed by benchmarks, with explicit confidence and contributor metrics.
Identity
ID: external/anthropic/claude-opus-4-6
Author: anthropic
Origin: external_benchmark_shadow
Arch: unknown
Benchmark Coverage
Scored use cases: 12
Avg confidence: 24.5%
Evidence points: 199
Raw rows: 196
Weighted rows: 36
Catalog Metadata
Parameters: unknown
Context window: 4096
Downloads: 0
Price / 1M tokens: $10.00 (blended 3:1)
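The "blended 3:1" label is not defined on this page. A common reading is a weighted average of input and output prices, with three input tokens for every output token. The sketch below shows that arithmetic under this assumption. The function name `blended_price` and the per-token prices are hypothetical, chosen only so the numbers are easy to follow; they are not taken from this page.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output token ratio.

    Assumes "blended 3:1" means 3 parts input tokens to 1 part output tokens.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# Hypothetical prices per 1M tokens (illustrative only, not from this page).
example = blended_price(input_price=5.00, output_price=25.00)
print(f"${example:.2f} per 1M tokens")
```

With these illustrative inputs the result is (3 × 5.00 + 1 × 25.00) / 4 = 10.00. A workload with a different input-to-output ratio would see a different effective price, so the blended figure is best treated as a rough comparison point.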
Intelligence Profile
Dimension Breakdown
No eq benchmarks found
* Low confidence — limited benchmark evidence for this dimension
4/5 dimensions scored · Last updated Apr 14, 2026
Benchmark Signals
SWE-bench Verified Leaderboard
swe_verified_resolved_pct
Normalized value 95.4% · confidence 100.0%
Strongest impact in CAD scripting helper
swebench_verified_official.swe_verified_resolved_pct · Apr 1, 2026
OpenHands Index
average_score_pct
Normalized value 100.0% · confidence 100.0%
Strongest impact in Autonomous Coding Agent
openhands_index.average_score_pct · Apr 1, 2026
UGI Leaderboard
Writing ✍️
Normalized value 100.0% · confidence 100.0%
Strongest impact in Poetry and lyrics
ugi_main.writing · Apr 1, 2026
OpenHands Issue Resolution
issue_resolution_score_pct
Normalized value 76.9% · confidence 100.0%
Strongest impact in Agentic bug fixing
openhands_issue_resolution.issue_resolution_score_pct · Apr 1, 2026
UGI Leaderboard
Entertainment
Normalized value 90.7% · confidence 100.0%
Strongest impact in Poetry and lyrics
ugi_main.entertainment · Apr 1, 2026
OpenHands Index
issue_resolution_score_pct
Normalized value 76.9% · confidence 100.0%
Strongest impact in CAD scripting helper
openhands_index.issue_resolution_score_pct · Apr 1, 2026
Some fit rows have limited benchmark evidence.
6 of 12 scored use cases have low confidence or thin contributor coverage.
Coverage Diagnostics
actively scored
Use-Case Scores: 120
Total Measurements: 196
Weighted Measurements: 36
Weighted Sources: 16
[Charts: Raw Source Coverage; Weighted Source Coverage]
Best Use Cases for This Model
| Use Case | ID | Score |
|---|---|---|
| Autonomous Coding Agent | use_case.dev.autonomous_coding_agent | 28.4% |
| IDE code completion | use_case.dev.ide_completion | 27.6% |
| CAD scripting helper | use_case.eng.cad_scripting_helper | 27.5% |
| Code generation | use_case.dev.code_generation | 26.6% |
| Agentic bug fixing | use_case.dev.agentic_bug_fixing | 24.5% |
| PR review agent | use_case.dev.pr_review_agent | 23.9% |
| Function Calling / Tool Use Agent | use_case.dev.function_calling_agent | 21.8% |
| Quant research code generation | use_case.fin.alpha_research_codegen | 17.7% |
| Poetry and lyrics | use_case.creative.poetry_lyrics | 15.4% |
| Screenplay scene writing | use_case.creative.screenplay_scene | 15.4% |
| Agentic incident response | use_case.sre.agentic_incident_response | 15.3% |
| Prompt injection resistance (eval) | use_case.security.prompt_injection_resistance_eval | 14.9% |
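The use-case IDs in the table above appear to follow a `use_case.<domain>.<name>` pattern. Assuming the second segment is a domain tag (an interpretation, not something the page states), the following sketch groups the listed scores by that segment and averages them. The helper name `mean_score_by_domain` is hypothetical; the scores are copied from the table.

```python
from collections import defaultdict

# Scores (in percent) copied from the table above, keyed by use-case ID.
SCORES = {
    "use_case.dev.autonomous_coding_agent": 28.4,
    "use_case.dev.ide_completion": 27.6,
    "use_case.eng.cad_scripting_helper": 27.5,
    "use_case.dev.code_generation": 26.6,
    "use_case.dev.agentic_bug_fixing": 24.5,
    "use_case.dev.pr_review_agent": 23.9,
    "use_case.dev.function_calling_agent": 21.8,
    "use_case.fin.alpha_research_codegen": 17.7,
    "use_case.creative.poetry_lyrics": 15.4,
    "use_case.creative.screenplay_scene": 15.4,
    "use_case.sre.agentic_incident_response": 15.3,
    "use_case.security.prompt_injection_resistance_eval": 14.9,
}


def mean_score_by_domain(scores: dict[str, float]) -> dict[str, float]:
    """Average the scores of all use cases that share a domain segment."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for use_case_id, score in scores.items():
        # Assumption: IDs look like "use_case.<domain>.<name>".
        domain = use_case_id.split(".")[1]
        grouped[domain].append(score)
    return {domain: sum(vals) / len(vals) for domain, vals in grouped.items()}


averages = mean_score_by_domain(SCORES)
# Print domains from highest to lowest average score.
for domain, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{domain}: {avg:.1f}%")
```

Six of the twelve rows fall under `dev`, so that average rests on more rows than the single-row domains such as `eng` or `fin`. Given the low average confidence reported on this page, these per-domain figures are only a rough summary.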