Model Profile
anthropic/claude-opus-4
Use this page to decide where this model is a strong fit. The rankings below are grouped by use case and backed by benchmark data, each with an explicit confidence value and contributor metrics.
Identity
ID: external/anthropic/claude-opus-4
Author: anthropic
Origin: external_benchmark_shadow
Arch: unknown
Benchmark Coverage
Scored use cases: 12
Avg confidence: 21.4%
Evidence points: 135
Raw rows: 361
Weighted rows: 26
Catalog Metadata
Parameters: unknown
Context window: 4096
Downloads: 0
Price / 1M tokens: $10.00 (blended 3:1)
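The "blended 3:1" label is not defined on this page. A common convention is a weighted average of input and output token prices, with three parts input to one part output. The sketch below assumes that reading. The function name `blended_price` and the example prices are hypothetical and are not this model's actual rates.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens.

    Assumes "blended 3:1" means 3 parts input tokens to 1 part output
    tokens; the page itself does not confirm this definition.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# Hypothetical prices in USD per 1M tokens, for illustration only.
example = blended_price(input_price=2.0, output_price=10.0)
print(f"${example:.2f}")  # (3 * 2.0 + 1 * 10.0) / 4 = 4.00
```

Under this assumption, a blended figure sits closer to the input price than the output price, so it understates cost for output-heavy workloads.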
Intelligence Profile
Dimension Breakdown
* Low confidence — limited benchmark evidence for this dimension
5/5 dimensions scored · Last updated Apr 14, 2026
Benchmark Signals
SWE-bench Verified Leaderboard
swe_verified_resolved_pct
Normalized value 85.1% · confidence 100.0%
Strongest impact in Verilog/VHDL generation
swebench_verified_official.swe_verified_resolved_pct · Apr 1, 2026
EQ-Bench Leaderboard
eq_bench_score
Normalized value 91.2% · confidence 100.0%
Strongest impact in Social post generation
eq_bench.eq_bench_score · Apr 1, 2026
Aider Polyglot Leaderboard
percent_correct_pct
Normalized value 80.7% · confidence 100.0%
Strongest impact in Verilog/VHDL generation
aider_polyglot.percent_correct_pct · Apr 1, 2026
UGI Leaderboard
Writing ✍️
Normalized value 94.1% · confidence 100.0%
Strongest impact in Social post generation
ugi_main.writing · Apr 1, 2026
Vals LiveCodeBench
overall_accuracy_pct
Normalized value 64.4% · confidence 100.0%
Strongest impact in Simulation setup assistant
vals_lcb.overall_accuracy_pct · Mar 31, 2026
Aider Polyglot Leaderboard
correct_edit_format_pct
Normalized value 96.0% · confidence 100.0%
Strongest impact in Integration test generation
aider_polyglot.correct_edit_format_pct · Apr 1, 2026
Some fit rows have limited benchmark evidence.
12 of 12 scored use cases have low confidence or thin contributor coverage.
Coverage Diagnostics
Use-Case Scores: 103 (actively scored)
Total Measurements: 361
Weighted Measurements: 26
Weighted Sources: 12
Best Use Cases for This Model
| Use Case | ID | Score |
|---|---|---|
| Verilog/VHDL generation | use_case.eda.verilog_generation | 20.3% |
| Simulation setup assistant | use_case.eng.simulation_setup_assistant | 18.2% |
| Social post generation | use_case.mkt.social_post_generation | 18.0% |
| Product positioning and messaging | use_case.mkt.product_positioning | 18.0% |
| Campaign brief | use_case.mkt.campaign_brief | 18.0% |
| Integration test generation | use_case.dev.integration_tests | 17.5% |
| Ad copy variants | use_case.mkt.ad_copy_variants | 17.0% |
| Personalized sales outreach | use_case.mkt.sales_outreach_personalized | 17.0% |
| Refactoring assistant | use_case.dev.refactoring | 16.5% |
| Terraform generation | use_case.sre.iac_terraform | 16.3% |
| Kubernetes manifest generation | use_case.sre.iac_k8s | 16.3% |
| Config debugging | use_case.sre.config_debugging | 16.3% |