Model Profile
anthropic/claude-sonnet-4
Use this page to decide where this model is a strong fit. The rankings below are backed by benchmarks and broken down by use case, with explicit confidence and contributor metrics.
Identity
ID: external/anthropic/claude-sonnet-4
Author: anthropic
Origin: external_benchmark_shadow
Arch: unknown
Benchmark Coverage
Scored use cases: 12
Avg confidence: 44.9%
Evidence points: 294
Raw rows: 526
Weighted rows: 63
Catalog Metadata
Parameters: unknown
Context window: 4096 tokens
Downloads: 0
Price / 1M tokens: $6.00 (blended 3:1)
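
The "blended 3:1" figure above is most plausibly a weighted average of input and output token prices at three input tokens per output token. The sketch below shows that arithmetic. The per-token prices in it ($3.00 input, $15.00 output per 1M tokens) are assumptions that do not appear on this page, and `blended_price` is a hypothetical helper name, not part of any catalog API.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


# Assumed list prices in USD per 1M tokens; not stated on this page.
ASSUMED_INPUT_PRICE = 3.00
ASSUMED_OUTPUT_PRICE = 15.00

blended = blended_price(ASSUMED_INPUT_PRICE, ASSUMED_OUTPUT_PRICE)
print(f"${blended:.2f}")  # prints $6.00
```

Under those assumed prices the result matches the $6.00 listed here, which suggests the 3:1 ratio is input:output, though the page itself does not confirm the convention.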
Intelligence Profile
Dimension Breakdown
* Low confidence — limited benchmark evidence for this dimension
5/5 dimensions scored · Last updated Apr 14, 2026
Benchmark Signals
LanguageBench Translation Official (Split)
translation_to:bleu
Normalized value 81.1% · confidence 100.0%
Strongest impact in Archaic and historical translation
languagebench_translation_official.translation_to_bleu · Apr 1, 2026
Galileo Agent Leaderboard v2
Avg AC
Normalized value 84.8% · confidence 100.0%
Strongest impact in Terraform generation
galileo_agent_v2.avg_ac · Apr 1, 2026
LanguageBench
overall:mean
Normalized value 96.5% · confidence 100.0%
Strongest impact in Archaic and historical translation
languagebench.overall_mean · Apr 1, 2026
Galileo Agent Leaderboard v2
Avg TSQ
Normalized value 94.9% · confidence 100.0%
Strongest impact in Social post generation
galileo_agent_v2.avg_tsq · Apr 1, 2026
SWE-bench Verified Leaderboard
swe_verified_resolved_pct
Normalized value 81.7% · confidence 100.0%
Strongest impact in Verilog/VHDL generation
swebench_verified_official.swe_verified_resolved_pct · Apr 1, 2026
EQ-Bench Leaderboard
eq_bench_score
Normalized value 90.4% · confidence 100.0%
Strongest impact in Social post generation
eq_bench.eq_bench_score · Apr 1, 2026
Coverage Diagnostics
Actively scored
Use-Case Scores: 151
Total Measurements: 526
Weighted Measurements: 63
Weighted Sources: 28
Raw Source Coverage
Weighted Source Coverage
Best Use Cases for This Model
| Use Case | ID | Score |
|---|---|---|
| Terraform generation | use_case.sre.iac_terraform | 36.7% |
| Kubernetes manifest generation | use_case.sre.iac_k8s | 36.7% |
| Config debugging | use_case.sre.config_debugging | 36.7% |
| Campaign brief | use_case.mkt.campaign_brief | 34.3% |
| Social post generation | use_case.mkt.social_post_generation | 34.3% |
| Product positioning and messaging | use_case.mkt.product_positioning | 34.3% |
| Archaic and historical translation | use_case.history.archaic_translation | 34.1% |
| Legal translation | use_case.legal.legal_translation | 33.3% |
| Verilog/VHDL generation | use_case.eda.verilog_generation | 32.7% |
| Personalized sales outreach | use_case.mkt.sales_outreach_personalized | 32.4% |
| Ad copy variants | use_case.mkt.ad_copy_variants | 32.4% |
| Brand voice localization | use_case.mkt.brand_voice_localization | 32.1% |
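
The use-case IDs in the table share a dotted prefix, so one way to read the ranking is to group scores by the second segment of the ID. The sketch below does that with the twelve scores listed above. Treating the second segment (`sre`, `mkt`, and so on) as a domain is an assumption about the ID scheme, and `mean_score_by_domain` is a hypothetical helper, not something the catalog provides.

```python
from collections import defaultdict
from statistics import mean

# (use-case ID, score in percent), copied from the table above.
ROWS = [
    ("use_case.sre.iac_terraform", 36.7),
    ("use_case.sre.iac_k8s", 36.7),
    ("use_case.sre.config_debugging", 36.7),
    ("use_case.mkt.campaign_brief", 34.3),
    ("use_case.mkt.social_post_generation", 34.3),
    ("use_case.mkt.product_positioning", 34.3),
    ("use_case.history.archaic_translation", 34.1),
    ("use_case.legal.legal_translation", 33.3),
    ("use_case.eda.verilog_generation", 32.7),
    ("use_case.mkt.sales_outreach_personalized", 32.4),
    ("use_case.mkt.ad_copy_variants", 32.4),
    ("use_case.mkt.brand_voice_localization", 32.1),
]


def mean_score_by_domain(rows):
    """Average the scores per domain, where domain is the ID's second segment."""
    grouped = defaultdict(list)
    for use_case_id, score in rows:
        # Assumed ID layout: use_case.<domain>.<name>
        _, domain, _ = use_case_id.split(".", 2)
        grouped[domain].append(score)
    return {domain: round(mean(scores), 1) for domain, scores in grouped.items()}


by_domain = mean_score_by_domain(ROWS)
for domain, avg in sorted(by_domain.items(), key=lambda item: -item[1]):
    print(f"{domain}: {avg}%")
```

Grouping this way only summarizes the twelve rows shown here; it says nothing about use cases that are scored but not listed in this table.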