Model Profile

anthropic/claude-sonnet-4

Name: anthropic/claude-sonnet-4
Rating: 3.7 (294 reviews)
Author: anthropic

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The use-case rankings below are backed by benchmark results, and each one reports its confidence, evidence count, and top contributing benchmark.

Identity

ID: external/anthropic/claude-sonnet-4

Author: anthropic

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 44.9%

Evidence points: 294

Raw rows: 526

Weighted rows: 63

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Price / 1M tokens: $6.00 (blended 3:1)
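
The page does not say how the blended figure is derived. A minimal sketch, assuming "blended 3:1" means a weighted average over three parts input tokens to one part output tokens; the function name and the two per-1M-token prices below are hypothetical values chosen for illustration, not figures listed on this page.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output token mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# Hypothetical per-1M-token prices (assumptions, not taken from this page).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

print(f"${blended_price(INPUT_PRICE, OUTPUT_PRICE):.2f}")  # prints "$6.00"
```

Under these assumed prices the 3:1 mix reproduces the $6.00 shown above; a workload that generates more output relative to input would pay a higher effective rate.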

Intelligence Profile

Dimension Breakdown

IQ (18 benchmarks): 57.6%

EQ (1 benchmark): 90.4%*

Accuracy (3 benchmarks): 68.2%*

Creativity (2 benchmarks): 72.8%*

Based (1 benchmark): 6.0%*

* Low confidence — limited benchmark evidence for this dimension

5/5 dimensions scored · Last updated Apr 14, 2026

Benchmark Signals

LanguageBench Translation Official (Split)

translation_to:bleu

6.1%

Normalized value 81.1% · confidence 100.0%

Strongest impact in Archaic and historical translation

languagebench_translation_official.translation_to_bleu · Apr 1, 2026

Galileo Agent Leaderboard v2

Avg AC

5.5%

Normalized value 84.8% · confidence 100.0%

Strongest impact in Terraform generation

galileo_agent_v2.avg_ac · Apr 1, 2026

LanguageBench

overall:mean

5.0%

Normalized value 96.5% · confidence 100.0%

Strongest impact in Archaic and historical translation

languagebench.overall_mean · Apr 1, 2026

Galileo Agent Leaderboard v2

Avg TSQ

4.7%

Normalized value 94.9% · confidence 100.0%

Strongest impact in Social post generation

galileo_agent_v2.avg_tsq · Apr 1, 2026

SWE-bench Verified Leaderboard

swe_verified_resolved_pct

4.4%

Normalized value 81.7% · confidence 100.0%

Strongest impact in Verilog/VHDL generation

swebench_verified_official.swe_verified_resolved_pct · Apr 1, 2026

EQ-Bench Leaderboard

eq_bench_score

4.0%

Normalized value 90.4% · confidence 100.0%

Strongest impact in Social post generation

eq_bench.eq_bench_score · Apr 1, 2026

Coverage Diagnostics

actively scored

Use-Case Scores: 151

Total Measurements: 526

Weighted Measurements

Weighted Sources

Raw Source Coverage

vals_mmlu_pro: 60 · ugi_main: 57 · vals_mgsm: 48 · galileo_agent_v2: 34 · corpfin_taxeval_public: 28 · vals_medqa: 28

Weighted Source Coverage

vectara_hhem_leaderboard: 12 · galileo_agent_v2: 10 · sonar_java_quality: 4 · facts_benchmark_suite: 3 · languagebench: 3 · languagebench_translation_official: 3

Best Use Cases for This Model

Use Case	Use Case ID	Vertical	Score	Confidence	Evidence	Top Contributor
Terraform generation	use_case.sre.iac_terraform	devops_sre	36.7%	49.8%	23	Galileo Agent Leaderboard v2: Avg AC
Kubernetes manifest generation	use_case.sre.iac_k8s	devops_sre	36.7%	49.8%	23	Galileo Agent Leaderboard v2: Avg AC
Config debugging	use_case.sre.config_debugging	devops_sre	36.7%	49.8%	23	Galileo Agent Leaderboard v2: Avg AC
Campaign brief	use_case.mkt.campaign_brief	marketing_sales	34.3%	44.7%	23	Galileo Agent Leaderboard v2: Avg TSQ
Social post generation	use_case.mkt.social_post_generation	marketing_sales	34.3%	44.7%	23	Galileo Agent Leaderboard v2: Avg TSQ
Product positioning and messaging	use_case.mkt.product_positioning	marketing_sales	34.3%	44.7%	23	Galileo Agent Leaderboard v2: Avg TSQ
Archaic and historical translation	use_case.history.archaic_translation	history_linguistics	34.1%	45.1%	29	LanguageBench Translation Official (Split): translation_to:bleu
Legal translation	use_case.legal.legal_translation	legal	33.3%	40.6%	28	LanguageBench Translation Official (Split): translation_to:bleu
Verilog/VHDL generation	use_case.eda.verilog_generation	engineering	32.7%	45.5%	27	SWE-bench Verified Leaderboard: swe_verified_resolved_pct
Personalized sales outreach	use_case.mkt.sales_outreach_personalized	marketing_sales	32.4%	42.2%	23	Galileo Agent Leaderboard v2: Avg TSQ
Ad copy variants	use_case.mkt.ad_copy_variants	marketing_sales	32.4%	42.2%	23	Galileo Agent Leaderboard v2: Avg TSQ
Brand voice localization	use_case.mkt.brand_voice_localization	marketing_sales	32.1%	39.8%	26	LanguageBench Translation Official (Split): translation_to:bleu
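
The 12 rows above can be cross-checked against the "Avg confidence: 44.9%" figure under Benchmark Coverage. A minimal sketch, assuming that figure is an unweighted mean of the Confidence column (the page does not state how it is computed); the variable names are illustrative.

```python
from statistics import mean

# Confidence column (%) copied from the 12 use-case rows in the table.
confidences = [
    49.8, 49.8, 49.8,   # devops_sre rows
    44.7, 44.7, 44.7,   # campaign brief, social post, product positioning
    45.1,               # archaic and historical translation
    40.6,               # legal translation
    45.5,               # Verilog/VHDL generation
    42.2, 42.2,         # sales outreach, ad copy variants
    39.8,               # brand voice localization
]

avg_confidence = mean(confidences)
print(f"{len(confidences)} use cases, mean confidence {avg_confidence:.1f}%")
# prints "12 use cases, mean confidence 44.9%"
```

The result agrees with the reported average to one decimal place, which is consistent with a simple mean over the scored use cases, though it does not rule out other weighting schemes.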