Model Profile

claude-sonnet-4-5-20250929

Name: claude-sonnet-4-5-20250929
Rating: 1.4 (154 reviews)
Author: anthropic

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are backed by benchmarks and broken down by use case, each with an explicit confidence value and contributor metrics.

Identity

ID: external/anthropic/claude-sonnet-4-5-20250929

Author: anthropic

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 37.3%

Evidence points: 154

Raw rows: 282

Weighted rows: 31

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

IQ (6 benchmarks): 59.4%*

EQ (9 benchmarks): 10.0%

Accuracy (6 benchmarks): 48.0%*

Creativity (2 benchmarks): 76.8%*

Based (1 benchmark): 59.0%*

* Low confidence — limited benchmark evidence for this dimension

5/5 dimensions scored · Last updated Apr 30, 2026

Benchmark Signals

Vals CorpFin v2

overall_accuracy_pct

2.8%

Normalized value 80.9% · confidence 100.0%

Strongest impact in Thesis red teaming

vals_corp_fin_v2.overall_accuracy_pct · Apr 30, 2026

UGI Leaderboard

Writing ✍️

2.8%

Normalized value 90.0% · confidence 100.0%

Strongest impact in Adult ERP roleplay (explicit)

ugi_main.writing · Apr 30, 2026

BFCL Relevance Detection Official

Irrelevance Detection

2.4%

Normalized value 94.7% · confidence 100.0%

Strongest impact in Casual chat companion

bfcl_relevance_detection_official.irrelevance_detection · Apr 30, 2026

BFCL Relevance Detection Official

Relevance Detection

2.4%

Normalized value 37.5% · confidence 100.0%

Strongest impact in Casual chat companion

bfcl_relevance_detection_official.relevance_detection · Apr 30, 2026

UGI Leaderboard

Entertainment

1.7%

Normalized value 62.7% · confidence 100.0%

Strongest impact in Adult ERP roleplay (explicit)

ugi_main.entertainment · Apr 30, 2026

Vals CorpFin v2

shared_max_context_accuracy_pct

1.6%

Normalized value 78.6% · confidence 100.0%

Strongest impact in Thesis red teaming

vals_corp_fin_v2.shared_max_context_accuracy_pct · Apr 30, 2026

Some fit rows have limited benchmark evidence.

2 of 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-Case Scores: 101

Total Measurements: 282

Weighted Measurements

Weighted Sources

Raw Source Coverage

ugi_main 57 · bfcl_adjacent_public 30 · bfcl_overall 30 · vectara_hhem_leaderboard 21 · vals_sage 20 · corpfin_taxeval_public 16

Weighted Source Coverage

vectara_hhem_leaderboard 12 · bfcl_overall 3 · ugi_main 3 · vals_corp_fin_v2 3 · bfcl_relevance_detection_official 2 · swe_bench_leaderboard 2
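
The per-source counts listed under Raw Source Coverage and Weighted Source Coverage can be compared with the "Raw rows" and "Weighted rows" totals reported under Benchmark Coverage. The following is a minimal Python sketch, assuming each trailing integer is that source's row count; the helper name `listed_share` is hypothetical and introduced only for illustration.

```python
# Per-source counts copied from the two coverage lists on this page.
RAW_SOURCES = {
    "ugi_main": 57,
    "bfcl_adjacent_public": 30,
    "bfcl_overall": 30,
    "vectara_hhem_leaderboard": 21,
    "vals_sage": 20,
    "corpfin_taxeval_public": 16,
}
WEIGHTED_SOURCES = {
    "vectara_hhem_leaderboard": 12,
    "bfcl_overall": 3,
    "ugi_main": 3,
    "vals_corp_fin_v2": 3,
    "bfcl_relevance_detection_official": 2,
    "swe_bench_leaderboard": 2,
}

# Totals from the Benchmark Coverage section.
RAW_ROWS = 282
WEIGHTED_ROWS = 31


def listed_share(counts, total):
    """Return (rows covered by the listed sources, share of total in percent)."""
    listed = sum(counts.values())
    return listed, round(100 * listed / total, 1)


if __name__ == "__main__":
    print("raw:", listed_share(RAW_SOURCES, RAW_ROWS))
    print("weighted:", listed_share(WEIGHTED_SOURCES, WEIGHTED_ROWS))
```

Under that assumption the listed sources cover 174 of the 282 raw rows and 25 of the 31 weighted rows, which suggests (but does not confirm) that each list shows only the largest contributors.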

Best Use Cases for This Model

Use Case	Use Case ID	Vertical	Score	Confidence	Evidence	Top Contributor
Casual chat companion	use_case.companion.casual_chat	companion	14.4%	44.2%	13	BFCL Relevance Detection Official: Irrelevance Detection
Life coaching and goal planning	use_case.companion.life_coaching	companion	14.4%	44.2%	13	BFCL Relevance Detection Official: Irrelevance Detection
Tarot-style reading	use_case.spiritual.tarot_reading	companion	14.4%	44.2%	13	BFCL Relevance Detection Official: Irrelevance Detection
Empathetic support chat	use_case.companion.empathy_support_chat	companion	14.1%	43.2%	13	BFCL Relevance Detection Official: Irrelevance Detection
Mindfulness and meditation scripts	use_case.wellness.mindfulness_scripts	companion	14.0%	43.1%	13	BFCL Relevance Detection Official: Irrelevance Detection
Adult ERP roleplay (explicit)	use_case.adult.erp_roleplay	adult	13.8%	41.8%	13	UGI Leaderboard: Writing ✍️
SFW roleplay and simulation	use_case.creative.sfw_roleplay_simulation	creative	12.4%	38.6%	13	UGI Leaderboard: Writing ✍️
Thesis red teaming	use_case.fin.thesis_red_team	finance	12.3%	17.4%	11	Vals CorpFin v2: overall_accuracy_pct
Adult erotica (long-form, explicit)	use_case.adult.erotica_longform	adult	12.1%	38.4%	13	UGI Leaderboard: Writing ✍️
Interactive fiction / DM	use_case.creative.interactive_fiction_dm	creative	12.0%	37.4%	13	UGI Leaderboard: Writing ✍️
NPC dialogue	use_case.gaming.npc_dialogue	creative	12.0%	37.4%	13	UGI Leaderboard: Writing ✍️
Earnings call synthesis	use_case.fin.earnings_call_synthesis	finance	11.8%	18.0%	13	Vals CorpFin v2: overall_accuracy_pct
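
The headline figures under Benchmark Coverage can be cross-checked against the table above. The following is a minimal Python sketch, assuming the Confidence and Evidence columns are the inputs behind "Avg confidence" and "Evidence points". The 20% cutoff for "low confidence" is a hypothetical threshold chosen for illustration; the page does not state the one it uses.

```python
# (use case ID, score %, confidence %, evidence count), copied from the table above.
ROWS = [
    ("use_case.companion.casual_chat", 14.4, 44.2, 13),
    ("use_case.companion.life_coaching", 14.4, 44.2, 13),
    ("use_case.spiritual.tarot_reading", 14.4, 44.2, 13),
    ("use_case.companion.empathy_support_chat", 14.1, 43.2, 13),
    ("use_case.wellness.mindfulness_scripts", 14.0, 43.1, 13),
    ("use_case.adult.erp_roleplay", 13.8, 41.8, 13),
    ("use_case.creative.sfw_roleplay_simulation", 12.4, 38.6, 13),
    ("use_case.fin.thesis_red_team", 12.3, 17.4, 11),
    ("use_case.adult.erotica_longform", 12.1, 38.4, 13),
    ("use_case.creative.interactive_fiction_dm", 12.0, 37.4, 13),
    ("use_case.gaming.npc_dialogue", 12.0, 37.4, 13),
    ("use_case.fin.earnings_call_synthesis", 11.8, 18.0, 13),
]


def mean_confidence(rows):
    """Unweighted mean of the Confidence column, in percent."""
    return sum(row[2] for row in rows) / len(rows)


def total_evidence(rows):
    """Sum of the Evidence column."""
    return sum(row[3] for row in rows)


def low_confidence(rows, cutoff=20.0):
    """IDs of use cases whose confidence is below the (assumed) cutoff."""
    return [row[0] for row in rows if row[2] < cutoff]


if __name__ == "__main__":
    print(f"mean confidence: {mean_confidence(ROWS):.1f}%")
    print(f"evidence points: {total_evidence(ROWS)}")
    print(f"low confidence:  {low_confidence(ROWS)}")
```

Under these assumptions the mean confidence comes to 37.3% and the evidence counts sum to 154, matching the reported figures. The two finance use cases are the only rows below the assumed cutoff, which is consistent with the "2 of 12" note.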