BasedAGI
Rankings live

marketing_sales

Ad copy variants

Generate diverse headline/CTA variants under strict constraints.

#1 Recommendation

gpt-4o

Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (84%) and CRMArena Function Calling overall_score_pct (82%)

external/openai/gpt-4o

Score: 28.8%

Confidence: 35.8%

Limited benchmark evidence for this use case.

62 ranked models with an average of 11.5 evidence points each. Rankings may shift as more benchmark data is ingested.

Ranked Models: 30

Evidence Quality: 81%

Scoring: Benchmark-backed

Top Signal

Creative Writing Official (EQ-Bench Slice): creative_writing_score

All Ranked Models

30 of 30
Rank  Model  Score
#1  gpt-4o  28.8%
    Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (84%) and CRMArena Function Calling overall_score_pct (82%)
#2  qwen-2.5-72b-instruct  25.5%
    Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (78%) and EQ-Bench Leaderboard eq_bench_score (92%)
#4  Grok-4-0709  22.9%
#5  gemini-2.5-pro  22.6%
#6  gpt-4.1-20250414  22.6%
#10  claude-sonnet-4-20250514  19.6%
#12  gemini-3-pro-preview  17.7%
#13  gemma-2-27b-it  16.6%
#14  gemini-2.5-flash  16.4%
#15  google/gemini-3.1-pro-preview  16.1%
#19  gpt-5-2025-08-07  14.8%
#21  openai/gpt-5.4-2026-03-05  14.5%
#23  gpt-5.1-2025-11-13  14.1%
#24  anthropic/claude-sonnet-4.6  14.0%
#26  claude-opus-4-5-20251101  13.9%
#27  xai-org/grok-4-fast-reasoning  13.9%
#29  gpt-5-mini-2025-08-07  13.6%
#30  gpt-4.1-mini-20250414  13.4%
#32  anthropic/claude-opus-4-6-thinking  13.3%
#33  gemini-3-flash-preview  13.2%
#34  xai-org/grok-4-1-fast-reasoning  13.2%
#35  gpt-5.2-2025-12-11  13.1%
#38  anthropic/claude-opus-4-5-20251101-thinking  12.8%
#40  gemma-7b-it  12.7%
#43  Llama-2-7b-chat-hf  12.5%
#47  kimi/kimi-k2.5-thinking  12.0%
#49  anthropic/claude-sonnet-4-5-20250929-thinking  11.7%
#52  Kimi-K2-Instruct  11.1%
#54  o3-20250416  11.1%
#55  google/gemini-3.1-flash-lite-preview  11.0%

Compare Models

Model A leads by +3.3 percentage points (28.8% vs. 25.5%)

Model A

gpt-4o

external/openai/gpt-4o

28.8%

Rank #1

Confidence 35.8% · 12 evidence pts

Creative Writing Official (EQ-Bench Slice): creative_writing_score

Value 84.4% · Conf 100.0% · Weight 4.6%

artificialanalysis_creative_writing_official.creative_writing_score (Mar 12, 2026)

CRMArena Function Calling: overall_score_pct

Value 82.1% · Conf 100.0% · Weight 4.0%

crmarena_leaderboard.overall_score_pct (Mar 12, 2026)

EQ-Bench Leaderboard: eq_bench_score

Value 96.7% · Conf 100.0% · Weight 4.0%

eq_bench.eq_bench_score (Mar 12, 2026)

Judgemark Official (EQ-Bench Slice): judgemark_score

Value 74.3% · Conf 100.0% · Weight 2.8%

artificialanalysis_judgemark_official.judgemark_score (Mar 12, 2026)

Model B

qwen-2.5-72b-instruct

external/qwen/qwen-2-5-72b-instruct

25.5%

Rank #2

Confidence 37.0% · 11 evidence pts

Creative Writing Official (EQ-Bench Slice): creative_writing_score

Value 78.4% · Conf 100.0% · Weight 4.3%

artificialanalysis_creative_writing_official.creative_writing_score (Mar 12, 2026)

EQ-Bench Leaderboard: eq_bench_score

Value 91.5% · Conf 100.0% · Weight 3.8%

eq_bench.eq_bench_score (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg TSQ

Value 64.1% · Conf 100.0% · Weight 3.0%

galileo_agent_v2.avg_tsq (Mar 12, 2026)

Judgemark Official (EQ-Bench Slice): judgemark_score

Value 55.6% · Conf 100.0% · Weight 2.1%

artificialanalysis_judgemark_official.judgemark_score (Mar 12, 2026)
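
The comparison above lists a value and a weight for each benchmark signal, but the page does not state how these combine into the overall score. As a rough sketch, assuming each signal contributes value × weight (an assumption, not the site's documented formula), the four signals listed per model can be tallied as follows. The names `Signal` and `partial_score` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    value_pct: float   # "Value" as shown on the page
    weight_pct: float  # "Weight" as shown on the page


def partial_score(signals):
    """Sum of value x weight over the listed signals.

    ASSUMPTION: the page does not document its aggregation; this is
    only one plausible reading of the Value/Weight columns.
    """
    return sum(s.value_pct * s.weight_pct / 100.0 for s in signals)


# Signals copied from the Model A (gpt-4o) panel.
MODEL_A = [
    Signal("creative_writing_score", 84.4, 4.6),
    Signal("overall_score_pct", 82.1, 4.0),
    Signal("eq_bench_score", 96.7, 4.0),
    Signal("judgemark_score", 74.3, 2.8),
]

# Signals copied from the Model B (qwen-2.5-72b-instruct) panel.
MODEL_B = [
    Signal("creative_writing_score", 78.4, 4.3),
    Signal("eq_bench_score", 91.5, 3.8),
    Signal("avg_tsq", 64.1, 3.0),
    Signal("judgemark_score", 55.6, 2.1),
]

partial_a = partial_score(MODEL_A)
partial_b = partial_score(MODEL_B)

# The headline lead is the difference between the two published scores.
lead = round(28.8 - 25.5, 1)

print(f"Model A partial: {partial_a:.1f}, Model B partial: {partial_b:.1f}")
print(f"Published lead: {lead:+.1f} points")
```

Under this assumption the four listed signals account for roughly 13.1 of gpt-4o's 28.8 and roughly 9.9 of qwen-2.5-72b-instruct's 25.5. The remainder would have to come from the evidence points that the panel does not list (12 and 11 in total, respectively), so the sketch should not be read as reproducing the published scores.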

Ranking Diagnostics & Missing Models

Source Lift

Ranked: 62

Sources: 8

Quality: Insufficient

Vals Legal Bench (vals_legal_bench): 42 rows, 0.7% avg lift

Vals CorpFin v2 (vals_corp_fin_v2): 42 rows, 0.6% avg lift

Vals Tax Eval v2 (vals_tax_eval_v2): 41 rows, 0.7% avg lift

Vals LiveCodeBench (vals_lcb): 41 rows, 0.6% avg lift
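
To summarize the four sources listed above in a single figure, one option is a row-weighted mean of their average lifts. This is a minimal sketch: the page reports 8 sources but lists only these four, and it does not define how "lift" is computed, so the helper `row_weighted_lift` is hypothetical and covers only the visible rows.

```python
# (source id, rows, average lift in percent), copied from the page.
SOURCE_LIFT = [
    ("vals_legal_bench", 42, 0.7),
    ("vals_corp_fin_v2", 42, 0.6),
    ("vals_tax_eval_v2", 41, 0.7),
    ("vals_lcb", 41, 0.6),
]


def row_weighted_lift(sources):
    """Mean of per-source average lift, weighted by each source's row count."""
    total_rows = sum(rows for _, rows, _ in sources)
    weighted = sum(rows * lift for _, rows, lift in sources)
    return weighted / total_rows


total_rows = sum(rows for _, rows, _ in SOURCE_LIFT)
mean_lift = row_weighted_lift(SOURCE_LIFT)

print(f"{mean_lift:.2f}% average lift over {total_rows} rows")
```

Because the row counts are nearly equal, the weighted figure sits almost exactly midway between the 0.6% and 0.7% values reported per source.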

Missing Strong Models

No obvious gaps right now.

Taxonomy Details

Core Tasks

task.write_ad_variants, task.rewrite_tone_style

Required Modes

none

Domains

domain.marketing_sales
