Adult erotica (long-form, explicit)
Long-form explicit erotica with controllable style and strict boundaries.
#1 Recommendation
qwen-2.5-72b-instruct
Strong on the Creative Writing Official (EQ-Bench Slice) creative_writing_score (78%) and the Judgemark Official (EQ-Bench Slice) judgemark_score (56%).
external/qwen/qwen-2-5-72b-instruct
Score: 18.3%
Confidence: 30.0%
Limited benchmark evidence for this use case.
34 ranked models with average evidence of 11.7 points. Rankings may shift as more benchmark data is ingested.
Ranked Models: 30
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: Creative Writing Official (EQ-Bench Slice): creative_writing_score
All Ranked Models
| Rank | Model | Score |
|---|---|---|
| #19 | qwen-2.5-72b-instruct | 18.3% |
| #23 | gpt-4.1-20250414 | 18.0% |
| #32 | Grok-4-0709 | 16.7% |
| #35 | Arch-Agent-32B | 16.5% |
| #37 | gpt-4o | 16.3% |
| #40 | gemini-2.5-pro | 15.7% |
| #64 | gemma-2-27b-it | 12.1% |
| #68 | xai-org/grok-4-fast-reasoning | 11.9% |
| #74 | Arch-Agent-3B | 11.4% |
| #76 | xai-org/grok-4-1-fast-reasoning | 11.2% |
| #81 | Arch-Agent-1.5B | 10.8% |
| #83 | x-ai/grok-3 | 10.8% |
| #92 | gemini-3-pro-preview | 10.0% |
| #95 | gpt-4o-2024-05-13 | 9.8% |
| #104 | google/gemini-3.1-pro-preview | 9.1% |
| #105 | claude-sonnet-4-20250514 | 9.1% |
| #106 | Kimi-K2-Instruct | 9.0% |
| #113 | xai-org/grok-4-1-fast-non-reasoning | 8.7% |
| #116 | gpt-5-2025-08-07 | 8.4% |
| #118 | xai-org/grok-4-fast-non-reasoning | 8.3% |
| #124 | gpt-5.1-2025-11-13 | 8.0% |
| #127 | claude-opus-4-5-20251101 | 7.9% |
| #130 | qwen/qwen3-max | 7.8% |
| #131 | deepseek-v3 | 7.7% |
| #132 | gpt-5-mini-2025-08-07 | 7.7% |
| #143 | gemini-2.5-flash | 7.2% |
| #148 | GLM-4.5-Air | 6.7% |
| #158 | Llama-2-7b-chat-hf | 5.2% |
| #159 | gemma-7b-it | 5.1% |
| #161 | Llama-4-Scout-17B-16E-Instruct | 4.9% |
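The Rank column appears to hold positions in a wider ranking rather than positions within this use case, since the model labelled "#1 Recommendation" is listed at #19. The sketch below, using only the first three rows of the table, shows one way to parse the rows, derive a position within this list, and reproduce the score gap between two models. The helper names (`parse_row`, `local_positions`, `lead`) are hypothetical and not part of the page.

```python
# Rows copied from the first three entries of the table above.
rows = [
    ("#19", "qwen-2.5-72b-instruct", "18.3%"),
    ("#23", "gpt-4.1-20250414", "18.0%"),
    ("#32", "Grok-4-0709", "16.7%"),
]


def parse_row(rank, model, score):
    """Turn one table row into typed values (hypothetical helper)."""
    return {
        "global_rank": int(rank.lstrip("#")),
        "model": model,
        "score": float(score.rstrip("%")),
    }


def local_positions(rows):
    """Number the rows 1..n in order of their listed rank."""
    parsed = sorted((parse_row(*r) for r in rows), key=lambda r: r["global_rank"])
    return [
        (i, r["model"], r["global_rank"], r["score"])
        for i, r in enumerate(parsed, start=1)
    ]


def lead(rows, model_a, model_b):
    """Score gap between two models, in percentage points."""
    scores = {model: float(score.rstrip("%")) for _, model, score in rows}
    return round(scores[model_a] - scores[model_b], 1)


if __name__ == "__main__":
    for position, model, global_rank, score in local_positions(rows):
        print(f"{position}. {model} (listed rank #{global_rank}, {score}%)")
    print(lead(rows, "qwen-2.5-72b-instruct", "gpt-4.1-20250414"))
```

Under this reading, qwen-2.5-72b-instruct is position 1 in this list while keeping its listed rank of #19.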
Compare Models
Model A leads by +0.3 percentage points (18.3% vs. 18.0%).
Model A
qwen-2.5-72b-instruct
external/qwen/qwen-2-5-72b-instruct
Rank #19
| Signal | Value | Conf | Weight | Source key (date) |
|---|---|---|---|---|
| Creative Writing Official (EQ-Bench Slice): creative_writing_score | 78.4% | 100.0% | 5.0% | artificialanalysis_creative_writing_official.creative_writing_score (Mar 12, 2026) |
| Judgemark Official (EQ-Bench Slice): judgemark_score | 55.6% | 100.0% | 2.2% | artificialanalysis_judgemark_official.judgemark_score (Mar 12, 2026) |
| Galileo Agent Leaderboard v2: Avg AC | 76.1% | 100.0% | 1.3% | galileo_agent_v2.avg_ac (Mar 12, 2026) |
| UGI Leaderboard: Writing ✍️ | 41.8% | 100.0% | 1.3% | ugi_main.writing (Mar 12, 2026) |
Model B
gpt-4.1-20250414
external/openai/gpt-4-1-20250414
Rank #23
| Signal | Value | Conf | Weight | Source key (date) |
|---|---|---|---|---|
| UGI Leaderboard: Writing ✍️ | 100.0% | 100.0% | 3.0% | ugi_main.writing (Mar 12, 2026) |
| UGI Leaderboard: Entertainment | 73.3% | 100.0% | 2.0% | ugi_main.entertainment (Mar 12, 2026) |
| MMLongBench-Doc Leaderboard: acc_score_pct | 74.6% | 100.0% | 1.7% | mmlongbench_doc_leaderboard.acc_score_pct (Mar 12, 2026) |
| Galileo Agent Leaderboard v2: Avg AC | 100.0% | 100.0% | 1.7% | galileo_agent_v2.avg_ac (Mar 12, 2026) |
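The page does not say how Value, Conf, and Weight combine into a score. As a rough sketch only, the code below assumes each signal contributes value × confidence × weight and sums the four signals listed for each model. The function names and the multiplication rule are assumptions made for illustration, not the site's documented method.

```python
# (value %, confidence %, weight %) for the four signals listed per model.
MODEL_A_SIGNALS = [
    (78.4, 100.0, 5.0),  # creative_writing_score
    (55.6, 100.0, 2.2),  # judgemark_score
    (76.1, 100.0, 1.3),  # galileo_agent_v2.avg_ac
    (41.8, 100.0, 1.3),  # ugi_main.writing
]
MODEL_B_SIGNALS = [
    (100.0, 100.0, 3.0),  # ugi_main.writing
    (73.3, 100.0, 2.0),   # ugi_main.entertainment
    (74.6, 100.0, 1.7),   # mmlongbench_doc_leaderboard.acc_score_pct
    (100.0, 100.0, 1.7),  # galileo_agent_v2.avg_ac
]


def contribution(value_pct, conf_pct, weight_pct):
    """Assumed rule: value x confidence x weight, in percentage points."""
    return (value_pct / 100.0) * (conf_pct / 100.0) * weight_pct


def listed_total(signals):
    """Sum of the assumed contributions over the listed signals only."""
    return sum(contribution(*signal) for signal in signals)


if __name__ == "__main__":
    print(f"Model A listed signals: {listed_total(MODEL_A_SIGNALS):.2f} points")
    print(f"Model B listed signals: {listed_total(MODEL_B_SIGNALS):.2f} points")
```

Under this assumption the listed signals sum to roughly 6.7 points for Model A and 7.4 points for Model B, both well below the displayed scores of 18.3% and 18.0%. That suggests the four signals shown are only part of each score, or that the real aggregation differs from this sketch.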
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 34
Sources: 8
Quality: Insufficient
| Source | Key | Rows | Avg lift |
|---|---|---|---|
| UGI Leaderboard | ugi_main | 19 | 1.7% |
| Vals Legal Bench | vals_legal_bench | 17 | 0.4% |
| Vals CorpFin v2 | vals_corp_fin_v2 | 17 | 0.4% |
| Vals MedQA | vals_medqa | 16 | 0.4% |
Missing Strong Models
| Rank | Model | Path | Score |
|---|---|---|---|
| #4 | anthropic/claude-sonnet-4.6 | external/anthropic/claude-sonnet-4-6 | 21.1% |
| #10 | openai/gpt-5.4-2026-03-05 | external/openai/gpt-5-4-2026-03-05 | 18.9% |
| #15 | gemini-3-flash-preview | external/google/gemini-3-flash-preview | 16.2% |
| #16 | gpt-5.2-2025-12-11 | external/openai/gpt-5-2-2025-12-11 | 16.2% |