Adult ERP roleplay (explicit)
Explicit adult roleplay with boundary adherence and persona memory.
#1 Recommendation
Grok-4-0709
Strong on UGI Leaderboard Writing ✍️ (99%) and UGI Leaderboard Entertainment (100%)
external/xai/grok-4-0709
Score: 20.6%
Confidence: 25.9%
Limited benchmark evidence for this use case.
62 ranked models with average evidence of 11.2 points. Rankings may shift as more benchmark data is ingested.
Ranked Models: 30
Evidence Quality: 79%
Scoring: Benchmark-backed
Top Signal: UGI Leaderboard: Writing ✍️
All Ranked Models
| Rank | Model | Score |
|---|---|---|
| #21 | Grok-4-0709 | 20.6% |
| #25 | gpt-4.1-20250414 | 19.8% |
| #30 | gemini-2.5-pro | 19.3% |
| #33 | Arch-Agent-32B | 18.5% |
| #45 | qwen-2.5-72b-instruct | 16.3% |
| #54 | xai-org/grok-4-fast-reasoning | 14.7% |
| #61 | xai-org/grok-4-1-fast-reasoning | 13.8% |
| #69 | x-ai/grok-3 | 13.3% |
| #73 | Arch-Agent-3B | 12.9% |
| #74 | gpt-4o | 12.9% |
| #79 | Arch-Agent-1.5B | 12.4% |
| #80 | gemini-3-pro-preview | 12.3% |
| #81 | gpt-4o-2024-05-13 | 12.1% |
| #85 | google/gemini-3.1-pro-preview | 11.2% |
| #86 | claude-sonnet-4-20250514 | 11.2% |
| #88 | Kimi-K2-Instruct | 11.1% |
| #94 | xai-org/grok-4-1-fast-non-reasoning | 10.7% |
| #97 | gpt-5-2025-08-07 | 10.3% |
| #98 | gemma-2-27b-it | 10.3% |
| #100 | xai-org/grok-4-fast-non-reasoning | 10.2% |
| #101 | openai/gpt-5.4-2026-03-05 | 10.1% |
| #105 | gpt-5.1-2025-11-13 | 9.9% |
| #108 | anthropic/claude-sonnet-4.6 | 9.8% |
| #109 | claude-opus-4-5-20251101 | 9.7% |
| #111 | qwen/qwen3-max | 9.6% |
| #113 | deepseek-v3 | 9.5% |
| #114 | gpt-5-mini-2025-08-07 | 9.5% |
| #117 | anthropic/claude-opus-4-6-thinking | 9.3% |
| #118 | gemini-3-flash-preview | 9.3% |
| #123 | gpt-5.2-2025-12-11 | 9.1% |
Compare Models
Model A leads by +0.8%
Model A
Grok-4-0709
external/xai/grok-4-0709
Rank #21
| Signal | Value | Conf | Weight | Source (date) |
|---|---|---|---|---|
| UGI Leaderboard: Writing ✍️ | 99.2% | 100.0% | 3.8% | ugi_main.writing (Mar 12, 2026) |
| UGI Leaderboard: Entertainment | 100.0% | 100.0% | 3.4% | ugi_main.entertainment (Mar 12, 2026) |
| Galileo Agent Leaderboard v2: Avg TSQ | 84.6% | 100.0% | 1.3% | galileo_agent_v2.avg_tsq (Mar 12, 2026) |
| Galileo Agent Leaderboard v2: Avg AC | 56.5% | 100.0% | 1.2% | galileo_agent_v2.avg_ac (Mar 12, 2026) |
Model B
gpt-4.1-20250414
external/openai/gpt-4-1-20250414
Rank #25
| Signal | Value | Conf | Weight | Source (date) |
|---|---|---|---|---|
| UGI Leaderboard: Writing ✍️ | 100.0% | 100.0% | 3.8% | ugi_main.writing (Mar 12, 2026) |
| UGI Leaderboard: Entertainment | 73.3% | 100.0% | 2.5% | ugi_main.entertainment (Mar 12, 2026) |
| Galileo Agent Leaderboard v2: Avg AC | 100.0% | 100.0% | 2.1% | galileo_agent_v2.avg_ac (Mar 12, 2026) |
| Galileo Agent Leaderboard v2: Avg TSQ | 64.1% | 100.0% | 1.0% | galileo_agent_v2.avg_tsq (Mar 12, 2026) |
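The "+0.8%" lead reported for Model A matches the difference between the two overall scores listed for this use case (20.6% and 19.8%). The following is a minimal sketch of that arithmetic, plus a per-signal comparison of the four benchmark values shown for each model. The names `SCORES`, `SIGNALS`, `lead`, and `signal_deltas` are hypothetical and exist only for this illustration. The page does not state how the overall score is derived from the signal values, confidences, and weights, so the sketch does not attempt to reproduce it.

```python
# Overall scores (percent) copied from the ranking table.
SCORES = {"Grok-4-0709": 20.6, "gpt-4.1-20250414": 19.8}

# Signal values (percent) as (Model A, Model B), copied from the comparison.
SIGNALS = {
    "ugi_main.writing": (99.2, 100.0),
    "ugi_main.entertainment": (100.0, 73.3),
    "galileo_agent_v2.avg_tsq": (84.6, 64.1),
    "galileo_agent_v2.avg_ac": (56.5, 100.0),
}


def lead(scores, a, b):
    """Return a's lead over b in percentage points, rounded to one decimal."""
    return round(scores[a] - scores[b], 1)


def signal_deltas(signals):
    """Per-signal difference (Model A minus Model B) in percentage points."""
    return {name: round(a - b, 1) for name, (a, b) in signals.items()}


if __name__ == "__main__":
    print(f"Lead: {lead(SCORES, 'Grok-4-0709', 'gpt-4.1-20250414'):+.1f}")
    for name, delta in signal_deltas(SIGNALS).items():
        print(f"{name}: {delta:+.1f}")
```

A positive delta means Model A scores higher on that signal. On these figures Model A is ahead on Entertainment and Avg TSQ, while Model B is ahead on Writing and Avg AC.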
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 62
Sources: 8
Quality: Insufficient
| Source | Rows | Avg lift |
|---|---|---|
| Vals Legal Bench (vals_legal_bench) | 36 | 0.6% |
| Vals CorpFin v2 (vals_corp_fin_v2) | 36 | 0.5% |
| Vals MedQA (vals_medqa) | 35 | 0.5% |
| Vals Tax Eval v2 (vals_tax_eval_v2) | 35 | 0.5% |
Missing Strong Models
zai/glm-5-thinking (external/zai/glm-5-thinking): Rank #32, 13.0%
alibaba/qwen3.5-flash (external/alibaba/qwen3-5-flash): Rank #33, 12.3%
Kimi K2 Thinking (external/kimi/kimi-k2-thinking): Rank #34, 12.3%
gpt-4o-20241120 (external/openai/gpt-4o-20241120): Rank #49, 10.7%