BasedAGI
Rankings live

adult

Adult ERP roleplay (explicit)

Explicit adult roleplay with boundary adherence and persona memory.

#1 Recommendation

Grok-4-0709

Strong on UGI Leaderboard Writing ✍️ (99%) and UGI Leaderboard Entertainment (100%)

external/xai/grok-4-0709

20.6%

Score

25.9%

Confidence

Limited benchmark evidence for this use case.

62 ranked models with average evidence of 11.2 points. Rankings may shift as more benchmark data is ingested.

Ranked Models

30

Evidence Quality

79%

Scoring

Benchmark-backed

Top Signal

UGI Leaderboard: Writing ✍️

All Ranked Models

Rank · Model · Score
#21 · Grok-4-0709 · 20.6%
#25 · gpt-4.1-20250414 · 19.8%
#30 · gemini-2.5-pro · 19.3%
#33 · Arch-Agent-32B · 18.5%
#45 · qwen-2.5-72b-instruct · 16.3%
#54 · xai-org/grok-4-fast-reasoning · 14.7%
#61 · xai-org/grok-4-1-fast-reasoning · 13.8%
#69 · x-ai/grok-3 · 13.3%
#73 · Arch-Agent-3B · 12.9%
#74 · gpt-4o · 12.9%
#79 · Arch-Agent-1.5B · 12.4%
#80 · gemini-3-pro-preview · 12.3%
#81 · gpt-4o-2024-05-13 · 12.1%
#85 · google/gemini-3.1-pro-preview · 11.2%
#86 · claude-sonnet-4-20250514 · 11.2%
#88 · Kimi-K2-Instruct · 11.1%
#94 · xai-org/grok-4-1-fast-non-reasoning · 10.7%
#97 · gpt-5-2025-08-07 · 10.3%
#98 · gemma-2-27b-it · 10.3%
#100 · xai-org/grok-4-fast-non-reasoning · 10.2%
#101 · openai/gpt-5.4-2026-03-05 · 10.1%
#105 · gpt-5.1-2025-11-13 · 9.9%
#108 · anthropic/claude-sonnet-4.6 · 9.8%
#109 · claude-opus-4-5-20251101 · 9.7%
#111 · qwen/qwen3-max · 9.6%
#113 · deepseek-v3 · 9.5%
#114 · gpt-5-mini-2025-08-07 · 9.5%
#117 · anthropic/claude-opus-4-6-thinking · 9.3%
#118 · gemini-3-flash-preview · 9.3%
#123 · gpt-5.2-2025-12-11 · 9.1%

Compare Models

Model A leads by +0.8%

Model A

Grok-4-0709

external/xai/grok-4-0709

20.6%

Rank #21

Confidence 25.9% · 20 evidence pts

UGI Leaderboard: Writing ✍️

Value 99.2% · Conf 100.0% · Weight 3.8%

ugi_main.writing (Mar 12, 2026)

UGI Leaderboard: Entertainment

Value 100.0% · Conf 100.0% · Weight 3.4%

ugi_main.entertainment (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg TSQ

Value 84.6% · Conf 100.0% · Weight 1.3%

galileo_agent_v2.avg_tsq (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg AC

Value 56.5% · Conf 100.0% · Weight 1.2%

galileo_agent_v2.avg_ac (Mar 12, 2026)

Model B

gpt-4.1-20250414

external/openai/gpt-4-1-20250414

19.8%

Rank #25

Confidence 25.6% · 20 evidence pts

UGI Leaderboard: Writing ✍️

Value 100.0% · Conf 100.0% · Weight 3.8%

ugi_main.writing (Mar 12, 2026)

UGI Leaderboard: Entertainment

Value 73.3% · Conf 100.0% · Weight 2.5%

ugi_main.entertainment (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg AC

Value 100.0% · Conf 100.0% · Weight 2.1%

galileo_agent_v2.avg_ac (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg TSQ

Value 64.1% · Conf 100.0% · Weight 1.0%

galileo_agent_v2.avg_tsq (Mar 12, 2026)
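
The "+0.8%" lead in this comparison is the difference between the two headline scores. The sketch below reproduces that difference and, as an illustration only, sums value × weight over the four signals listed for each model. That weighted sum is an assumption: the page does not state its scoring formula, and each model has 20 evidence points of which only four are shown, so the partial sums are not expected to match the headline scores. The function name `listed_contribution` is hypothetical.

```python
# Headline scores as shown in the comparison (percent).
score_a = 20.6  # Grok-4-0709
score_b = 19.8  # gpt-4.1-20250414

# The displayed lead is the plain difference of the two scores.
lead = round(score_a - score_b, 1)

# Listed signals per model: (signal, value %, weight %), copied from the page.
signals_a = [
    ("ugi_main.writing", 99.2, 3.8),
    ("ugi_main.entertainment", 100.0, 3.4),
    ("galileo_agent_v2.avg_tsq", 84.6, 1.3),
    ("galileo_agent_v2.avg_ac", 56.5, 1.2),
]
signals_b = [
    ("ugi_main.writing", 100.0, 3.8),
    ("ugi_main.entertainment", 73.3, 2.5),
    ("galileo_agent_v2.avg_ac", 100.0, 2.1),
    ("galileo_agent_v2.avg_tsq", 64.1, 1.0),
]


def listed_contribution(signals):
    """Sum value * weight over the listed signals, in percentage points.

    ASSUMPTION: a simple weighted sum, used here only for illustration.
    The site's actual aggregation is not documented on this page.
    """
    return sum(value / 100.0 * weight for _, value, weight in signals)


if __name__ == "__main__":
    print(f"Model A leads by {lead:+.1f}%")
    print(f"Listed-signal contribution, A: {listed_contribution(signals_a):.2f} pts")
    print(f"Listed-signal contribution, B: {listed_contribution(signals_b):.2f} pts")
```

Under this assumed weighting, the same signal can carry a different weight for each model (Entertainment is weighted 3.4% for Model A and 2.5% for Model B), so the per-signal contributions are not directly comparable across models.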

Ranking Diagnostics & Missing Models

Source Lift

Ranked

62

Sources

8

Quality

Insufficient

Vals Legal Bench

vals_legal_bench

36 rows

0.6% avg lift

Vals CorpFin v2

vals_corp_fin_v2

36 rows

0.5% avg lift

Vals MedQA

vals_medqa

35 rows

0.5% avg lift

Vals Tax Eval v2

vals_tax_eval_v2

35 rows

0.5% avg lift

Missing Strong Models

zai/glm-5-thinking

external/zai/glm-5-thinking

Rank #32

13.0%

Thin evidence after weighting

alibaba/qwen3.5-flash

external/alibaba/qwen3-5-flash

Rank #33

12.3%

Thin evidence after weighting

Kimi K2 Thinking

external/kimi/kimi-k2-thinking

Rank #34

12.3%

Thin evidence after weighting

gpt-4o-20241120

external/openai/gpt-4o-20241120

Rank #49

10.7%

Thin evidence after weighting
Taxonomy Details

Core Tasks

task.adult_erotica_explicit
task.persona_consistency

Required Modes

mode.persona_memory

Domains

domain.creative_writing

Related Use Cases