BasedAGI

creative

Best Model for NPC Dialogue

Compare models for low-latency in-character dialogue suitable for games.

#1 Recommendation

qwen-2.5-72b-instruct

Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (78%) and Judgemark Official (EQ-Bench Slice) judgemark_score (56%)

external/qwen/qwen-2-5-72b-instruct

Score: 21.7%
Confidence: 35.0%
Evidence: 13

Ranked Models: 30
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: Creative Writing Official (EQ-Bench Slice): creative_writing_score

All Ranked Models

Rank  Model                                Score
#12   qwen-2.5-72b-instruct                21.7%
#16   gpt-4o                               20.7%
#24   gemini-2.5-pro                       18.5%
#35   Grok-4-0709                          17.1%
#40   gpt-4.1-20250414                     16.6%
#43   Arch-Agent-32B                       16.4%
#59   gemma-2-27b-it                       13.5%
#75   xai-org/grok-4-fast-reasoning        12.1%
#82   Arch-Agent-3B                        11.6%
#86   xai-org/grok-4-1-fast-reasoning      11.4%
#87   gemini-3-pro-preview                 11.4%
#90   Arch-Agent-1.5B                      11.1%
#94   grok/grok-4.20-beta-0309-reasoning   10.9%
#96   gemini-3-flash-preview               10.7%
#98   x-ai/grok-3                          10.6%
#100  claude-sonnet-4-20250514             10.4%
#101  google/gemini-3.1-pro-preview        10.3%
#104  gemini-2.5-flash                     10.2%
#114  gpt-5-2025-08-07                     9.5%
#116  openai/gpt-5.4-2026-03-05            9.4%
#119  gpt-5.1-2025-11-13                   9.1%
#124  anthropic/claude-sonnet-4.6          9.0%
#126  claude-opus-4-5-20251101             9.0%
#130  gpt-5-mini-2025-08-07                8.8%
#132  xai-org/grok-4-1-fast-non-reasoning  8.7%
#133  Kimi-K2-Instruct                     8.7%
#136  gpt-4o-2024-05-13                    8.4%
#143  xai-org/grok-4-fast-non-reasoning    8.2%
#148  qwen/qwen3-max                       7.9%
#154  DeepSeek-V2.5                        7.8%

Head-to-Head: #1 vs #2

#12 qwen-2.5-72b-instruct (Top Pick)

Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (78%) and Judgemark Official (EQ-Bench Slice) judgemark_score (56%)

Score: 21.7%
Confidence: 35.0%

#16 gpt-4o

Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (84%) and Judgemark Official (EQ-Bench Slice) judgemark_score (74%)

Score: 20.7%
Confidence: 26.5%
