basedagi.org | benchmark evidence | live | weekly refresh

Chatbot Arena (LMSYS)

Human-preference Bradley-Terry ratings from LMSYS Chatbot Arena, fitted to community-voted pairwise comparisons between models.
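To make the Bradley-Terry idea concrete, here is a minimal sketch of fitting strengths to pairwise votes with the standard minorization-maximization update. It is an illustration only: the vote data and the names `fit_bradley_terry`, `win_probability`, `model_a`, `model_b`, and `model_c` are hypothetical, and nothing here is confirmed to match how LMSYS or this page computes or rescales its published scores.

```python
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping model name to a strength, normalized to sum to 1.
    Assumes every model has at least one win and one loss; otherwise the
    maximum-likelihood estimate is not finite.
    """
    models = sorted({name for pair in votes for name in pair})
    wins = defaultdict(int)          # total wins per model
    pair_counts = defaultdict(int)   # comparisons per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {name: 1.0 / len(models) for name in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {name: s / total for name, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Modelled probability that `a` is preferred over `b`."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical votes, not real arena data.
votes = (
    [("model_a", "model_b")] * 7 + [("model_b", "model_a")] * 3
    + [("model_b", "model_c")] * 6 + [("model_c", "model_b")] * 4
    + [("model_a", "model_c")] * 8 + [("model_c", "model_a")] * 2
)
strengths = fit_bradley_terry(votes)
for name in sorted(strengths, key=strengths.get, reverse=True):
    print(name, round(strengths[name], 3))
```

The fitted strengths only carry meaning relative to one another. A published leaderboard typically maps them onto a more readable scale, and that mapping is a separate choice this sketch does not cover.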

winner on Chatbot Arena (LMSYS): Anthropic: Claude Opus 4.6 (99.6)
direct benchmark result, not a broad vertical composite | source row dated 2026-05-27
scored on 2026-05-27
latest mapped results | top 20
| # | Model | Provider | Score | Evidence | Tested |
|---|-------|----------|-------|----------|--------|
| 1 | Anthropic: Claude Opus 4.6 | Anthropic | 99.6 | model-only, independent_benchmark | 2026-05-27 |
| 2 | Anthropic: Claude Opus 4.7 | Anthropic | 98.4 | model-only, independent_benchmark | 2026-05-27 |
| 3 | OpenAI: GPT-5.5 | OpenAI | 94.1 | model-only, independent_benchmark | 2026-05-27 |
| 4 | Anthropic: Claude Sonnet 4.6 | Anthropic | 92.4 | model-only, independent_benchmark | 2026-05-27 |
| 5 | OpenAI: GPT-5.4 | OpenAI | 92.3 | model-only, independent_benchmark | 2026-05-27 |
| 6 | Anthropic: Claude Opus 4.5 | Anthropic | 92.3 | model-only, independent_benchmark | 2026-05-27 |
| 7 | Z.ai: GLM 5 | Z.ai | 89.3 | model-only, independent_benchmark | 2026-05-27 |
| 8 | Anthropic: Claude Sonnet 4.5 | Anthropic | 88.8 | model-only, independent_benchmark | 2026-05-27 |
| 9 | DeepSeek: DeepSeek V4 Pro | DeepSeek | 88.5 | model-only, independent_benchmark | 2026-05-27 |
| 10 | OpenAI: GPT-5.4 Mini | OpenAI | 87.9 | model-only, independent_benchmark | 2026-05-27 |
| 11 | Google: Gemini 2.5 Pro | Google | 86.5 | model-only, independent_benchmark | 2026-05-27 |
| 12 | OpenAI: GPT-5.1 | OpenAI | 84.7 | model-only, independent_benchmark | 2026-05-27 |
| 13 | OpenAI: GPT-5.2 | OpenAI | 83.7 | model-only, independent_benchmark | 2026-05-27 |
| 14 | DeepSeek: DeepSeek V4 Flash | DeepSeek | 83.2 | model-only, independent_benchmark | 2026-05-27 |
| 15 | OpenAI: o3 | OpenAI | 82.8 | model-only, independent_benchmark | 2026-05-27 |
| 16 | OpenAI: GPT-5 | OpenAI | 81.6 | model-only, independent_benchmark | 2026-05-19 |
| 17 | DeepSeek: DeepSeek V3.2 | DeepSeek | 81.1 | model-only, independent_benchmark | 2026-05-27 |
| 18 | MoonshotAI: Kimi K2 Thinking | MoonshotAI | 79.4 | model-only, independent_benchmark | 2026-05-19 |
| 19 | OpenAI: GPT-4.1 | OpenAI | 78.4 | model-only, independent_benchmark | 2026-05-27 |
| 20 | Google: Gemini 2.5 Flash | Google | 77.7 | model-only, independent_benchmark | 2026-05-27 |
what this result means

Human preference is useful, but presentation style and familiarity can move arena votes.

A win here is a win on Chatbot Arena (LMSYS). Broad task pages require independent corroboration before naming a general winner.

source record
category: overall
metric: bradley_terry
matched models: 34
latest source date: 2026-05-27
direction: higher is better
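As a rough illustration of how a record like this could be consumed, the sketch below parses the `key: value` lines into a dictionary and uses the `direction` field to decide the sort order. The helper names `parse_record` and `rank` are hypothetical, the three scores are copied from the top of the results on this page, and the actual pipeline behind the page is not known.

```python
RECORD_TEXT = """\
category: overall
metric: bradley_terry
matched models: 34
latest source date: 2026-05-27
direction: higher is better
"""


def parse_record(text):
    """Turn 'key: value' lines into a dict with underscore-separated keys."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip anything that is not a key/value line
        key, value = line.split(":", 1)
        record[key.strip().replace(" ", "_")] = value.strip()
    return record


def rank(scores, direction):
    """Order (model, score) pairs best-first according to the direction field."""
    higher_is_better = direction.startswith("higher")
    return sorted(scores.items(), key=lambda item: item[1], reverse=higher_is_better)


record = parse_record(RECORD_TEXT)

# Three rows taken from the results on this page.
scores = {
    "OpenAI: GPT-5.5": 94.1,
    "Anthropic: Claude Opus 4.6": 99.6,
    "Anthropic: Claude Opus 4.7": 98.4,
}
ranked = rank(scores, record["direction"])
print(record["metric"], ranked[0])
```

Reading the direction from the record, instead of hard-coding a descending sort, keeps the same ranking step usable for metrics where lower values are better.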