live
weekly refresh
basedagi.org
benchmark evidence

WebDev Arena

Arena.ai WebDev / Code Arena frontier coding preference leaderboard.

winner on WebDev Arena
direct benchmark result, not a broad vertical composite | source row dated 2026-05-19
scored on 2026-05-19
latest mapped results | top 20
| #  | Model                        | Provider  | Score | Evidence                           | Tested     |
|----|------------------------------|-----------|-------|------------------------------------|------------|
| 1  | Anthropic: Claude Opus 4.7   | Anthropic | 100.0 | model-only, independent_benchmark  | 2026-05-19 |
| 2  | Anthropic: Claude Opus 4.6   | Anthropic | 100.0 | model-only, independent_benchmark  | 2026-05-19 |
| 3  | Anthropic: Claude Sonnet 4.6 | Anthropic | 100.0 | model-only, independent_benchmark  | 2026-05-19 |
| 4  | Anthropic: Claude Opus 4.5   | Anthropic | 91.7  | model-only, independent_benchmark  | 2026-05-19 |
| 5  | Z.ai: GLM 5                  | Z.ai      | 83.8  | model-only, independent_benchmark  | 2026-05-19 |
| 6  | OpenAI: GPT-5.2              | OpenAI    | 76.1  | model-only, independent_benchmark  | 2026-05-19 |
| 7  | OpenAI: GPT-5.4 Mini         | OpenAI    | 75.5  | model-only, independent_benchmark  | 2026-05-19 |
| 8  | OpenAI: GPT-5.4              | OpenAI    | 71.5  | model-only, independent_benchmark  | 2026-05-19 |
| 9  | Anthropic: Claude Sonnet 4.5 | Anthropic | 71.5  | model-only, independent_benchmark  | 2026-05-19 |
| 10 | OpenAI: GPT-5.1              | OpenAI    | 59.9  | model-only, independent_benchmark  | 2026-05-19 |
| 11 | DeepSeek: DeepSeek V3.2      | DeepSeek  | 58.0  | model-only, independent_benchmark  | 2026-05-19 |
| 12 | Anthropic: Claude Haiku 4.5  | Anthropic | 55.1  | model-only, independent_benchmark  | 2026-05-19 |
| 13 | Google: Gemini 2.5 Pro       | Google    | 25.8  | model-only, independent_benchmark  | 2026-05-19 |

what this result means

This benchmark contributes direct public evidence. Read its scope before generalizing the result.

A win here is a win on WebDev Arena. Broad task pages require independent corroboration before naming a general winner.
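
Because several rows on this page share the same score, the listed positions among them are not decided by score alone. The sketch below is one way a reader might check this, assuming the scores as shown in the table and the "higher is better" direction from the source record. The function name `competition_ranks` and the choice of competition-style ranking (tied rows share a rank) are illustrative assumptions, not the site's own method; the page itself lists ordinal positions 1 to 13.

```python
from itertools import groupby

# (model, score) pairs copied from the table on this page.
ROWS = [
    ("Claude Opus 4.7", 100.0),
    ("Claude Opus 4.6", 100.0),
    ("Claude Sonnet 4.6", 100.0),
    ("Claude Opus 4.5", 91.7),
    ("GLM 5", 83.8),
    ("GPT-5.2", 76.1),
    ("GPT-5.4 Mini", 75.5),
    ("GPT-5.4", 71.5),
    ("Claude Sonnet 4.5", 71.5),
    ("GPT-5.1", 59.9),
    ("DeepSeek V3.2", 58.0),
    ("Claude Haiku 4.5", 55.1),
    ("Gemini 2.5 Pro", 25.8),
]


def competition_ranks(rows, higher_is_better=True):
    """Hypothetical helper: return (rank, model, score) with ties sharing a rank."""
    # sorted() is stable, so tied rows keep the order they had in the table.
    ordered = sorted(rows, key=lambda r: r[1], reverse=higher_is_better)
    ranked = []
    position = 1
    for score, group in groupby(ordered, key=lambda r: r[1]):
        members = list(group)
        for model, _ in members:
            ranked.append((position, model, score))
        # The next distinct score skips past all tied rows.
        position += len(members)
    return ranked


ranked = competition_ranks(ROWS)
leaders = [model for rank, model, _ in ranked if rank == 1]
print(leaders)
```

Under this reading, three models share the top score, and GPT-5.4 and Claude Sonnet 4.5 share a position further down, so the ordering within each tied group should be treated as arbitrary unless the upstream source states a tiebreak.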

source record
category: coding
metric: accuracy
matched models: 13
latest source date: 2026-05-19
direction: higher is better