benchmark evidence

Aider Coding Benchmark

Real-world coding task completion via Aider edit benchmark.

winner on Aider Coding Benchmark: Gemini 2.5 Pro (Google), 60.9

direct benchmark result, not a broad vertical composite | source row dated 2026-05-08

scored on 2026-05-08 · source data is stale (67 days old)

latest mapped results | top 20

#	Model	Provider	Score	Evidence	Tested
1	Google: Gemini 2.5 Pro	Google	60.9	model-only independent_benchmark	2026-05-08
2	OpenAI: GPT-5	Openai	58.6	model-only independent_benchmark	2026-05-08
3	OpenAI: o1	Openai	57.9	model-only independent_benchmark	2026-05-08
4	Google: Gemini 2.0 Flash	Google	56.4	model-only independent_benchmark	2026-05-08
5	Google: Gemini 2.0 Flash Lite	Google	56.4	model-only independent_benchmark	2026-05-08
6	Qwen2.5 Coder 32B Instruct	Qwen	49.6	model-only independent_benchmark	2026-05-08
7	Amazon: Nova Pro 1.0	Amazon	44.4	model-only independent_benchmark	2026-05-08
8	Meta: Llama 3.3 70B Instruct	Meta Llama	42.1	model-only independent_benchmark	2026-05-08

what this result means

This benchmark contributes direct public evidence. Read its scope before generalizing the result.

A win here is a win on the Aider Coding Benchmark only. Broad task pages require independent corroboration before naming a general winner.

source record

category: coding

metric: pass@1

matched models: 8

latest source date: 2026-05-08

direction: higher is better
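
To make the metric and direction fields concrete, here is a minimal Python sketch of how a pass@1 percentage could be computed and how rows could be ordered when higher is better. The names `Row`, `pass_at_1`, and `rank` are hypothetical and are not taken from the source site. The scores are copied from the table above, and the tie handling (tied scores share a rank) is an assumption: the page itself lists the two 56.4 entries as 4 and 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float  # pass@1 as a percentage, copied from the table


def pass_at_1(passed_first_try: int, total_tasks: int) -> float:
    """Percentage of tasks solved on the first attempt."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    return 100.0 * passed_first_try / total_tasks


def rank(rows, higher_is_better=True):
    """Return (position, row) pairs; equal scores share a position."""
    ordered = sorted(rows, key=lambda r: r.score, reverse=higher_is_better)
    ranked = []
    for i, row in enumerate(ordered):
        if i > 0 and row.score == ordered[i - 1].score:
            position = ranked[-1][0]  # tie: reuse the previous position
        else:
            position = i + 1
        ranked.append((position, row))
    return ranked


ROWS = [
    Row("Meta: Llama 3.3 70B Instruct", 42.1),
    Row("Google: Gemini 2.0 Flash", 56.4),
    Row("OpenAI: GPT-5", 58.6),
    Row("Amazon: Nova Pro 1.0", 44.4),
    Row("Google: Gemini 2.5 Pro", 60.9),
    Row("Google: Gemini 2.0 Flash Lite", 56.4),
    Row("Qwen2.5 Coder 32B Instruct", 49.6),
    Row("OpenAI: o1", 57.9),
]

if __name__ == "__main__":
    for position, row in rank(ROWS):
        print(f"{position}\t{row.model}\t{row.score}")
```

With shared ranks, the two Gemini 2.0 Flash variants both land at position 4 and the next model at position 6. A site that prefers strictly sequential positions, as the table above does, would simply use the sorted index instead.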