benchmark evidence

ScreenSpot-Pro

ScreenSpot-Pro GUI grounding micro-average accuracy.

winner on ScreenSpot-Pro: Anthropic: Claude Haiku 4.5

direct benchmark result, not a broad vertical composite | source row dated 2026-04-14

scored on 2026-04-14 · stale source data (91 days old)
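The age figure on the line above is plain date arithmetic. The sketch below shows one way such a label could be derived. The viewing date and the 90-day staleness threshold are assumptions chosen for illustration, not values stated by this page.

```python
from datetime import date

# Hypothetical threshold: this page does not state when a source row becomes "stale".
STALE_AFTER_DAYS = 90


def source_age_days(source_date: date, as_of: date) -> int:
    """Return the number of whole days between the source date and the viewing date."""
    return (as_of - source_date).days


source_date = date(2026, 4, 14)   # source row date, from the page
as_of = date(2026, 7, 14)         # assumed viewing date, picked to be consistent with "91d"

age = source_age_days(source_date, as_of)
is_stale = age > STALE_AFTER_DAYS  # True under the assumed 90-day threshold

print(f"source age: {age}d, stale: {is_stale}")
```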

latest mapped results | top 20

#	Model	Score	Evidence	Tested
1	Anthropic: Claude Haiku 4.5	17.0	model-only, independent_benchmark	2026-04-14
2	OpenAI: GPT-5	0.8	model-only, independent_benchmark	2026-04-14

what this result means

The score is the micro-average accuracy on ScreenSpot-Pro GUI grounding: accuracy pooled over all test samples rather than averaged per category.
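To make "micro-average" concrete, here is a minimal sketch contrasting it with a macro average. The category names and outcomes are invented for illustration and are not ScreenSpot-Pro data. The function names are hypothetical and do not come from any benchmark harness.

```python
from collections import defaultdict


def micro_average_accuracy(results):
    """Pool every sample together: correct predictions / total predictions."""
    results = list(results)
    if not results:
        raise ValueError("no results to score")
    correct = sum(1 for _, ok in results if ok)
    return correct / len(results)


def macro_average_accuracy(results):
    """Average the per-category accuracies, weighting each category equally."""
    per_category = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for category, ok in results:
        per_category[category][0] += int(ok)
        per_category[category][1] += 1
    if not per_category:
        raise ValueError("no results to score")
    return sum(c / n for c, n in per_category.values()) / len(per_category)


# Made-up sample: (category, was the grounding prediction correct?)
sample = [
    ("cad", True), ("cad", False), ("cad", False), ("cad", False),
    ("office", True), ("office", True),
]

micro = micro_average_accuracy(sample)  # 3 correct out of 6 samples
macro = macro_average_accuracy(sample)  # mean of 1/4 and 2/2
print(micro, macro)
```

Because the micro average weights every sample equally, categories with more samples have more influence on the reported score.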

This benchmark contributes direct public evidence. Read its scope before generalizing the result.

A win here is a win on ScreenSpot-Pro only. Broad task pages require independent corroboration before naming a general winner.

source record

category: computer_use

metric: accuracy

matched models: 2

latest source date: 2026-04-14

direction: higher is better