benchmark evidence
AI Language Proficiency Monitor Spanish
Mean ARC, MGSM, and MMLU accuracy for Spanish, from the AI Language Proficiency Monitor.
winner on AI Language Proficiency Monitor Spanish
OpenAI: GPT-5.2 | 100.0
direct benchmark result, not a broad vertical composite | source row dated 2026-05-19
scored on 2026-05-19
latest mapped results | top 20
| # | Model | Score | Evidence | Tested |
|---|---|---|---|---|
| 1 | OpenAI: GPT-5.2 | 100.0 | model-only independent_benchmark | 2026-05-19 |
| 2 | OpenAI: GPT-5.4 | 100.0 | model-only independent_benchmark | 2026-05-19 |
| 3 | Anthropic: Claude Opus 4.5 | 100.0 | model-only independent_benchmark | 2026-05-19 |
| 4 | Anthropic: Claude Sonnet 4 | 100.0 | model-only independent_benchmark | 2026-05-19 |
| 5 | Qwen: Qwen3 30B A3B | 100.0 | model-only independent_benchmark | 2026-05-19 |
| 6 | OpenAI: GPT-5.1 | 100.0 | model-only independent_benchmark | 2026-05-19 |
| 7 | Anthropic: Claude Sonnet 4.6 | 100.0 | model-only independent_benchmark | 2026-05-19 |
| 8 | Anthropic: Claude Opus 4.6 | 100.0 | model-only independent_benchmark | 2026-05-19 |
| 9 | Amazon: Nova Pro 1.0 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 10 | Anthropic: Claude Haiku 4.5 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 11 | Anthropic: Claude Sonnet 4.5 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 12 | OpenAI: GPT-5 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 13 | Google: Gemini 2.5 Pro | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 14 | OpenAI: GPT-4.1 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 15 | Meta: Llama 4 Maverick | 90.0 | model-only independent_benchmark | 2026-05-19 |
| 16 | Meta: Llama 3.3 70B Instruct | 80.0 | model-only independent_benchmark | 2026-05-19 |
| 17 | Mistral: Mistral Nemo | 70.0 | model-only independent_benchmark | 2026-05-19 |
| 18 | Google: Gemini 2.5 Flash | 40.0 | model-only independent_benchmark | 2026-05-19 |
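Several rows in the table above share a score, so the `#` column alone does not show which models are tied. The following is a minimal sketch of one common convention, competition ranking, where tied scores share a rank. The function name `competition_ranks` and the `(model, score)` row shape are hypothetical, and the source does not state how the page itself orders tied models.

```python
from itertools import groupby

# A few (model, score) rows copied from the table above.
rows = [
    ("OpenAI: GPT-5.2", 100.0),
    ("Amazon: Nova Pro 1.0", 96.7),
    ("OpenAI: GPT-5.4", 100.0),
    ("Meta: Llama 4 Maverick", 90.0),
    ("OpenAI: GPT-5", 96.7),
]

def competition_ranks(rows, higher_is_better=True):
    """Assign shared ranks to tied scores (1, 1, 3, ...). Illustrative only."""
    # sorted() is stable, so tied rows keep their input order.
    ordered = sorted(rows, key=lambda r: r[1], reverse=higher_is_better)
    ranked, position = [], 1
    for _, group in groupby(ordered, key=lambda r: r[1]):
        members = list(group)
        ranked.extend((position, model, score) for model, score in members)
        # The next distinct score skips past every tied row.
        position += len(members)
    return ranked

for rank, model, score in competition_ranks(rows):
    print(rank, model, score)
```

Under this convention, every model at 100.0 would carry rank 1 rather than a separate position.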
what this result means
This benchmark contributes direct public evidence. Read its scope before generalizing the result.
A win here is a win on AI Language Proficiency Monitor Spanish only, and eight models share the top score of 100.0. Broad task pages require independent corroboration before naming a general winner.
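The score is described as the mean of ARC, MGSM, and MMLU accuracy for Spanish. As a rough sketch of what that could look like, assuming an unweighted mean of three accuracies on a 0-100 scale (the source does not say whether the tasks are weighted equally, and `composite_accuracy` is a hypothetical name):

```python
from statistics import mean

def composite_accuracy(arc: float, mgsm: float, mmlu: float) -> float:
    """Unweighted mean of three per-task accuracies (0-100), rounded to one decimal.

    Equal weighting is an assumption, not something the source confirms.
    """
    return round(mean([arc, mgsm, mmlu]), 1)

# Hypothetical per-task accuracies, not taken from the source.
print(composite_accuracy(100.0, 90.0, 100.0))
```

A single composite can hide a weak task, which is one reason to read the benchmark's scope before generalizing.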
source record
category: multilingual
metric: accuracy
matched models: 18
latest source date: 2026-05-19
direction: higher is better