benchmark evidence
AI Language Proficiency Monitor Portuguese
Mean ARC, MGSM, and MMLU accuracy for Portuguese, from the AI Language Proficiency Monitor.
winner on AI Language Proficiency Monitor Portuguese
OpenAI: GPT-5.2 | 96.7
direct benchmark result, not a broad vertical composite | source row dated 2026-05-19
scored on 2026-05-19
latest mapped results | top 20 (18 models matched)
| # | Model | Score | Evidence | Tested |
|---|---|---|---|---|
| 1 | OpenAI: GPT-5.2 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 2 | Anthropic: Claude Sonnet 4.5 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 3 | Anthropic: Claude Sonnet 4.6 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 4 | Anthropic: Claude Sonnet 4 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 5 | Anthropic: Claude Opus 4.6 | 96.7 | model-only independent_benchmark | 2026-05-19 |
| 6 | Google: Gemini 2.5 Pro | 93.3 | model-only independent_benchmark | 2026-05-19 |
| 7 | Meta: Llama 3.3 70B Instruct | 93.3 | model-only independent_benchmark | 2026-05-19 |
| 8 | Anthropic: Claude Opus 4.5 | 93.3 | model-only independent_benchmark | 2026-05-19 |
| 9 | OpenAI: GPT-5.1 | 93.3 | model-only independent_benchmark | 2026-05-19 |
| 10 | Anthropic: Claude Haiku 4.5 | 93.3 | model-only independent_benchmark | 2026-05-19 |
| 11 | OpenAI: GPT-5 | 93.3 | model-only independent_benchmark | 2026-05-19 |
| 12 | OpenAI: GPT-4.1 | 93.3 | model-only independent_benchmark | 2026-05-19 |
| 13 | OpenAI: GPT-5.4 | 93.0 | model-only independent_benchmark | 2026-05-19 |
| 14 | Amazon: Nova Pro 1.0 | 90.0 | model-only independent_benchmark | 2026-05-19 |
| 15 | Qwen: Qwen3 30B A3B | 89.6 | model-only independent_benchmark | 2026-05-19 |
| 16 | Meta: Llama 4 Maverick | 86.7 | model-only independent_benchmark | 2026-05-19 |
| 17 | Mistral: Mistral Nemo | 70.0 | model-only independent_benchmark | 2026-05-19 |
| 18 | Google: Gemini 2.5 Flash | 40.0 | model-only independent_benchmark | 2026-05-19 |
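
Several rows in the table share a score, so the running position in the # column does not show which models are tied. As a minimal sketch, standard competition ranking (1, 1, 1, 4, ...) can be applied to the scores copied from the table. The function name `competition_rank` is hypothetical and is not something the source site is known to use.

```python
from itertools import groupby

# (model, score) pairs copied from the table, in the order listed.
results = [
    ("OpenAI: GPT-5.2", 96.7),
    ("Anthropic: Claude Sonnet 4.5", 96.7),
    ("Anthropic: Claude Sonnet 4.6", 96.7),
    ("Anthropic: Claude Sonnet 4", 96.7),
    ("Anthropic: Claude Opus 4.6", 96.7),
    ("Google: Gemini 2.5 Pro", 93.3),
    ("Meta: Llama 3.3 70B Instruct", 93.3),
    ("Anthropic: Claude Opus 4.5", 93.3),
    ("OpenAI: GPT-5.1", 93.3),
    ("Anthropic: Claude Haiku 4.5", 93.3),
    ("OpenAI: GPT-5", 93.3),
    ("OpenAI: GPT-4.1", 93.3),
    ("OpenAI: GPT-5.4", 93.0),
    ("Amazon: Nova Pro 1.0", 90.0),
    ("Qwen: Qwen3 30B A3B", 89.6),
    ("Meta: Llama 4 Maverick", 86.7),
    ("Mistral: Mistral Nemo", 70.0),
    ("Google: Gemini 2.5 Flash", 40.0),
]


def competition_rank(rows):
    """Return (rank, model, score) tuples where tied scores share a rank."""
    # Higher is better, so sort descending; the sort is stable, so tied
    # models keep the order they had in the input.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked = []
    position = 1
    for score, group in groupby(ordered, key=lambda row: row[1]):
        members = list(group)
        for model, _ in members:
            ranked.append((position, model, score))
        # The next distinct score skips past every tied member.
        position += len(members)
    return ranked


ranked = competition_rank(results)
tied_for_first = [model for rank, model, _ in ranked if rank == 1]
```

Under this scheme `tied_for_first` holds every model that shares the top score, instead of only the first row listed.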
what this result means
This benchmark contributes direct public evidence. Read its scope before generalizing the result.
A win here is a win on AI Language Proficiency Monitor Portuguese only, and the top score of 96.7 is shared by five models. Broad task pages require independent corroboration before naming a general winner.
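
The score is described as the mean of ARC, MGSM, and MMLU accuracy for Portuguese. A minimal sketch of that calculation follows, assuming an unweighted mean of three per-task accuracies expressed as fractions and reported as a percentage with one decimal. The source states neither the weighting nor the rounding, and the function name `portuguese_score` and the input values are hypothetical.

```python
def portuguese_score(arc: float, mgsm: float, mmlu: float) -> float:
    """Unweighted mean of three accuracies in [0, 1], as a percentage.

    Assumption: equal weights and rounding to one decimal place; the
    source does not specify either.
    """
    for value in (arc, mgsm, mmlu):
        if not 0.0 <= value <= 1.0:
            raise ValueError("accuracies must be fractions between 0 and 1")
    return round(100 * (arc + mgsm + mmlu) / 3, 1)


# Hypothetical per-task accuracies for illustration only; the source
# publishes the combined score, not the per-task breakdown.
example = portuguese_score(arc=1.0, mgsm=0.9, mmlu=1.0)
```

Because the three tasks are averaged, a single combined figure can hide a weak result on one task, which is one reason to read the benchmark's scope before generalizing.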
source record
category: multilingual
metric: accuracy
matched models: 18
latest source date: 2026-05-19
direction: higher is better