basedagi.org

AI Language Proficiency Monitor Mandarin Chinese

Mean ARC, MGSM, and MMLU accuracy for Mandarin Chinese, from the AI Language Proficiency Monitor.
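As a rough illustration of how a score of this kind could be formed, the sketch below takes three per-benchmark accuracies and averages them. It assumes an unweighted mean rounded to one decimal place; the page does not state the weighting or rounding, and the function name `language_score` and the input values are hypothetical, not taken from the source data.

```python
from statistics import mean


def language_score(arc: float, mgsm: float, mmlu: float) -> float:
    """Unweighted mean of three per-benchmark accuracies (percent), rounded to one decimal.

    Assumption: equal weighting of ARC, MGSM, and MMLU. The source only says "mean".
    """
    return round(mean([arc, mgsm, mmlu]), 1)


# Hypothetical per-benchmark accuracies in percent (illustrative, not from the table).
example = language_score(arc=100.0, mgsm=90.0, mmlu=100.0)
print(example)
```

Under this assumption, a model that is perfect on two benchmarks and misses a tenth of the items on the third lands just below 97, so a small gap in the combined score can reflect a shortfall on a single benchmark.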

winner on AI Language Proficiency Monitor Mandarin Chinese
direct benchmark result, not a broad vertical composite | source row dated 2026-05-19
scored on 2026-05-19
latest mapped results | top 20
#   Model                         Provider    Score  Evidence                           Tested
1   Anthropic: Claude Sonnet 4.6  Anthropic   100.0  model-only, independent_benchmark  2026-05-19
2   Anthropic: Claude Sonnet 4.5  Anthropic   100.0  model-only, independent_benchmark  2026-05-19
3   OpenAI: GPT-5                 OpenAI      100.0  model-only, independent_benchmark  2026-05-19
4   Anthropic: Claude Opus 4.6    Anthropic   100.0  model-only, independent_benchmark  2026-05-19
5   Anthropic: Claude Sonnet 4    Anthropic   100.0  model-only, independent_benchmark  2026-05-19
6   OpenAI: GPT-5.2               OpenAI      100.0  model-only, independent_benchmark  2026-05-19
7   Google: Gemini 2.5 Pro        Google       96.7  model-only, independent_benchmark  2026-05-19
8   OpenAI: GPT-4.1               OpenAI       96.7  model-only, independent_benchmark  2026-05-19
9   Anthropic: Claude Opus 4.5    Anthropic    96.7  model-only, independent_benchmark  2026-05-19
10  OpenAI: GPT-5.1               OpenAI       96.7  model-only, independent_benchmark  2026-05-19
11  OpenAI: GPT-5.4               OpenAI       96.7  model-only, independent_benchmark  2026-05-19
12  Meta: Llama 3.3 70B Instruct  Meta Llama   93.3  model-only, independent_benchmark  2026-05-19
13  Anthropic: Claude Haiku 4.5   Anthropic    93.3  model-only, independent_benchmark  2026-05-19
14  Qwen: Qwen3 30B A3B           Qwen         92.6  model-only, independent_benchmark  2026-05-19
15  Meta: Llama 4 Maverick        Meta Llama   90.0  model-only, independent_benchmark  2026-05-19
16  Amazon: Nova Pro 1.0          Amazon       90.0  model-only, independent_benchmark  2026-05-19
17  Mistral: Mistral Nemo         Mistral AI   53.3  model-only, independent_benchmark  2026-05-19
18  Google: Gemini 2.5 Flash      Google       43.3  model-only, independent_benchmark  2026-05-19
what this result means

This benchmark contributes direct public evidence. Read its scope before generalizing the result.

A win here is a win on AI Language Proficiency Monitor Mandarin Chinese. Broad task pages require independent corroboration before naming a general winner.

source record
category: multilingual
metric: accuracy
matched models: 18
latest source date: 2026-05-19
direction: higher is better