
What is the best LLM across languages?

Ranked from two independent public multilingual suites: Artificial Analysis Global-MMLU-Lite and the AI Language Proficiency Monitor. The general ranking uses one aggregate row per source. It does not let ten language columns count as ten separate votes. A named winner also requires 20 callable models with current two-source coverage.


▸ no current multilingual performance winner published

No current winner is published: qualifying independent evidence is older than 30 days. The table below shows available corroborated evidence, not a publishable current winner.
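The publication gate described above can be sketched roughly as follows. This is an illustration only, not the site's actual code: the function name, the constant names, and the choice to require every source to be fresh (rather than just one) are assumptions, as is reading "20 callable models" as a minimum.

```python
from datetime import date, timedelta

# Assumed thresholds, taken from the prose above; the names are hypothetical.
MAX_EVIDENCE_AGE = timedelta(days=30)
MIN_COVERED_MODELS = 20


def can_publish_winner(evidence_dates, covered_models, today):
    """Return True only if every source is fresh and coverage is wide enough.

    evidence_dates: mapping of source name -> date of its latest qualifying run.
    covered_models: number of callable models with current two-source coverage.
    """
    if len(evidence_dates) < 2:
        return False  # the ranking needs two independent sources
    # Assumption: evidence exactly 30 days old still counts as current.
    fresh = all(today - d <= MAX_EVIDENCE_AGE for d in evidence_dates.values())
    return fresh and covered_models >= MIN_COVERED_MODELS
```

Under this sketch, a single stale source is enough to withhold the winner, which matches the behaviour shown on this page: the table is still displayed, but no model is named.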

▸ corroborated evidence · multilingual score

#	Model	Multilingual	Reasoning	Price/M
1	Anthropic: Claude Opus 4.6	94.9	-	$5.00/M
2	Anthropic: Claude Opus 4.5	93.7	71.4	$5.00/M
3	Anthropic: Claude Sonnet 4.6	93.6	67.8	$3.00/M
4	OpenAI: GPT-5.2	92.2	67.6	$1.75/M
5	Google: Gemini 2.5 Pro	92.0	53.2	$1.25/M
6	Anthropic: Claude Sonnet 4.5	92.0	42.9	$3.00/M
7	Anthropic: Claude Haiku 4.5	88.6	-	$1.00/M
8	Meta: Llama 4 Maverick	86.4	52.0	$0.20/M
9	Meta: Llama 3.3 70B Instruct	81.0	60.4	$0.10/M
10	Meta: Llama 4 Scout	77.1	-	$0.10/M
11	Google: Gemini 2.5 Flash	58.9	29.2	$0.30/M

▸ evidence used

Global-MMLU-Lite · cross-language knowledge · Artificial Analysis · independent

AI Language Proficiency Monitor · ARC + MGSM + MMLU · ten languages · independent

▸ scoring rule

The Language Proficiency Monitor row averages Arabic, Bengali, German, Spanish, French, Hindi, Japanese, Portuguese, Russian, and Mandarin Chinese.

Each included model must have the full ten-language, three-task panel. Missing tests do not quietly produce an inflated average.

Per-language scores are stored for the next niche pages. They are not duplicate weight in this broad ranking.
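A minimal sketch of the full-panel rule, assuming each cell of the ten-language, three-task panel is weighted equally. The page does not say how the three tasks are combined, so the plain mean here is an assumption, as are the language codes, the task keys, and the function name.

```python
from statistics import mean

# Hypothetical keys for the ten languages and three tasks named above.
LANGUAGES = ("ar", "bn", "de", "es", "fr", "hi", "ja", "pt", "ru", "zh")
TASKS = ("arc", "mgsm", "mmlu")


def proficiency_row(scores):
    """Average a model's scores over the full language x task panel.

    scores: mapping of (language, task) -> score. Returns None when any
    cell is missing, so a partial panel never yields an average.
    """
    cells = [scores.get((lang, task)) for lang in LANGUAGES for task in TASKS]
    if any(c is None for c in cells):
        return None  # model is excluded rather than averaged over fewer tests
    return mean(cells)
```

Returning None instead of averaging the cells that happen to exist is what keeps a model from looking stronger simply because its hardest tests were never run.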

▸ rankings by language

Each page computes its own two-source composite. Russian is withheld until a second independent per-language source is available.

Arabic · Bengali · German · Spanish · French · Hindi · Japanese · Portuguese · Mandarin Chinese

▸ frequently asked

What is the best multilingual LLM?

No current winner is published: qualifying independent evidence is older than 30 days.

How is the multilingual ranking calculated?

The composite uses one aggregate score from Artificial Analysis Global-MMLU-Lite and one from the independently published AI Language Proficiency Monitor. Per-language rows are kept for language-specific rankings, not multiplied into this broad score.
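As a rough illustration of "one aggregate score per source", the sketch below combines the two aggregates with an unweighted mean. The page does not state the actual weighting, so the mean is an assumption, and the function names and input shape are hypothetical.

```python
def composite(global_mmlu_lite, proficiency_monitor):
    """Combine one aggregate score per source into a single composite.

    Assumption: an unweighted mean of the two aggregates. Returns None
    if either source is missing, since the ranking needs both.
    """
    if global_mmlu_lite is None or proficiency_monitor is None:
        return None
    return (global_mmlu_lite + proficiency_monitor) / 2


def rank(models):
    """Sort models by composite, highest first, dropping uncovered ones.

    models: mapping of model name -> (global_mmlu_lite, proficiency_monitor).
    """
    scored = {name: composite(*pair) for name, pair in models.items()}
    covered = {name: s for name, s in scored.items() if s is not None}
    return sorted(covered, key=covered.get, reverse=True)
```

Per-language values never enter `composite`, so the ten language columns of the Proficiency Monitor cannot outweigh the single Global-MMLU-Lite aggregate.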
