What is the best LLM across languages?
Ranked using two independent public multilingual suites: Artificial Analysis Global-MMLU-Lite and the AI Language Proficiency Monitor. The general ranking uses one aggregate row per source, so ten language columns do not count as ten separate votes. A named winner also requires at least 20 callable models with current coverage from both sources.
The Language Proficiency Monitor row averages Arabic, Bengali, German, Spanish, French, Hindi, Japanese, Portuguese, Russian, and Mandarin Chinese.
Each included model must have the full ten-language, three-task panel. Missing tests do not quietly produce an inflated average.
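The completeness rule can be sketched as a function that builds the Language Proficiency Monitor row for one model and refuses to produce a score when any cell is missing. This is a minimal illustration, not the site's code. The `scores` mapping, the function name `lpm_aggregate`, and the task labels `task_1` to `task_3` are hypothetical, since the page does not name the three tasks. Equal weighting of languages and tasks is also an assumption.

```python
from statistics import mean

# The ten Language Proficiency Monitor languages listed above (ISO 639-1 codes).
LANGUAGES = ("ar", "bn", "de", "es", "fr", "hi", "ja", "pt", "ru", "zh")
# Hypothetical placeholder labels: the page says "three-task" but does not name the tasks.
TASKS = ("task_1", "task_2", "task_3")


def lpm_aggregate(scores):
    """Return one model's LPM row, or None if any (language, task) cell is missing.

    `scores` is a hypothetical mapping {(language, task): score}.
    """
    per_language = []
    for lang in LANGUAGES:
        cells = [scores.get((lang, task)) for task in TASKS]
        if any(cell is None for cell in cells):
            # Incomplete panel: exclude the model rather than averaging what is left.
            return None
        per_language.append(mean(cells))
    # Assumption: equal weight per language, so the row is a plain mean over ten languages.
    return mean(per_language)


# Usage: a complete panel yields a score; dropping one cell excludes the model.
full = {(lang, task): 60.0 for lang in LANGUAGES for task in TASKS}
partial = dict(full)
del partial[("ru", "task_2")]
print(lpm_aggregate(full))     # 60.0
print(lpm_aggregate(partial))  # None
```

Returning `None` instead of a partial mean is what keeps a model with missing tests from receiving an inflated average.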
Per-language scores are stored for the next niche pages. They are not duplicate weight in this broad ranking.
Each language page computes its own two-source composite. Russian is withheld until a second independent per-language source is available.
What is the best multilingual LLM?
No current multilingual winner is published.
How is the multilingual ranking calculated?
The composite uses one aggregate score from Artificial Analysis Global-MMLU-Lite and one from the independently published AI Language Proficiency Monitor. Per-language rows are kept for language-specific rankings, not multiplied into this broad score.
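A sketch of the broad ranking follows, assuming each source already supplies one aggregate score per model. The page does not state how the two sources are weighted or scaled, so this version assumes an equal-weight mean on a shared 0 to 100 scale. The names `rank`, `aa_scores`, `lpm_scores`, and `callable_models` are hypothetical, and the "current" freshness check is not modeled. Only models covered by both sources enter the composite, and no winner is named below 20 such models.

```python
from statistics import mean

# Threshold stated on the page: 20 callable models with current two-source coverage.
MIN_MODELS_FOR_WINNER = 20


def rank(aa_scores, lpm_scores, callable_models):
    """Rank models by a two-source composite.

    aa_scores / lpm_scores: hypothetical dicts {model: aggregate score},
    one row per model per source, assumed to share a 0-100 scale.
    Returns (ranking, winner); winner is None below the coverage threshold.
    """
    # Keep only callable models that have a score from both sources.
    covered = [
        m for m in callable_models
        if aa_scores.get(m) is not None and lpm_scores.get(m) is not None
    ]
    # Assumption: equal weight per source; the page does not publish the weights.
    composite = {m: mean([aa_scores[m], lpm_scores[m]]) for m in covered}
    # sorted() is stable, so ties keep the order of callable_models.
    ranking = sorted(composite.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranking[0][0] if len(ranking) >= MIN_MODELS_FOR_WINNER else None
    return ranking, winner


# Usage with made-up scores for 25 hypothetical models.
models = [f"model-{i}" for i in range(25)]
aa = {m: 50 + i for i, m in enumerate(models)}
lpm = {m: 40 + i for i, m in enumerate(models)}
ranking, winner = rank(aa, lpm, models)
print(winner)                          # model-24
print(rank(aa, lpm, models[:19])[1])   # None: only 19 covered models
```

Because each source contributes exactly one number per model, the per-language rows never enter this composite, which matches the rule that they are reserved for language-specific pages.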