history_linguistics
Historical document summarization
Summarize historical documents into timelines and key entities.
#1 Recommendation
gemini-2.5-flash
Strong on LanguageBench overall:mean (100%) and LanguageBench Translation Official (Split) translation_to:bleu (92%)
external/google/gemini-2-5-flash
Score: 25.1%
Confidence: 29.1%
Benchmark evidence for this use case is limited.
28 models are ranked, with average evidence of 13.9 points. Rankings may shift as more benchmark data is ingested.
Ranked Models: 28
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: LanguageBench overall:mean
Compare Models
Model A leads by +4.2%
Model A
gemini-2.5-flash
external/google/gemini-2-5-flash
Rank #1
LanguageBench: overall:mean
Value 100.0% · Conf 100.0% · Weight 4.5%
languagebench.overall_mean (Mar 12, 2026)
LanguageBench Translation Official (Split): translation_to:bleu
Value 92.0% · Conf 100.0% · Weight 4.3%
languagebench_translation_official.translation_to_bleu (Mar 12, 2026)
LanguageBench: translation_to:bleu
Value 92.0% · Conf 100.0% · Weight 2.3%
languagebench.translation_to_bleu (Mar 12, 2026)
LanguageBench Translation Official (Split): translation_to:chrf
Value 97.5% · Conf 100.0% · Weight 1.8%
languagebench_translation_official.translation_to_chrf (Mar 12, 2026)
Model B
google/gemini-2.0-flash-001
external/google/gemini-2-0-flash-001
Rank #3
LanguageBench: overall:mean
Value 99.9% · Conf 100.0% · Weight 4.5%
languagebench.overall_mean (Mar 12, 2026)
LanguageBench Translation Official (Split): translation_to:bleu
Value 88.0% · Conf 100.0% · Weight 4.1%
languagebench_translation_official.translation_to_bleu (Mar 12, 2026)
LanguageBench: translation_to:bleu
Value 88.0% · Conf 100.0% · Weight 2.2%
languagebench.translation_to_bleu (Mar 12, 2026)
LanguageBench Translation Official (Split): translation_to:chrf
Value 93.3% · Conf 100.0% · Weight 1.7%
languagebench_translation_official.translation_to_chrf (Mar 12, 2026)
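The page lists a value, a confidence, and a weight for each signal but does not state how they combine into a model's score. As a rough illustration only, the sketch below assumes each signal contributes value × confidence × weight and sums the four signals listed above for Model A and Model B. The `Signal` class and `weighted_contribution` function are hypothetical names, and the formula is an assumption, not the dashboard's documented method. Under that assumption the listed signals sum to about 12.3% for Model A and 11.6% for Model B, which does not reproduce the 25.1% headline score, so the real ranking must use more signals, a different formula, or both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    value: float       # benchmark value as a fraction (e.g. 0.92 for 92.0%)
    confidence: float  # confidence as a fraction (1.0 for 100.0%)
    weight: float      # signal weight as a fraction (0.045 for 4.5%)


def weighted_contribution(signals: list[Signal]) -> float:
    """ASSUMED scoring rule: sum of value * confidence * weight per signal."""
    return sum(s.value * s.confidence * s.weight for s in signals)


# The four signals shown for each model on this page (other signals are not listed).
model_a = [
    Signal("languagebench.overall_mean", 1.000, 1.0, 0.045),
    Signal("languagebench_translation_official.translation_to_bleu", 0.920, 1.0, 0.043),
    Signal("languagebench.translation_to_bleu", 0.920, 1.0, 0.023),
    Signal("languagebench_translation_official.translation_to_chrf", 0.975, 1.0, 0.018),
]
model_b = [
    Signal("languagebench.overall_mean", 0.999, 1.0, 0.045),
    Signal("languagebench_translation_official.translation_to_bleu", 0.880, 1.0, 0.041),
    Signal("languagebench.translation_to_bleu", 0.880, 1.0, 0.022),
    Signal("languagebench_translation_official.translation_to_chrf", 0.933, 1.0, 0.017),
]

if __name__ == "__main__":
    a = weighted_contribution(model_a)
    b = weighted_contribution(model_b)
    print(f"Model A (shown signals only): {a:.2%}")
    print(f"Model B (shown signals only): {b:.2%}")
    print(f"Difference: {a - b:.2%}")
```

Comparing the rows one by one under this assumption, most of Model A's edge on these four signals comes from the two translation_to:bleu rows (92.0% vs 88.0%), while the overall:mean rows are almost tied.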
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 28
Sources: 8
Quality: Insufficient
Icelandic LLM Leaderboard
icelandic_llm_leaderboard
17 rows · 1.2% avg lift
Vals Tax Eval v2
vals_tax_eval_v2
16 rows · 0.4% avg lift
Vals CorpFin v2
vals_corp_fin_v2
16 rows · 0.3% avg lift
Vals MedQA
vals_medqa
15 rows · 0.4% avg lift
Missing Strong Models
anthropic/claude-sonnet-4.6
external/anthropic/claude-sonnet-4-6
Rank #4 · 21.1%
gpt-5.2-2025-12-11
external/openai/gpt-5-2-2025-12-11
Rank #16 · 16.2%
anthropic/claude-opus-4-6-thinking
external/anthropic/claude-opus-4-6-thinking
Rank #17 · 16.1%
xai-org/grok-4-fast-reasoning
external/xai-org/grok-4-fast-reasoning
Rank #18 · 15.7%