legal
Legal translation
Translate legal text with terminology consistency and format safety.
#1 Recommendation
gemini-2.5-flash
Strong on LanguageBench Translation Official (Split): translation_to:bleu (92%) and LanguageBench: overall:mean (100%).
external/google/gemini-2-5-flash
Score: 31.1%
Confidence: 35.1%
Limited benchmark evidence for this use case.
54 ranked models with average evidence of 14.4 points. Rankings may shift as more benchmark data is ingested.
Ranked Models: 30
Evidence Quality: 81%
Scoring: Benchmark-backed
Top Signal: LanguageBench Translation Official (Split): translation_to:bleu
Compare Models
Model A leads by +5.8%
Model A
gemini-2.5-flash
external/google/gemini-2-5-flash
Rank #1
LanguageBench Translation Official (Split): translation_to:bleu
Value 92.0% · Conf 100.0% · Weight 5.5%
languagebench_translation_official.translation_to_bleu (Mar 12, 2026)
LanguageBench: overall:mean
Value 100.0% · Conf 100.0% · Weight 3.5%
languagebench.overall_mean (Mar 12, 2026)
LanguageBench Translation Official (Split): translation_to:chrf
Value 97.5% · Conf 100.0% · Weight 3.0%
languagebench_translation_official.translation_to_chrf (Mar 12, 2026)
Galileo Agent Leaderboard v2: Avg TSQ
Value 100.0% · Conf 100.0% · Weight 2.7%
galileo_agent_v2.avg_tsq (Mar 12, 2026)
Model B
gemini-2.5-pro
external/google/gemini-2-5-pro
Rank #4
LEXam Leaderboard: average_score_pct
Value 89.4% · Conf 100.0% · Weight 2.5%
lexam_leaderboard.average_score_pct (Mar 12, 2026)
Galileo Agent Leaderboard v2: Avg TSQ
Value 79.5% · Conf 100.0% · Weight 2.1%
galileo_agent_v2.avg_tsq (Mar 12, 2026)
Vals Case Law v2: overall_accuracy_pct
Value 63.2% · Conf 100.0% · Weight 2.1%
vals_case_law_v2.overall_accuracy_pct (Mar 12, 2026)
OpenVLM OCRBench Official: ocrbench_score_pct
Value 90.7% · Conf 100.0% · Weight 1.8%
openvlm_ocrbench_official.ocrbench_score_pct (Mar 12, 2026)
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 54
Sources: 8
Quality: Insufficient
Vals Legal Bench (vals_legal_bench): 41 rows, 2.4% avg lift
Vals MedQA (vals_medqa): 38 rows, 0.4% avg lift
Vals Tax Eval v2 (vals_tax_eval_v2): 38 rows, 0.4% avg lift
Vals LiveCodeBench (vals_lcb): 37 rows, 0.3% avg lift
Missing Strong Models
gpt-4o-2024-05-13
external/openai/gpt-4o-2024-05-13
Rank #51
10.5%
Taxonomy Details
Core Tasks
Required Modes
Domains
Related Use Cases
legal
Contract Q&A (RAG grounded)
Answer contract questions grounded in the actual contract text.
Top: gemini-2.5-pro
legal
Regulatory summary
Summarize and compare regulatory text with conservative interpretation.
Top: gemini-2.5-pro
legal
Contract redline summary
Summarize material changes between contract versions with clause refs.
Top: gemini-2.5-pro
legal
Clause playbook check
Check extracted terms against a playbook and flag deviations.
Top: gemini-2.5-pro