BasedAGI
Rankings live

healthcare

Patient-friendly explanations

Rewrite technical notes into clear, accessible patient language.

#1 Recommendation

gemini-2.5-flash

Strong on LanguageBench Translation Official (Split) translation_to:bleu (92%) and BRIDGE Medical Leaderboard average_performance_pct (100%)

external/google/gemini-2-5-flash

Score 29.2% · Confidence 35.9%

Limited benchmark evidence for this use case.

52 models are ranked, with an average of 14.7 evidence points each. Rankings may shift as more benchmark data is ingested.

Ranked Models 30 · Evidence Quality 80% · Scoring Benchmark-backed

Top Signal

LanguageBench Translation Official (Split): translation_to:bleu

All Ranked Models

Rank  Model                                          Score
#1    gemini-2.5-flash                               29.2%
      Strong on LanguageBench Translation Official (Split) translation_to:bleu (92%) and BRIDGE Medical Leaderboard average_performance_pct (100%)
#2    gpt-4.1-20250414                               23.8%
      Strong on Galileo Agent Leaderboard v2 Healthcare AC (100%) and Vals MedQA overall_accuracy_pct (90%)
#3    gemini-2.5-pro                                 22.0%
      Strong on Vectara HHEM Leaderboard medicine_hallucination_error_pct (93%) and OpenVLM OCRBench Official ocrbench_score_pct (91%)
#6    claude-sonnet-4-20250514                       19.4%
#7    google/gemini-2.0-flash-001                    18.6%
#9    gpt-4.1-mini-20250414                          17.8%
#10   gpt-4o                                         16.3%
#11   gemini-3-pro-preview                           16.1%
#12   gpt-5-mini-2025-08-07                          16.0%
#13   Grok-4-0709                                    15.6%
#14   google/gemini-3.1-pro-preview                  15.2%
#15   gpt-5-2025-08-07                               15.0%
#16   claude-opus-4-5-20251101                       14.7%
#17   openai/gpt-5.4-2026-03-05                      13.8%
#18   qwen-2.5-72b-instruct                          13.6%
#19   gemini-3-flash-preview                         13.2%
#20   gpt-5.1-2025-11-13                             12.8%
#21   Llama-3.1-70B-Instruct                         12.2%
#23   deepseek/deepseek-r1                           12.0%
#25   xai-org/grok-4-fast-reasoning                  11.5%
#26   anthropic/claude-opus-4-6-thinking             11.5%
#28   anthropic/claude-opus-4-1-20250805             11.3%
#29   anthropic/claude-opus-4-5-20251101-thinking    11.3%
#30   gpt-5.2-2025-12-11                             11.2%
#31   anthropic/claude-sonnet-4.6                    11.2%
#34   xai-org/grok-4-1-fast-reasoning                10.6%
#35   anthropic/claude-sonnet-4-5-20250929-thinking  10.4%
#37   o3-20250416                                    10.1%
#38   kimi/kimi-k2.5-thinking                        9.9%
#47   google/gemini-3.1-flash-lite-preview           8.9%

Compare Models

Model A leads by +5.4%


Model A

gemini-2.5-flash

external/google/gemini-2-5-flash

29.2%

Rank #1

Confidence 35.9% · 22 evidence pts

LanguageBench Translation Official (Split): translation_to:bleu

Value 92.0% · Conf 100.0% · Weight 3.6%

languagebench_translation_official.translation_to_bleu (Mar 12, 2026)

BRIDGE Medical Leaderboard: average_performance_pct

Value 100.0% · Conf 100.0% · Weight 3.1%

bridge_medical_leaderboard.average_performance_pct (Mar 12, 2026)

LanguageBench: overall:mean

Value 100.0% · Conf 100.0% · Weight 2.1%

languagebench.overall_mean (Mar 12, 2026)

Vals MedScribe: overall_accuracy_pct

Value 84.6% · Conf 100.0% · Weight 2.0%

vals_medscribe.overall_accuracy_pct (Mar 12, 2026)

Model B

gpt-4.1-20250414

external/openai/gpt-4-1-20250414

23.8%

Rank #2

Confidence 32.9% · 27 evidence pts

Galileo Agent Leaderboard v2: Healthcare AC

Value 100.0% · Conf 100.0% · Weight 2.8%

galileo_agent_v2.healthcare_ac (Mar 12, 2026)

Vals MedQA: overall_accuracy_pct

Value 90.0% · Conf 100.0% · Weight 2.6%

vals_medqa.overall_accuracy_pct (Mar 12, 2026)

Vectara HHEM Leaderboard: medicine_hallucination_error_pct

Value 96.2% · Conf 100.0% · Weight 1.9%

vectara_hhem_leaderboard.medicine_hallucination_error_pct (Mar 12, 2026)

OpenVLM OCRBench Official: ocrbench_score_pct

Value 87.7% · Conf 100.0% · Weight 1.7%

openvlm_ocrbench_official.ocrbench_score_pct (Mar 12, 2026)
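
The Value, Conf, and Weight figures listed for each signal suggest a weighted aggregation, but the page does not state the formula. The following is only a minimal sketch under a loudly labeled assumption: that each signal contributes value × confidence × weight, in percentage points. The `Signal` class and the `listed_contribution` and `lead` helpers are hypothetical names introduced for illustration; only the numbers come from the comparison above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One evidence row from the comparison panel (hypothetical structure)."""
    key: str
    value_pct: float
    conf_pct: float
    weight_pct: float

    def contribution(self) -> float:
        # ASSUMPTION: value x confidence x weight. The page does not
        # document how signals are actually combined into a score.
        return (self.value_pct / 100) * (self.conf_pct / 100) * self.weight_pct


# Signals exactly as listed for Model A (gemini-2.5-flash).
MODEL_A = [
    Signal("languagebench_translation_official.translation_to_bleu", 92.0, 100.0, 3.6),
    Signal("bridge_medical_leaderboard.average_performance_pct", 100.0, 100.0, 3.1),
    Signal("languagebench.overall_mean", 100.0, 100.0, 2.1),
    Signal("vals_medscribe.overall_accuracy_pct", 84.6, 100.0, 2.0),
]

# Signals exactly as listed for Model B (gpt-4.1-20250414).
MODEL_B = [
    Signal("galileo_agent_v2.healthcare_ac", 100.0, 100.0, 2.8),
    Signal("vals_medqa.overall_accuracy_pct", 90.0, 100.0, 2.6),
    Signal("vectara_hhem_leaderboard.medicine_hallucination_error_pct", 96.2, 100.0, 1.9),
    Signal("openvlm_ocrbench_official.ocrbench_score_pct", 87.7, 100.0, 1.7),
]


def listed_contribution(signals: list[Signal]) -> float:
    """Sum the assumed contributions of the listed signals, in points."""
    return round(sum(s.contribution() for s in signals), 2)


def lead(score_a_pct: float, score_b_pct: float) -> float:
    """Gap between two overall scores, in percentage points."""
    return round(score_a_pct - score_b_pct, 1)


if __name__ == "__main__":
    print("Model A, listed signals:", listed_contribution(MODEL_A))
    print("Model B, listed signals:", listed_contribution(MODEL_B))
    print("Lead of A over B:", lead(29.2, 23.8))
```

Only four signals are listed per model, out of 22 and 27 evidence points respectively, so these partial sums are not expected to reproduce the 29.2% and 23.8% overall scores. The `lead` helper simply subtracts the two displayed scores, which matches the reported +5.4% gap.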

Ranking Diagnostics & Missing Models

Source Lift

Ranked 52 · Sources 8 · Quality Insufficient

Vals MedQA

vals_medqa

37 rows

2.5% avg lift

Vals Legal Bench

vals_legal_bench

35 rows

0.4% avg lift

Vals LiveCodeBench

vals_lcb

33 rows

0.3% avg lift

Vals Tax Eval v2

vals_tax_eval_v2

32 rows

0.3% avg lift

Missing Strong Models

zai/glm-5-thinking

external/zai/glm-5-thinking

Rank #32

13.0%

Thin evidence after weighting

alibaba/qwen3.5-flash

external/alibaba/qwen3-5-flash

Rank #33

12.3%

Thin evidence after weighting

gpt-4o-20241120

external/openai/gpt-4o-20241120

Rank #49

10.7%

Thin evidence after weighting

qwen/qwen3-max

external/qwen/qwen3-max

Rank #55

10.3%

Thin evidence after weighting
Taxonomy Details

Core Tasks

task.rewrite_clarity, task.translate_technical

Required Modes

mode.multilingual

Domains

domain.healthcare_clinical

Related Use Cases