Healthcare

Patient-friendly explanations

Rewrite technical notes into clear, accessible patient language.

task.rewrite_clarity

task.translate_technical

Evidence quality is currently limited for this use case. Treat the rankings below as a guide for exploration, not as proof of a clear winner.

Provisional leader

claude-sonnet-4

Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.

29.2%

Best benchmark score

37.8%

Confidence

All ranked models — top 3

🥇

claude-sonnet-4

29.2%

🥈

gemini-2.5-flash

29.2%

🥉

gpt-4.1-20250414

23.0%

Ranked Models

Evidence Quality

81%

Evidence Points

Top Signal

LanguageBench Translation Official (Split): translation_to:bleu

All Ranked Models

30 of 30 models

Rank	Model	Strengths	Score	Confidence	Price / 1M	Evidence sources
🥇	claude-sonnet-4	Strong on LanguageBench Translation Official (Split) translation_to:bleu and Galileo Agent Leaderboard v2 Healthcare AC	29.2%	38%	$6.00	LanguageBench Translation Official (Split), Galileo Agent Leaderboard v2
🥈	gemini-2.5-flash	Strong on LanguageBench Translation Official (Split) translation_to:bleu and BRIDGE Medical Leaderboard average_performance_pct	29.2%	37%	$0.17	LanguageBench Translation Official (Split), BRIDGE Medical Leaderboard
🥉	gpt-4.1-20250414	Strong on Galileo Agent Leaderboard v2 Healthcare AC and Vals MedQA overall_accuracy_pct	23.0%	32%	—	Galileo Agent Leaderboard v2, Vals MedQA
#4	gemini-2.5-pro	Strong on Vectara HHEM Leaderboard medicine_hallucination_error_pct and OpenVLM OCRBench Official ocrbench_score_pct	22.6%	49%	$3.44	Vectara HHEM Leaderboard, OpenVLM OCRBench Official
#6	Claude-3.5-Sonnet	Strong on LanguageBench Translation Official (Split) translation_to:bleu and MedHELM average_score_pct	21.0%	31%	$6.00	LanguageBench Translation Official (Split), MedHELM
#7	gpt-5-mini-2025-08-07	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	21.0%	36%	—	Vals MedQA, Vals MedScribe
#8	gpt-5-2025-08-07	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	20.1%	33%	—	Vals MedQA, Vals MedScribe
#9	gemini-3.1-pro-preview	Strong on Vals MedQA overall_accuracy_pct and Vectara HHEM Leaderboard medicine_hallucination_error_pct	18.6%	21%	$4.50	Vals MedQA, Vectara HHEM Leaderboard
#11	gemini-2.0-flash-001	Strong on LanguageBench Translation Official (Split) translation_to:bleu and BRIDGE Medical Leaderboard average_performance_pct	18.2%	22%	—	LanguageBench Translation Official (Split), BRIDGE Medical Leaderboard
#12	gpt-4.1-mini-20250414	Strong on Galileo Agent Leaderboard v2 Healthcare AC and Vals MedQA overall_accuracy_pct	17.3%	24%	—	Galileo Agent Leaderboard v2, Vals MedQA
#13	Grok-4-0709	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	16.6%	25%	—	Vals MedQA, Vals MedScribe
#14	gpt-4o	Strong on MedHELM average_score_pct and MedHELM clinical_note_generation_win_rate_pct	16.0%	19%	$0.26	MedHELM
#15	gemini-3-pro-preview	Strong on Vals MedQA overall_accuracy_pct and Vectara HHEM Leaderboard medicine_hallucination_error_pct	15.0%	19%	$4.50	Vals MedQA, Vectara HHEM Leaderboard
#16	claude-opus-4-5-20251101	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	14.7%	19%	—	Vals MedQA, Vals MedScribe
#17	gemini-3-flash-preview	Strong on Vals MedQA overall_accuracy_pct and Vectara HHEM Leaderboard medicine_hallucination_error_pct	14.6%	19%	$1.13	Vals MedQA, Vectara HHEM Leaderboard
#18	gpt-5.4-2026-03-05	Strong on Vals MedQA overall_accuracy_pct and Vectara HHEM Leaderboard medicine_hallucination_error_pct	14.0%	18%	—	Vals MedQA, Vectara HHEM Leaderboard
#19	qwen-2.5-72b-instruct	Strong on BRIDGE Medical Leaderboard average_performance_pct and Galileo Agent Leaderboard v2 Healthcare AC	13.3%	20%	—	BRIDGE Medical Leaderboard, Galileo Agent Leaderboard v2
#20	gpt-5.1-2025-11-13	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	13.1%	16%	—	Vals MedQA, Vals MedScribe
#21	gpt-5.2-2025-12-11	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	13.1%	15%	—	Vals MedQA, Vals MedScribe
#22	o3-20250416	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	12.8%	18%	$3.50	Vals MedQA, Vals MedScribe
#23	deepseek-r1	Strong on BRIDGE Medical Leaderboard average_performance_pct and LanguageBench Translation Official (Split) translation_to:bleu	12.6%	31%	$0.27	BRIDGE Medical Leaderboard, LanguageBench Translation Official (Split)
#24	grok-4-fast-reasoning	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	12.1%	21%	$0.28	Vals MedQA, Vals MedScribe
#25	gpt-4.1	Strong on LanguageBench Translation Official (Split) translation_to:bleu and LanguageBench overall:mean	12.0%	14%	$3.50	LanguageBench Translation Official (Split), LanguageBench
#26	gemini-2.5-flash-lite	Strong on Galileo Agent Leaderboard v2 Healthcare AC and Vectara HHEM Leaderboard medicine_hallucination_error_pct	11.8%	17%	$0.17	Galileo Agent Leaderboard v2, Vectara HHEM Leaderboard
#27	claude-opus-4-1-20250805	Strong on Vals MedQA overall_accuracy_pct and Vectara HHEM Leaderboard medicine_hallucination_error_pct	11.6%	18%	—	Vals MedQA, Vectara HHEM Leaderboard
#30	claude-opus-4-6-thinking	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	11.2%	12%	—	Vals MedQA, Vals MedScribe
#31	claude-opus-4-5-20251101-thinking	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	11.0%	12%	—	Vals MedQA, Vals MedScribe
#32	claude-sonnet-4.6	Strong on Vals MedQA overall_accuracy_pct and Vectara HHEM Leaderboard medicine_hallucination_error_pct	11.0%	15%	$6.00	Vals MedQA, Vectara HHEM Leaderboard
#34	gemini-3.1-flash-lite-preview	Strong on Vectara HHEM Leaderboard medicine_hallucination_error_pct and FACTS Benchmark Suite facts_grounding_score_pct	10.5%	16%	$0.56	Vectara HHEM Leaderboard, FACTS Benchmark Suite
#35	grok-4-1-fast-reasoning	Strong on Vals MedQA overall_accuracy_pct and Vals MedScribe overall_accuracy_pct	10.2%	17%	$0.28	Vals MedQA, Vals MedScribe

Ranking diagnostics & missing models

Source lift

Ranked

Sources

Quality

Low

Vals MedQA

39 rows · 2.4% avg lift

Vals LiveCodeBench

34 rows · 0.3% avg lift

Vals Legal Bench

33 rows · 0.4% avg lift

Vals Tax Eval v2

33 rows · 0.3% avg lift

Missing frontier models

No obvious gaps right now.

Taxonomy & task details

Core tasks

task.rewrite_clarity

task.translate_technical

Required modes

mode.multilingual

Domains

domain.healthcare_clinical

Related in Healthcare

Patient education bot (RAG grounded)

Answer patient FAQ using trusted sources with cautious wording.

Medical coding support (suggestions)

Extract coding-relevant facts and suggest codes for human review.

Clinical note drafting

Summarize encounters into structured notes for clinician review.

Medical chart summary

Summarize a patient's chart into timeline, problems, and meds for review.