
Best LLM for Literature Review

Ranked models for synthesizing papers and guidelines while citing sources and stating uncertainty.

Benchmark evidence for this use case is still limited, so treat the leader below as provisional.

Provisional leader

google/gemini-3.1-pro-preview

Best current option on the available benchmark evidence, but not yet a clear winner.

Score: 30.2%

Confidence: 35.0%

Runners-up: #2 gpt-5-2025-08-07 (25.8%), #3 gemini-2.5-pro (25.6%), #4 gemini-3-flash-preview (25.2%)

Evidence quality: 83%

Top signal: FACTS Benchmark Suite (facts_grounding_score_pct)

Last updated: 8h ago

All Ranked Models

30 models listed. Rank numbers are not contiguous; for example, #17 and #22 do not appear in this list.

Rank	Model	Score	Confidence	Price / 1M tokens	Strongest signals (benchmark, metric)
#1	gemini-3.1-pro-preview	30.2%	35%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vals GPQA (overall_accuracy_pct)
#2	gpt-5-2025-08-07	25.8%	35%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vals GPQA (overall_accuracy_pct)
#3	gemini-2.5-pro	25.6%	39%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vectara HHEM Leaderboard (overall_hallucination_error_pct)
#4	gemini-3-flash-preview	25.2%	33%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vals CorpFin v2 (overall_accuracy_pct)
#5	gpt-5-mini-2025-08-07	24.7%	40%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vals GPQA (overall_accuracy_pct)
#6	gemini-3-pro-preview	24.3%	33%	—	Vals GPQA (overall_accuracy_pct); Vals CorpFin v2 (overall_accuracy_pct)
#7	gpt-5.2-2025-12-11	23.7%	29%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vals GPQA (overall_accuracy_pct)
#8	gemini-3.1-flash-lite-preview	23.1%	34%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vectara HHEM Leaderboard (overall_hallucination_error_pct)
#9	gpt-4.1-20250414	21.8%	35%	—	MMLongBench-Doc Leaderboard (acc_score_pct); Vectara HHEM Leaderboard (overall_hallucination_error_pct)
#10	claude-opus-4.7	21.2%	26%	—	Vals Finance Agent (overall_accuracy_pct); Vals GPQA (overall_accuracy_pct)
#11	claude-sonnet-4	21.0%	35%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vectara HHEM Leaderboard (overall_hallucination_error_pct)
#12	Grok-4-0709	20.8%	30%	—	Vals CorpFin v2 (overall_accuracy_pct); Vals GPQA (overall_accuracy_pct)
#13	claude-sonnet-4.6	20.7%	26%	—	Vals Finance Agent (overall_accuracy_pct); Vals CorpFin v2 (overall_accuracy_pct)
#14	claude-opus-4-5-20251101	20.6%	31%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vals CorpFin v2 (overall_accuracy_pct)
#15	gpt-5.4-2026-03-05	20.2%	25%	—	Vals GPQA (overall_accuracy_pct); Vectara HHEM Leaderboard (overall_hallucination_error_pct)
#16	gemini-2.5-flash	18.6%	31%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vectara HHEM Leaderboard (overall_hallucination_error_pct)
#18	grok-4-fast-reasoning	17.6%	33%	—	Vals CorpFin v2 (overall_accuracy_pct); Vals GPQA (overall_accuracy_pct)
#19	o3-20250416	17.4%	28%	—	Vals GPQA (overall_accuracy_pct); Vals CorpFin v2 (overall_accuracy_pct)
#20	gpt-5.1-2025-11-13	17.2%	25%	—	Vals GPQA (overall_accuracy_pct); Vals CorpFin v2 (overall_accuracy_pct)
#21	grok-4-1-fast-reasoning	16.9%	27%	—	Vals CorpFin v2 (overall_accuracy_pct); Vals GPQA (overall_accuracy_pct)
#24	kimi-k2.5-thinking	15.4%	23%	—	Vals CorpFin v2 (overall_accuracy_pct); Vals GPQA (overall_accuracy_pct)
#25	claude-opus-4-6-thinking	15.3%	17%	—	Vals CorpFin v2 (overall_accuracy_pct); Vals GPQA (overall_accuracy_pct)
#27	phi-4	14.5%	21%	—	Open LLM Leaderboard MMLU-Pro (mmlu_pro_accuracy_pct); Open LLM Leaderboard GPQA (gpqa)
#31	grok-4-1-fast-non-reasoning	13.7%	26%	—	Vals Finance Agent (overall_accuracy_pct); Vectara HHEM Leaderboard (overall_answer_rate_pct)
#32	claude-opus-4-1-20250805	13.4%	25%	—	FACTS Benchmark Suite (facts_grounding_score_pct); Vectara HHEM Leaderboard (overall_hallucination_error_pct)
#36	o4-mini	12.4%	27%	—	Vals CorpFin v2 (overall_accuracy_pct); Vals GPQA (overall_accuracy_pct)
#37	grok-4.20-0309-reasoning	12.4%	17%	—	Vals GPQA (overall_accuracy_pct); Vals CorpFin v2 (overall_accuracy_pct)
#38	qwen-2.5-72b-instruct	12.2%	20%	—	Open LLM Leaderboard MMLU-Pro (mmlu_pro_accuracy_pct); Open LLM Leaderboard GPQA (gpqa)
#46	Kimi K2 Thinking	10.9%	18%	—	Vals CorpFin v2 (overall_accuracy_pct); Vals GPQA (overall_accuracy_pct)
#47	claude-sonnet-4-5-20250929	10.5%	17%	—	Vals CorpFin v2 (overall_accuracy_pct); Vectara HHEM Leaderboard (overall_hallucination_error_pct)

All evidence sources listed in the table are dated Apr 30, 2026.

Head-to-Head: #1 vs #2

Top pick: google/gemini-3.1-pro-preview

Score: 30.2%; confidence: 35.0%

Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vals GPQA overall_accuracy_pct.

Runner-up: gpt-5-2025-08-07

Score: 25.8%; confidence: 35.0%

Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vals GPQA overall_accuracy_pct.


Related Lookups

Best LLM for Code Generation

Benchmark-backed ranking of models for generating correct, secure code from requirements.

Best LLM for Debugging

Find the top-ranked models for localizing bugs and proposing fixes with explanations.

Best LLM for Unit Test Generation

Ranked models for generating meaningful unit tests and edge cases from code.

Best LLM for Code Review

Compare models for automated PR review covering correctness, security, and maintainability.

Best LLM for Autonomous Coding

Benchmark-backed ranking of models for end-to-end autonomous software engineering and issue resolution.

Best LLM for Function Calling

Compare models for reliable tool use, function selection, and multi-step API orchestration.