
Best LLM for Meeting Summarization

Compare models for summarizing meeting transcripts into action items, decisions, and key points.
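As an illustration of the target output, the three categories named above can be modelled as a small structure. This is a minimal sketch, not this site's evaluation schema: the class name `MeetingSummary`, its fields, and the prompt wording in `build_prompt` are all hypothetical, and no particular model API is assumed.

```python
from dataclasses import dataclass, field
import json


@dataclass
class MeetingSummary:
    """Hypothetical container for the three output categories."""
    action_items: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    key_points: list[str] = field(default_factory=list)


def build_prompt(transcript: str) -> str:
    """Build an illustrative instruction asking for JSON with the three keys."""
    return (
        "Summarize the meeting transcript below. Reply with a JSON object "
        'containing exactly the keys "action_items", "decisions", and '
        '"key_points", each a list of short strings.\n\n'
        f"Transcript:\n{transcript}"
    )


def parse_summary(raw_json: str) -> MeetingSummary:
    """Parse a model reply; keys that are missing default to empty lists."""
    data = json.loads(raw_json)
    return MeetingSummary(
        action_items=list(data.get("action_items", [])),
        decisions=list(data.get("decisions", [])),
        key_points=list(data.get("key_points", [])),
    )
```

Asking for a fixed set of keys makes replies from different models comparable field by field, which is one plausible way to set up a side-by-side check on your own transcripts.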

This is a high-intent use case, but the benchmark evidence for it is still limited. Treat the leader below as provisional.

Provisional leader

gemini-2.5-flash

The best current option given the available benchmark evidence, though not yet a clear winner.

external/google/gemini-2-5-flash

Score: 19.6%

Confidence: 26.9%

Evidence

Runners-up: #2 gemini-2.5-pro (19.5%), #3 gpt-5-2025-08-07 (18.2%), #4 google/gemini-3.1-pro-preview (18.1%)

Ranked Models

Evidence Quality: 84%

Evidence Points

Top Signal: LanguageBench Grammar/Clarity Official (Split), grammar_clarity_score_pct

Benchmark Sources

Last Updated: 17h ago

All Ranked Models

30 of 30 models

Rank	Model	Strengths	Score	Confidence	Price / 1M	Evidence sources
🥇	gemini-2.5-flash	Strong on LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct and FACTS Benchmark Suite facts_grounding_score_pct	19.6%	27%	—	LanguageBench Grammar/Clarity Official (Split) · Apr 30, 2026; FACTS Benchmark Suite · Apr 30, 2026
🥈	gemini-2.5-pro	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	19.5%	31%	—	FACTS Benchmark Suite · Apr 30, 2026; Vectara HHEM Leaderboard · Apr 30, 2026
🥉	gpt-5-2025-08-07	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vals Finance Agent overall_accuracy_pct	18.2%	28%	—	FACTS Benchmark Suite · Apr 30, 2026; Vals Finance Agent · Apr 30, 2026
#4	gemini-3.1-pro-preview	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	18.1%	22%	—	FACTS Benchmark Suite · Apr 30, 2026; Vectara HHEM Leaderboard · Apr 30, 2026
#5	claude-sonnet-4	Strong on LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	18.1%	26%	—	LanguageBench Grammar/Clarity Official (Split) · Apr 30, 2026; Vectara HHEM Leaderboard · Apr 30, 2026
#6	gpt-4.1-20250414	Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and MMLongBench-Doc Leaderboard acc_score_pct	17.9%	28%	—	Vectara HHEM Leaderboard · Apr 30, 2026; MMLongBench-Doc Leaderboard · Apr 30, 2026
#7	gpt-5.2-2025-12-11	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vals Finance Agent overall_accuracy_pct	16.3%	20%	—	FACTS Benchmark Suite · Apr 30, 2026; Vals Finance Agent · Apr 30, 2026
#8	gpt-5-mini-2025-08-07	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	16.2%	26%	—	FACTS Benchmark Suite · Apr 30, 2026; Vectara HHEM Leaderboard · Apr 30, 2026
#9	Grok-4-0709	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Galileo Agent Leaderboard v2 Avg TSQ	16.0%	24%	—	FACTS Benchmark Suite · Apr 30, 2026; Galileo Agent Leaderboard v2 · Apr 30, 2026
#10	qwen-2.5-72b-instruct	Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa	15.1%	22%	—	Open LLM Leaderboard MMLU-Pro · Apr 30, 2026; Open LLM Leaderboard GPQA · Apr 30, 2026
#11	gemini-3-pro-preview	Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals Finance Agent overall_accuracy_pct	14.9%	22%	—	Vectara HHEM Leaderboard · Apr 30, 2026; Vals Finance Agent · Apr 30, 2026
#12	gemini-3.1-flash-lite-preview	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	14.9%	21%	—	FACTS Benchmark Suite · Apr 30, 2026; Vectara HHEM Leaderboard · Apr 30, 2026
#13	gemini-3-flash-preview	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	14.9%	22%	—	FACTS Benchmark Suite · Apr 30, 2026; Vectara HHEM Leaderboard · Apr 30, 2026
#14	phi-4	Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Open LLM Leaderboard GPQA gpqa	14.7%	22%	—	Vectara HHEM Leaderboard · Apr 30, 2026; Open LLM Leaderboard GPQA · Apr 30, 2026
#15	claude-opus-4-5-20251101	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	13.6%	20%	—	FACTS Benchmark Suite · Apr 30, 2026; Vectara HHEM Leaderboard · Apr 30, 2026
#16	claude-sonnet-4.6	Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals Finance Agent overall_accuracy_pct	13.5%	18%	—	Vectara HHEM Leaderboard · Apr 30, 2026; Vals Finance Agent · Apr 30, 2026
#17	gpt-5.4-2026-03-05	Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals Finance Agent overall_accuracy_pct	12.6%	15%	—	Vectara HHEM Leaderboard · Apr 30, 2026; Vals Finance Agent · Apr 30, 2026
#18	claude-opus-4.7	Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals Finance Agent overall_accuracy_pct	11.9%	15%	—	Vectara HHEM Leaderboard · Apr 30, 2026; Vals Finance Agent · Apr 30, 2026
#19	Llama-3.3-70B-Instruct	Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct	11.6%	17%	—	Open LLM Leaderboard MMLU-Pro · Apr 30, 2026; LanguageBench Grammar/Clarity Official (Split) · Apr 30, 2026
#21	Llama-3.1-70B-Instruct	Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct	11.3%	17%	—	Open LLM Leaderboard MMLU-Pro · Apr 30, 2026; LanguageBench Grammar/Clarity Official (Split) · Apr 30, 2026
#22	grok-4-1-fast-reasoning	Strong on Vals Finance Agent overall_accuracy_pct and Berkeley Function Calling Leaderboard (Overall) Non-Live AST Acc	11.1%	19%	—	Vals Finance Agent · Apr 30, 2026; Berkeley Function Calling Leaderboard (Overall) · Apr 30, 2026
#23	gpt-5.1-2025-11-13	Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vals Finance Agent overall_accuracy_pct	10.8%	16%	—	FACTS Benchmark Suite · Apr 30, 2026; Vals Finance Agent · Apr 30, 2026
#26	o3-20250416	Strong on Vals Mortgage Tax overall_accuracy_pct and SciArena Leaderboard rating_elo	10.0%	18%	—	Vals Mortgage Tax · Apr 30, 2026; SciArena Leaderboard · Apr 30, 2026
#27	grok-4-fast-reasoning	Strong on Vals Finance Agent overall_accuracy_pct and FACTS Benchmark Suite facts_grounding_score_pct	9.9%	22%	—	Vals Finance Agent · Apr 30, 2026; FACTS Benchmark Suite · Apr 30, 2026
#28	gemini-2.5-flash-lite	Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Galileo Agent Leaderboard v2 Avg TSQ	9.8%	14%	—	Vectara HHEM Leaderboard · Apr 30, 2026; Galileo Agent Leaderboard v2 · Apr 30, 2026
#30	Qwen2-72B-Instruct	Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa	9.5%	14%	—	Open LLM Leaderboard MMLU-Pro · Apr 30, 2026; Open LLM Leaderboard GPQA · Apr 30, 2026
#32	Qwen2.5-32B-Instruct	Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa	9.3%	14%	—	Open LLM Leaderboard MMLU-Pro · Apr 30, 2026; Open LLM Leaderboard GPQA · Apr 30, 2026
#33	MaziyarPanahi/calme-3.2-instruct-78b	Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa	9.3%	11%	—	Open LLM Leaderboard MMLU-Pro · Apr 30, 2026; Open LLM Leaderboard GPQA · Apr 30, 2026
#34	Mistral-Large-Instruct-2411	Strong on Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct	9.3%	12%	—	Open LLM Leaderboard GPQA · Apr 30, 2026; Open LLM Leaderboard MMLU-Pro · Apr 30, 2026
#35	gemma-2-27b-it	Strong on Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct	9.3%	14%	—	Open LLM Leaderboard GPQA · Apr 30, 2026; Open LLM Leaderboard MMLU-Pro · Apr 30, 2026

Head-to-Head: #1 vs #2

Top Pick

gemini-2.5-flash

Strong on LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct and FACTS Benchmark Suite facts_grounding_score_pct

Score: 19.6%

Confidence: 26.9%

gemini-2.5-pro

Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct

Score: 19.5%

Confidence: 31.5%
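
The gap between the two leaders can be made concrete with the scores shown in the head-to-head above. The following is a minimal sketch: the function names `score_gap` and `is_clear_lead`, and the 1.0-point threshold for calling a lead clear, are assumptions for illustration and not part of this site's scoring method.

```python
def score_gap(leader_pct: float, runner_up_pct: float) -> float:
    """Return the leader's margin in percentage points, rounded to one decimal."""
    return round(leader_pct - runner_up_pct, 1)


def is_clear_lead(gap_points: float, threshold_points: float = 1.0) -> bool:
    """Hypothetical rule: a lead counts as clear only above the threshold."""
    return gap_points > threshold_points


# Scores for gemini-2.5-flash and gemini-2.5-pro from the head-to-head above.
gap = score_gap(19.6, 19.5)
print(f"gap = {gap} points, clear lead: {is_clear_lead(gap)}")
# prints "gap = 0.1 points, clear lead: False"
```

A 0.1-point margin is consistent with the provisional label on the top pick.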



Related Lookups

Best LLM for Code Generation

Benchmark-backed ranking of models for generating correct, secure code from requirements.

Best LLM for Debugging

Find the top-ranked models for localizing bugs and proposing fixes with explanations.

Best LLM for Unit Test Generation

Ranked models for generating meaningful unit tests and edge cases from code.

Best LLM for Code Review

Compare models for automated PR review covering correctness, security, and maintainability.

Best LLM for Autonomous Coding

Benchmark-backed ranking of models for end-to-end autonomous software engineering and issue resolution.

Best LLM for Function Calling

Compare models for reliable tool use, function selection, and multi-step API orchestration.