Best Model for RAG Q&A With Citations

Find the best model for grounded Q&A with citations from internal knowledge bases.

Benchmark evidence for this use case is still limited, so treat the leader below as provisional.

Provisional leader

gemini-2.5-pro

Best current option based on the available benchmark evidence, but not yet a clear winner.

external/google/gemini-2-5-pro

Score: 31.6%

Confidence: 48.1%

Evidence

Price: $3.44 per 1M tokens

Runners-up: #2 gpt-5-2025-08-07 (30.4%), #3 gemini-3-pro-preview (27.0%), #4 google/gemini-3.1-pro-preview (26.7%)

Ranked Models

Evidence Quality: 83%

Evidence Points

Top Signal: FACTS Benchmark Suite, facts_grounding_score_pct

Benchmark Sources

Last Updated: 16h ago

All Ranked Models

30 of 30 models

Rank	Model	Strengths	Score	Confidence	Price / 1M tokens	Evidence sources
🥇	gemini-2.5-pro	FACTS Benchmark Suite facts_grounding_score_pct and BasedAGI KB Q&A Eval overall_score_pct	31.6%	48%	$3.44	FACTS Benchmark Suite · Apr 29, 2026; BasedAGI KB Q&A Eval · Apr 29, 2026
🥈	gpt-5-2025-08-07	BasedAGI KB Q&A Eval overall_score_pct and FACTS Benchmark Suite facts_grounding_score_pct	30.4%	40%	n/a	BasedAGI KB Q&A Eval · Apr 29, 2026; FACTS Benchmark Suite · Apr 29, 2026
🥉	gemini-3-pro-preview	BasedAGI KB Q&A Eval overall_score_pct and Vals Finance Agent overall_accuracy_pct	27.0%	37%	$4.50	BasedAGI KB Q&A Eval · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#4	gemini-3.1-pro-preview	Vals Finance Agent overall_accuracy_pct and FACTS Benchmark Suite facts_search_score_pct	26.7%	31%	$4.50	Vals Finance Agent · Apr 29, 2026; FACTS Benchmark Suite · Apr 29, 2026
#5	claude-sonnet-4.6	Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	26.2%	36%	$6.00	Vals Finance Agent · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#6	gpt-5-mini-2025-08-07	Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	25.6%	46%	n/a	Vals Finance Agent · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#7	Grok-4-0709	Vals Finance Agent overall_accuracy_pct and FACTS Benchmark Suite facts_grounding_score_pct	23.9%	37%	n/a	Vals Finance Agent · Apr 29, 2026; FACTS Benchmark Suite · Apr 29, 2026
#8	gemini-3-flash-preview	Vals Finance Agent overall_accuracy_pct and FACTS Benchmark Suite average_score_pct	22.3%	31%	$1.13	Vals Finance Agent · Apr 29, 2026; FACTS Benchmark Suite · Apr 29, 2026
#9	gemini-3.1-flash-lite-preview	FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	21.0%	31%	$0.56	FACTS Benchmark Suite · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#10	gpt-5.2-2025-12-11	FACTS Benchmark Suite facts_grounding_score_pct and Vals Finance Agent overall_accuracy_pct	20.8%	25%	n/a	FACTS Benchmark Suite · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#11	claude-sonnet-4	Vectara HHEM Leaderboard overall_hallucination_error_pct and Galileo Agent Leaderboard v2 Avg TSQ	19.4%	30%	$6.00	Vectara HHEM Leaderboard · Apr 29, 2026; Galileo Agent Leaderboard v2 · Apr 29, 2026
#12	gpt-4.1-20250414	Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals CorpFin v2 overall_accuracy_pct	18.7%	28%	n/a	Vectara HHEM Leaderboard · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#13	gpt-5.4-2026-03-05	Vals Finance Agent overall_accuracy_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	18.4%	22%	n/a	Vals Finance Agent · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#14	gemini-2.5-flash	FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	17.8%	27%	$0.17	FACTS Benchmark Suite · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#16	claude-opus-4-5-20251101	FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	16.6%	24%	n/a	FACTS Benchmark Suite · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#17	grok-4-fast-reasoning	Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	16.1%	31%	$0.28	Vals Finance Agent · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#19	gpt-5.1-2025-11-13	Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	15.9%	24%	n/a	Vals Finance Agent · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#20	grok-4-1-fast-reasoning	Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	14.4%	23%	$0.28	Vals Finance Agent · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#21	o3-20250416	SciArena Leaderboard rating_elo and FACTS Benchmark Suite facts_search_score_pct	13.8%	21%	$3.50	SciArena Leaderboard · Apr 29, 2026; FACTS Benchmark Suite · Apr 29, 2026
#23	claude-opus-4-6-thinking	Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	13.8%	15%	n/a	Vals Finance Agent · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#24	claude-opus-4.7	Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	13.8%	15%	$10.00	Vals Finance Agent · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#25	qwen-2.5-72b-instruct	Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa	13.8%	23%	n/a	Open LLM Leaderboard MMLU-Pro · Apr 29, 2026; Open LLM Leaderboard GPQA · Apr 29, 2026
#27	kimi-k2.5-thinking	Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	13.3%	27%	n/a	Vals Finance Agent · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#28	phi-4	Vectara HHEM Leaderboard overall_hallucination_error_pct and Open LLM Leaderboard GPQA gpqa	12.9%	20%	n/a	Vectara HHEM Leaderboard · Apr 29, 2026; Open LLM Leaderboard GPQA · Apr 29, 2026
#33	grok-4-1-fast-non-reasoning	Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	11.6%	21%	$0.28	Vals Finance Agent · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#34	grok-4.20-0309-reasoning	Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	11.4%	15%	$3.00	Vals Finance Agent · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
#42	claude-opus-4-1-20250805	Vectara HHEM Leaderboard overall_hallucination_error_pct and FACTS Benchmark Suite facts_grounding_score_pct	10.2%	19%	n/a	Vectara HHEM Leaderboard · Apr 29, 2026; FACTS Benchmark Suite · Apr 29, 2026
#46	gpt-4o-2024-08-06	Vectara HHEM Leaderboard overall_hallucination_error_pct and Vectara HHEM Leaderboard overall_answer_rate_pct	9.4%	20%	n/a	Vectara HHEM Leaderboard · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#48	gemini-2.5-flash-lite	Vectara HHEM Leaderboard overall_hallucination_error_pct and Galileo Agent Leaderboard v2 Avg TSQ	9.4%	12%	$0.17	Vectara HHEM Leaderboard · Apr 29, 2026; Galileo Agent Leaderboard v2 · Apr 29, 2026
#49	o4-mini	Vals CorpFin v2 overall_accuracy_pct and SciArena Leaderboard rating_elo	9.3%	21%	$1.93	Vals CorpFin v2 · Apr 29, 2026; SciArena Leaderboard · Apr 29, 2026
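
Since the table lists both a score and a price per 1M tokens, one way to read it is to compare how much score each priced model delivers per dollar. The sketch below is only an illustration of that arithmetic on a handful of rows copied from the table. The "score per dollar" ratio and the names `Row`, `ROWS`, and `score_per_dollar` are assumptions introduced here, not part of the ranking methodology, and rows without a listed price are skipped rather than guessed.

```python
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    model: str
    score_pct: float
    price_per_1m: Optional[float]  # None where the table lists no price


# A subset of rows copied from the table above (not the full list).
ROWS = [
    Row("gemini-2.5-pro", 31.6, 3.44),
    Row("gpt-5-2025-08-07", 30.4, None),
    Row("gemini-3-pro-preview", 27.0, 4.50),
    Row("gemini-3-flash-preview", 22.3, 1.13),
    Row("gemini-3.1-flash-lite-preview", 21.0, 0.56),
    Row("gemini-2.5-flash", 17.8, 0.17),
]


def score_per_dollar(rows: List[Row]) -> List[Tuple[str, float]]:
    """Rank priced models by score points per dollar (hypothetical metric)."""
    priced = [r for r in rows if r.price_per_1m is not None]
    ranked = sorted(
        priced, key=lambda r: r.score_pct / r.price_per_1m, reverse=True
    )
    return [(r.model, round(r.score_pct / r.price_per_1m, 1)) for r in ranked]


if __name__ == "__main__":
    for model, ratio in score_per_dollar(ROWS):
        print(f"{model}: {ratio} score points per $1 per 1M tokens")
```

On this subset the cheaper flash-tier models come out ahead on the ratio even though they rank lower on raw score, so the ratio is best treated as a secondary view alongside the ranking rather than a replacement for it.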

Head-to-Head: #1 vs #2

Top Pick

gemini-2.5-pro

Strong on FACTS Benchmark Suite facts_grounding_score_pct and BasedAGI KB Q&A Eval overall_score_pct

Score: 31.6%

Confidence: 48.1%

gpt-5-2025-08-07

Strong on BasedAGI KB Q&A Eval overall_score_pct and FACTS Benchmark Suite facts_grounding_score_pct

Score: 30.4%

Confidence: 40.1%
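
To make the "provisional" label concrete, the sketch below computes the gap between the two scores above and flags the lead as provisional when the gap or the leader's confidence is small. The thresholds `min_margin` and `min_confidence` are purely illustrative assumptions and are not taken from the site's scoring method; the function names are hypothetical as well.

```python
def lead_margin(leader_score: float, runner_up_score: float) -> float:
    """Gap between #1 and #2 in percentage points."""
    return round(leader_score - runner_up_score, 1)


def is_provisional(
    margin: float,
    leader_confidence: float,
    min_margin: float = 5.0,       # assumed threshold, not from the source
    min_confidence: float = 60.0,  # assumed threshold, not from the source
) -> bool:
    """Treat a lead as provisional if the gap or the confidence is low."""
    return margin < min_margin or leader_confidence < min_confidence


if __name__ == "__main__":
    # Values from the head-to-head comparison above.
    margin = lead_margin(31.6, 30.4)
    print(f"margin: {margin} points")
    print(f"provisional: {is_provisional(margin, 48.1)}")
```

With a gap of about 1.2 points and a leader confidence of 48.1%, both checks fail under these assumed thresholds, which matches the page's own caution about the ranking.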
