Customer feedback theme mining

Extract themes and trends from reviews, tickets, and surveys.

task.multi_doc_synthesis, task.dedupe_normalize_records

Evidence quality is currently limited for this use case. Treat the rankings below as exploratory; they do not support a strong claim about which model is best.

Provisional leader

gemini-3.1-pro-preview

Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.

Best benchmark score: 29.1%

Confidence: 33.3%

All ranked models: top 3

🥇 gemini-3.1-pro-preview: 29.1%

🥈 gemini-2.5-pro: 25.7%

🥉 gpt-5-2025-08-07: 24.0%

Ranked Models

Evidence Quality

82%

Evidence Points

Top Signal

SimpleQA Verified: simpleqa_verified_score_pct

All Ranked Models

30 of 30 models

Rank	Model	Strong on	Score	Confidence	Price / 1M	Evidence sources
🥇	gemini-3.1-pro-preview	SimpleQA Verified: simpleqa_verified_score_pct; Vals Finance Agent: overall_accuracy_pct	29.1%	33%	$4.50	SimpleQA Verified, Vals Finance Agent
🥈	gemini-2.5-pro	FACTS Benchmark Suite: facts_grounding_score_pct; Vectara HHEM Leaderboard: overall_hallucination_error_pct	25.7%	39%	$3.44	FACTS Benchmark Suite, Vectara HHEM Leaderboard
🥉	gpt-5-2025-08-07	SciArena Leaderboard: rating_elo; FACTS Benchmark Suite: facts_grounding_score_pct	24.0%	32%	n/a	SciArena Leaderboard, FACTS Benchmark Suite
#4	gpt-5-mini-2025-08-07	SciArena Leaderboard: rating_elo; Vals Finance Agent: overall_accuracy_pct	23.6%	38%	n/a	SciArena Leaderboard, Vals Finance Agent
#5	gemini-3-pro-preview	SimpleQA Verified: simpleqa_verified_score_pct; SciArena Leaderboard: rating_elo	22.4%	32%	$4.50	SimpleQA Verified, SciArena Leaderboard
#6	Grok-4-0709	Vals Finance Agent: overall_accuracy_pct; SimpleQA Verified: simpleqa_verified_score_pct	21.8%	33%	n/a	Vals Finance Agent, SimpleQA Verified
#7	gemini-3-flash-preview	Vals Finance Agent: overall_accuracy_pct; Vectara HHEM Leaderboard: overall_answer_rate_pct	20.9%	30%	$1.13	Vals Finance Agent, Vectara HHEM Leaderboard
#8	gpt-4.1-20250414	Vectara HHEM Leaderboard: overall_hallucination_error_pct; Galileo Agent Leaderboard v2: Avg AC	20.3%	30%	n/a	Vectara HHEM Leaderboard, Galileo Agent Leaderboard v2
#9	gpt-5.2-2025-12-11	FACTS Benchmark Suite: facts_grounding_score_pct; Vals Finance Agent: overall_accuracy_pct	20.0%	24%	n/a	FACTS Benchmark Suite, Vals Finance Agent
#10	claude-sonnet-4.6	Vals Finance Agent: overall_accuracy_pct; Vectara HHEM Leaderboard: overall_hallucination_error_pct	19.8%	26%	$6.00	Vals Finance Agent, Vectara HHEM Leaderboard
#11	gemini-3.1-flash-lite-preview	FACTS Benchmark Suite: facts_grounding_score_pct; Vectara HHEM Leaderboard: overall_hallucination_error_pct	19.8%	29%	$0.56	FACTS Benchmark Suite, Vectara HHEM Leaderboard
#12	gpt-5.4-2026-03-05	Vectara HHEM Leaderboard: overall_hallucination_error_pct; Vals Finance Agent: overall_accuracy_pct	18.3%	24%	n/a	Vectara HHEM Leaderboard, Vals Finance Agent
#13	claude-sonnet-4	Galileo Agent Leaderboard v2: Avg AC; Vectara HHEM Leaderboard: overall_hallucination_error_pct	18.1%	29%	$6.00	Galileo Agent Leaderboard v2, Vectara HHEM Leaderboard
#14	gemini-2.5-flash	FACTS Benchmark Suite: facts_grounding_score_pct; Vectara HHEM Leaderboard: overall_hallucination_error_pct	17.1%	31%	$0.17	FACTS Benchmark Suite, Vectara HHEM Leaderboard
#15	claude-opus-4-5-20251101	FACTS Benchmark Suite: facts_grounding_score_pct; Vectara HHEM Leaderboard: overall_hallucination_error_pct	17.1%	27%	n/a	FACTS Benchmark Suite, Vectara HHEM Leaderboard
#16	gpt-5.1-2025-11-13	Vals Finance Agent: overall_accuracy_pct; Vals Finance Agent: complex_retrieval_accuracy_pct	16.4%	27%	n/a	Vals Finance Agent
#17	grok-4-fast-reasoning	Vectara HHEM Leaderboard: overall_answer_rate_pct; Vals Finance Agent: overall_accuracy_pct	16.3%	33%	$0.28	Vectara HHEM Leaderboard, Vals Finance Agent
#18	o3-20250416	SciArena Leaderboard: rating_elo; SimpleQA Verified: simpleqa_verified_score_pct	16.3%	26%	$3.50	SciArena Leaderboard, SimpleQA Verified
#19	grok-4-1-fast-reasoning	Vals Finance Agent: overall_accuracy_pct; Vectara HHEM Leaderboard: overall_answer_rate_pct	13.8%	22%	$0.28	Vals Finance Agent, Vectara HHEM Leaderboard
#20	grok-3	Vectara HHEM Leaderboard: overall_hallucination_error_pct; Vectara HHEM Leaderboard: overall_answer_rate_pct	12.9%	19%	$6.00	Vectara HHEM Leaderboard
#21	claude-opus-4-6-thinking	Vals Finance Agent: overall_accuracy_pct; Vals Finance Agent: complex_retrieval_accuracy_pct	12.4%	14%	n/a	Vals Finance Agent
#22	Claude-3.5-Sonnet	LLM-AggreFact Leaderboard: average_score_pct; LLM-AggreFact Leaderboard: rag_truth_score_pct	11.9%	15%	$6.00	LLM-AggreFact Leaderboard
#25	kimi-k2.5-thinking	Vals Finance Agent: overall_accuracy_pct; Vals CorpFin v2: overall_accuracy_pct	11.6%	17%	n/a	Vals Finance Agent, Vals CorpFin v2
#26	Qwen3-Embedding-4B	MTEB Retrieval and Rerank (Official): retrieval_score_pct; BEIR-Style Retrieval (Official MTEB Slice): beir_average_score_pct	11.6%	13%	n/a	MTEB Retrieval and Rerank (Official), BEIR-Style Retrieval (Official MTEB Slice)
#30	claude-opus-4-5-20251101-thinking	Vals Finance Agent: overall_accuracy_pct; Vals CorpFin v2: overall_accuracy_pct	11.4%	14%	n/a	Vals Finance Agent, Vals CorpFin v2
#32	qwen-2.5-72b-instruct	Galileo Agent Leaderboard v2: Avg AC; LLM-AggreFact Leaderboard: average_score_pct	10.8%	18%	n/a	Galileo Agent Leaderboard v2, LLM-AggreFact Leaderboard
#33	claude-sonnet-4-5-20250929-thinking	Vals Finance Agent: overall_accuracy_pct; Vals Finance Agent: complex_retrieval_accuracy_pct	10.8%	14%	n/a	Vals Finance Agent
#34	grok-4-1-fast-non-reasoning	Vectara HHEM Leaderboard: overall_answer_rate_pct; Vals Finance Agent: overall_accuracy_pct	10.8%	20%	$0.28	Vectara HHEM Leaderboard, Vals Finance Agent
#37	claude-opus-4-1-20250805	Vectara HHEM Leaderboard: overall_hallucination_error_pct; FACTS Benchmark Suite: facts_grounding_score_pct	10.7%	22%	n/a	Vectara HHEM Leaderboard, FACTS Benchmark Suite
#40	glm-5-thinking	Vals Finance Agent: overall_accuracy_pct; Vals CorpFin v2: overall_accuracy_pct	10.5%	16%	n/a	Vals Finance Agent, Vals CorpFin v2

Ranking diagnostics & missing models

Source lift

Ranked

Sources

Quality

Low

Vals CorpFin v2: 43 rows · 1.0% avg lift

Vals Legal Bench: 33 rows · 0.2% avg lift

Vals MedQA: 32 rows · 0.2% avg lift

Vals Tax Eval v2: 32 rows · 0.2% avg lift

Missing frontier models

No obvious gaps right now.

Taxonomy & task details

Core tasks

task.multi_doc_synthesis, task.dedupe_normalize_records

Required modes

mode.long_context

Domains

domain.customer_support

Related in CX

Agent-assist reply suggestions

Draft replies for human agents with tone and policy constraints.

Support dialogue agent

Multi-turn support conversations with escalation and policy awareness.

Support bot (RAG grounded)

Support chatbot grounded in docs with optional citations and escalation.

Support FAQ bot

Answer common support questions with safe troubleshooting steps.