Ticket triage and routing

Classify and prioritize tickets into queues with structured output.

task.intent_classification, task.priority_routing

Evidence quality is currently limited for this use case. Rankings below are useful for exploration, not a strong winner claim.

Provisional leader

gpt-4.1-20250414

Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.

Best benchmark score: 21.8%

Confidence: 31.7%

All ranked models: top 3

🥇 gpt-4.1-20250414: 21.8%
🥈 gemini-2.5-pro: 21.4%
🥉 gemini-3.1-pro-preview: 19.8%

Ranked Models

Evidence Quality

80%

Evidence Points

Top Signal

Galileo Agent Leaderboard v2: Avg AC

All Ranked Models

30 of 30 models

Rank	Model	Score	Confidence	Price / 1M	Evidence sources
🥇	gpt-4.1-20250414 Strong on Galileo Agent Leaderboard v2 Avg AC and Vectara HHEM Leaderboard overall_hallucination_error_pct	21.8%	32%	—	Galileo Agent Leaderboard v2, Vectara HHEM Leaderboard
🥈	gemini-2.5-pro Strong on Galileo Agent Leaderboard v2 Avg AC and OpenVLM OCRBench Official ocrbench_score_pct	21.4%	36%	$3.44	Galileo Agent Leaderboard v2, OpenVLM OCRBench Official
🥉	gemini-3.1-pro-preview Strong on SimpleQA Verified simpleqa_verified_score_pct and Vals Finance Agent overall_accuracy_pct	19.8%	23%	$4.50	SimpleQA Verified, Vals Finance Agent
#4	claude-sonnet-4 Strong on Galileo Agent Leaderboard v2 Avg AC and LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct	19.1%	27%	$6.00	Galileo Agent Leaderboard v2, LanguageBench Grammar/Clarity Official (Split)
#5	gpt-5-2025-08-07 Strong on SciArena Leaderboard rating_elo and OpenVLM OCRBench Official ocrbench_score_pct	18.9%	25%	—	SciArena Leaderboard, OpenVLM OCRBench Official
#6	gpt-5-mini-2025-08-07 Strong on OpenVLM OCRBench Official ocrbench_score_pct and Vals Finance Agent overall_accuracy_pct	18.3%	28%	—	OpenVLM OCRBench Official, Vals Finance Agent
#7	gemini-2.5-flash Strong on Galileo Agent Leaderboard v2 Avg AC and LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct	17.4%	27%	$0.17	Galileo Agent Leaderboard v2, LanguageBench Grammar/Clarity Official (Split)
#8	Grok-4-0709 Strong on Galileo Agent Leaderboard v2 Avg AC and Vals Finance Agent overall_accuracy_pct	16.2%	25%	—	Galileo Agent Leaderboard v2, Vals Finance Agent
#9	gemini-3-pro-preview Strong on SimpleQA Verified simpleqa_verified_score_pct and Vals Finance Agent overall_accuracy_pct	14.6%	21%	$4.50	SimpleQA Verified, Vals Finance Agent
#10	Claude-3.5-Sonnet Strong on CRMArena Function Calling overall_score_pct and OpenVLM OCRBench Official ocrbench_score_pct	14.2%	19%	$6.00	CRMArena Function Calling, OpenVLM OCRBench Official
#11	gpt-4.1-mini-20250414 Strong on Galileo Agent Leaderboard v2 Avg AC and OpenVLM OCRBench Official ocrbench_score_pct	14.1%	20%	—	Galileo Agent Leaderboard v2, OpenVLM OCRBench Official
#12	gemini-3-flash-preview Strong on Vectara HHEM Leaderboard overall_answer_rate_pct and Vals Finance Agent overall_accuracy_pct	13.7%	20%	$1.13	Vectara HHEM Leaderboard, Vals Finance Agent
#13	gpt-4o Strong on CRMArena Function Calling overall_score_pct and OpenVLM OCRBench Official ocrbench_score_pct	13.6%	18%	$0.26	CRMArena Function Calling, OpenVLM OCRBench Official
#14	claude-sonnet-4.6 Strong on Vals Finance Agent overall_accuracy_pct and Vectara HHEM Leaderboard overall_answer_rate_pct	13.5%	18%	$6.00	Vals Finance Agent, Vectara HHEM Leaderboard
#16	gpt-5.2-2025-12-11 Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vals Finance Agent overall_accuracy_pct	13.1%	16%	—	FACTS Benchmark Suite, Vals Finance Agent
#19	gemini-3.1-flash-lite-preview Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and FACTS Benchmark Suite facts_grounding_score_pct	12.8%	19%	$0.56	Vectara HHEM Leaderboard, FACTS Benchmark Suite
#20	gpt-5.4-2026-03-05 Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals Finance Agent overall_accuracy_pct	12.8%	17%	—	Vectara HHEM Leaderboard, Vals Finance Agent
#21	Qwen3-Embedding-4B Strong on MTEB STS & Summarization Proxy Official sts_score_pct and MTEB Classification Official classification_score_pct	12.6%	14%	—	MTEB STS & Summarization Proxy Official, MTEB Classification Official
#30	gpt-5.1-2025-11-13 Strong on Vals Finance Agent overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct	11.9%	18%	—	Vals Finance Agent, Vals Case Law v2
#34	o3-20250416 Strong on SciArena Leaderboard rating_elo and SimpleQA Verified simpleqa_verified_score_pct	11.5%	17%	$3.50	SciArena Leaderboard, SimpleQA Verified
#39	claude-opus-4-5-20251101 Strong on Vectara HHEM Leaderboard overall_answer_rate_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	11.0%	16%	—	Vectara HHEM Leaderboard, Vectara HHEM Leaderboard
#44	grok-4-fast-reasoning Strong on Vectara HHEM Leaderboard overall_answer_rate_pct and Vals Finance Agent overall_accuracy_pct	10.8%	22%	$0.28	Vectara HHEM Leaderboard, Vals Finance Agent
#47	qwen-2.5-72b-instruct Strong on Galileo Agent Leaderboard v2 Avg AC and DuckDB NSQL Leaderboard all_execution_accuracy	10.8%	16%	—	Galileo Agent Leaderboard v2, DuckDB NSQL Leaderboard
#81	gpt-4.1 Strong on DuckDB NSQL Leaderboard all_execution_accuracy and LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct	9.4%	12%	$3.50	DuckDB NSQL Leaderboard, LanguageBench Grammar/Clarity Official (Split)
#87	claude-opus-4-6-thinking Strong on Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	9.0%	11%	—	Vals Finance Agent, Vals Finance Agent
#93	grok-4-1-fast-reasoning Strong on Vals Finance Agent overall_accuracy_pct and Vectara HHEM Leaderboard overall_answer_rate_pct	8.6%	15%	$0.28	Vals Finance Agent, Vectara HHEM Leaderboard
#98	claude-opus-4-5-20251101-thinking Strong on Vals Finance Agent overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct	8.4%	11%	—	Vals Finance Agent, Vals Case Law v2
#100	gemini-2.5-flash-lite Strong on Galileo Agent Leaderboard v2 Avg AC and Vectara HHEM Leaderboard overall_hallucination_error_pct	8.4%	11%	$0.17	Galileo Agent Leaderboard v2, Vectara HHEM Leaderboard
#103	deepseek-v3 Strong on Galileo Agent Leaderboard v2 Avg AC and Vectara HHEM Leaderboard overall_hallucination_error_pct	8.2%	14%	—	Galileo Agent Leaderboard v2, Vectara HHEM Leaderboard
#104	kimi-k2.5-thinking Strong on Vals Finance Agent overall_accuracy_pct and Vals Finance Agent complex_retrieval_accuracy_pct	8.1%	13%	—	Vals Finance Agent, Vals Finance Agent

Ranking diagnostics & missing models

Source lift

Ranked

Sources

Quality

Low

Vals MedQA

33 rows · 0.3% avg lift

Vals Legal Bench

33 rows · 0.3% avg lift

Vals Tax Eval v2

31 rows · 0.3% avg lift

Vals LiveCodeBench

31 rows · 0.3% avg lift

Missing frontier models

No obvious gaps right now.

Taxonomy & task details

Core tasks

task.intent_classification, task.priority_routing

Required modes

mode.json_schema

Domains

domain.customer_support
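
To make the required JSON-schema mode concrete, here is a minimal Python sketch of what a structured triage result might look like and how a caller could check it before routing. The label sets, field names, and the parse_triage helper are all hypothetical; this page does not define them, and a real deployment would substitute its own intents, priorities, and queues.

```python
import json

# Hypothetical label sets (assumptions, not defined by this page).
INTENTS = {"billing", "technical_issue", "account_access", "other"}
PRIORITIES = {"low", "normal", "high", "urgent"}
QUEUES = {"billing_team", "tech_support", "identity_team", "general"}

# Illustrative JSON Schema describing the structured output a model is asked to emit.
TRIAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": sorted(INTENTS)},
        "priority": {"type": "string", "enum": sorted(PRIORITIES)},
        "queue": {"type": "string", "enum": sorted(QUEUES)},
    },
    "required": ["intent", "priority", "queue"],
    "additionalProperties": False,
}


def parse_triage(raw: str) -> dict:
    """Parse a model reply and check it by hand against TRIAGE_SCHEMA.

    Raises ValueError on malformed JSON, missing or extra fields,
    or values outside the allowed enums.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    props = TRIAGE_SCHEMA["properties"]
    missing = [k for k in TRIAGE_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    extra = [k for k in data if k not in props]
    if extra:
        raise ValueError(f"unexpected fields: {extra}")
    for key, spec in props.items():
        if data[key] not in spec["enum"]:
            raise ValueError(f"invalid {key}: {data[key]!r}")
    return data
```

Rejecting out-of-schema replies up front means a ticket is never routed on a label the downstream queues do not recognize; the caller can retry or fall back to a default queue instead.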

Related in CX

Agent-assist reply suggestions

Draft replies for human agents with tone and policy constraints.

Support dialogue agent

Multi-turn support conversations with escalation and policy awareness.

Support bot (RAG grounded)

Support chatbot grounded in docs with optional citations and escalation.

Customer feedback theme mining

Extract themes and trends from reviews, tickets, and surveys.