BasedAGI

customer_experience

Best LLM for Ticket Triage

Compare models for classifying and prioritizing support tickets with structured output.
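As an illustration of what "structured output" could mean for this task, here is a minimal sketch that validates a model's JSON reply against a small triage schema. The field names (`category`, `priority`, `summary`) and the label sets are assumptions made for this example; the page does not specify a schema, and the ranked models are not evaluated with this code.

```python
import json
from dataclasses import dataclass

# Hypothetical label sets: the page does not define categories or priority levels.
CATEGORIES = {"billing", "technical", "account", "other"}
PRIORITIES = {"low", "medium", "high", "urgent"}


@dataclass(frozen=True)
class TriageResult:
    """One classified ticket under the assumed schema."""
    category: str
    priority: str
    summary: str


def parse_triage(raw: str) -> TriageResult:
    """Validate a model's JSON reply against the assumed triage schema.

    Raises ValueError for malformed JSON (json.JSONDecodeError is a
    ValueError subclass) and for missing or out-of-range fields.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    category = data.get("category")
    priority = data.get("priority")
    summary = data.get("summary")

    if not isinstance(category, str) or category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if not isinstance(priority, str) or priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string")

    return TriageResult(category, priority, summary.strip())
```

A reply that fails validation can be retried or routed to a human queue, which is one reason instruction-following benchmarks such as IFEval are plausible signals for this use case.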

#1 Recommendation

qwen-2.5-72b-instruct

Strong on Open LLM Leaderboard IFEval ifeval and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct

external/qwen/qwen-2-5-72b-instruct

Score: 20.7%
Confidence: 28.7%
Evidence: 18

Ranked Models: 30
Evidence Quality: 89%
Evidence Points: 18
Top Signal: Open LLM Leaderboard IFEval (ifeval)
Benchmark Sources: 39
Last Updated: 7h ago

All Ranked Models

30 of 30 models
Rank | Model | Score | Notes
🥇 | qwen-2.5-72b-instruct | 20.7% | Strong on Open LLM Leaderboard IFEval ifeval and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
🥈 | gpt-4.1-20250414 | 20.6% | Strong on Galileo Agent Leaderboard v2 Avg AC and Vectara HHEM Leaderboard overall_hallucination_error_pct
🥉 | gemini-2.5-pro | 18.5% | Strong on Galileo Agent Leaderboard v2 Avg AC and OpenVLM OCRBench Official ocrbench_score_pct
#4 | gemini-2.5-flash | 16.6% | Strong on Galileo Agent Leaderboard v2 Avg AC and LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct
#5 | claude-sonnet-4 | 16.5% | Strong on Galileo Agent Leaderboard v2 Avg AC and LanguageBench Grammar/Clarity Official (Split) grammar_clarity_score_pct
#6 | gpt-5-2025-08-07 | 16.2% | Strong on SciArena Leaderboard rating_elo and OpenVLM OCRBench Official ocrbench_score_pct
#7 | gpt-5-mini-2025-08-07 | 15.7% | Strong on OpenVLM OCRBench Official ocrbench_score_pct and Vals Finance Agent overall_accuracy_pct
#8 | gemini-3.1-pro-preview | 15.0% | Strong on Vals Finance Agent overall_accuracy_pct and FACTS Benchmark Suite average_score_pct
#9 | Grok-4-0709 | 15.0% | Strong on Galileo Agent Leaderboard v2 Avg AC and Vals Finance Agent overall_accuracy_pct
#10 | Llama-3.3-70B-Instruct | 14.5% | Strong on Open LLM Leaderboard IFEval ifeval and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#11 | Llama-3.1-70B-Instruct | 14.2% | Strong on Open LLM Leaderboard IFEval ifeval and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#12 | phi-4 | 13.8% | Strong on Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#13 | gemini-3-pro-preview | 13.2% | Strong on Vals Finance Agent overall_accuracy_pct and SciArena Leaderboard rating_elo
#14 | gpt-5.2-2025-12-11 | 13.0% | Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vals Finance Agent overall_accuracy_pct
#15 | MaziyarPanahi/calme-3.2-instruct-78b | 13.0% | Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#16 | Qwen2.5-32B-Instruct | 12.9% | Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard IFEval ifeval
#18 | Mistral-Large-Instruct-2411 | 12.8% | Strong on Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#19 | MaziyarPanahi/calme-3.1-instruct-78b | 12.8% | Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard IFEval ifeval
#20 | CalmeRys-78B-Orpo-v0.1 | 12.7% | Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard IFEval ifeval
#21 | MaziyarPanahi/calme-2.4-rys-78b | 12.7% | Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#22 | gemma-2-27b-it | 12.5% | Strong on Open LLM Leaderboard IFEval ifeval and Open LLM Leaderboard GPQA gpqa
#23 | gpt-4o | 12.5% | Strong on CRMArena Function Calling overall_score_pct and OpenVLM OCRBench Official ocrbench_score_pct
#24 | Steelskull/L3.3-MS-Nevoria-70b | 12.5% | Strong on Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#25 | gemini-3-flash-preview | 12.3% | Strong on Vectara HHEM Leaderboard overall_answer_rate_pct and Vals Finance Agent overall_accuracy_pct
#26 | Homer-v1.0-Qwen2.5-72B | 12.2% | Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#27 | Triangle104/Set-70b | 12.2% | Strong on Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#28 | gpt-4.1-mini-20250414 | 12.2% | Strong on Galileo Agent Leaderboard v2 Avg AC and OpenVLM OCRBench Official ocrbench_score_pct
#29 | Tarek07/Progenitor-V1.1-LLaMa-70B | 12.1% | Strong on Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#30 | shuttle-3 | 12.1% | Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#31 | T3Q-qwen2.5-14b-v1.0-e3 | 12.0% | Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa

Head-to-Head: #1 vs #2

#1 (Top Pick): qwen-2.5-72b-instruct
Strong on Open LLM Leaderboard IFEval ifeval and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
Score: 20.7%
Confidence: 28.7%

#2: gpt-4.1-20250414
Strong on Galileo Agent Leaderboard v2 Avg AC and Vectara HHEM Leaderboard overall_hallucination_error_pct
Score: 20.6%
Confidence: 29.7%
