Category: Customer Experience
Best LLM for Ticket Triage
Compare models for classifying and prioritizing support tickets with structured output.
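Since the comparison is specifically about triage with structured output, here is a minimal sketch of what such an output contract might look like. The field names (`category`, `priority`, `summary`) and enum values are illustrative assumptions, not part of any listed model's API; the validator is plain Python with no provider SDK assumed.

```python
import json

# Hypothetical triage schema: field names and enum values are illustrative,
# not tied to any specific model or vendor API.
TRIAGE_SCHEMA = {
    "required": ["category", "priority"],
    "properties": {
        "category": {"enum": ["billing", "bug", "how_to", "account", "other"]},
        "priority": {"enum": ["p1", "p2", "p3"]},
        "summary": {},  # free-text, optional
    },
}

def validate_triage(raw: str) -> dict:
    """Parse a model's JSON reply and enforce required keys and enum values."""
    data = json.loads(raw)
    for key in TRIAGE_SCHEMA["required"]:
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    for key, spec in TRIAGE_SCHEMA["properties"].items():
        if key in data and "enum" in spec and data[key] not in spec["enum"]:
            raise ValueError(f"invalid value for {key}: {data[key]!r}")
    return data

reply = '{"category": "billing", "priority": "p1", "summary": "Customer double-charged"}'
ticket = validate_triage(reply)
print(ticket["priority"])  # -> p1
```

Benchmarks like IFEval (instruction following) are relevant here precisely because a model that drifts from the requested JSON shape fails this kind of validation.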
#1 Recommendation
qwen-2.5-72b-instruct
Strong on Open LLM Leaderboard IFEval (`ifeval`) and Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`)
external/qwen/qwen-2-5-72b-instruct
| Metric | Value |
|---|---|
| Score | 20.7% |
| Confidence | 28.7% |
| Evidence Points | 18 |
| Ranked Models | 30 |
| Evidence Quality | 89% |
| Top Signal | Open LLM Leaderboard IFEval (`ifeval`) |
| Benchmark Sources | 39 |
| Last Updated | 7h ago |
All Ranked Models
| Rank | Model | Strengths | Score |
|---|---|---|---|
| 🥇 | qwen-2.5-72b-instruct | Open LLM Leaderboard IFEval (`ifeval`); Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`) | 20.7% |
| 🥈 | gpt-4.1-20250414 | Galileo Agent Leaderboard v2 (Avg AC); Vectara HHEM Leaderboard (`overall_hallucination_error_pct`) | 20.6% |
| 🥉 | gemini-2.5-pro | Galileo Agent Leaderboard v2 (Avg AC); OpenVLM OCRBench Official (`ocrbench_score_pct`) | 18.5% |
| #4 | gemini-2.5-flash | Galileo Agent Leaderboard v2 (Avg AC); LanguageBench Grammar/Clarity Official (Split) (`grammar_clarity_score_pct`) | 16.6% |
| #5 | claude-sonnet-4 | Galileo Agent Leaderboard v2 (Avg AC); LanguageBench Grammar/Clarity Official (Split) (`grammar_clarity_score_pct`) | 16.5% |
| #6 | gpt-5-2025-08-07 | SciArena Leaderboard (`rating_elo`); OpenVLM OCRBench Official (`ocrbench_score_pct`) | 16.2% |
| #7 | gpt-5-mini-2025-08-07 | OpenVLM OCRBench Official (`ocrbench_score_pct`); Vals Finance Agent (`overall_accuracy_pct`) | 15.7% |
| #8 | gemini-3.1-pro-preview | Vals Finance Agent (`overall_accuracy_pct`); FACTS Benchmark Suite (`average_score_pct`) | 15.0% |
| #9 | Grok-4-0709 | Galileo Agent Leaderboard v2 (Avg AC); Vals Finance Agent (`overall_accuracy_pct`) | 15.0% |
| #10 | Llama-3.3-70B-Instruct | Open LLM Leaderboard IFEval (`ifeval`); Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`) | 14.5% |
| #11 | Llama-3.1-70B-Instruct | Open LLM Leaderboard IFEval (`ifeval`); Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`) | 14.2% |
| #12 | phi-4 | Open LLM Leaderboard GPQA (`gpqa`); Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`) | 13.8% |
| #13 | gemini-3-pro-preview | Vals Finance Agent (`overall_accuracy_pct`); SciArena Leaderboard (`rating_elo`) | 13.2% |
| #14 | gpt-5.2-2025-12-11 | FACTS Benchmark Suite (`facts_grounding_score_pct`); Vals Finance Agent (`overall_accuracy_pct`) | 13.0% |
| #15 | MaziyarPanahi/calme-3.2-instruct-78b | Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`); Open LLM Leaderboard GPQA (`gpqa`) | 13.0% |
| #16 | Qwen2.5-32B-Instruct | Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`); Open LLM Leaderboard IFEval (`ifeval`) | 12.9% |
| #18 | Mistral-Large-Instruct-2411 | Open LLM Leaderboard GPQA (`gpqa`); Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`) | 12.8% |
| #19 | MaziyarPanahi/calme-3.1-instruct-78b | Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`); Open LLM Leaderboard IFEval (`ifeval`) | 12.8% |
| #20 | CalmeRys-78B-Orpo-v0.1 | Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`); Open LLM Leaderboard IFEval (`ifeval`) | 12.7% |
| #21 | MaziyarPanahi/calme-2.4-rys-78b | Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`); Open LLM Leaderboard GPQA (`gpqa`) | 12.7% |
| #22 | gemma-2-27b-it | Open LLM Leaderboard IFEval (`ifeval`); Open LLM Leaderboard GPQA (`gpqa`) | 12.5% |
| #23 | gpt-4o | CRMArena Function Calling (`overall_score_pct`); OpenVLM OCRBench Official (`ocrbench_score_pct`) | 12.5% |
| #24 | Steelskull/L3.3-MS-Nevoria-70b | Open LLM Leaderboard GPQA (`gpqa`); Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`) | 12.5% |
| #25 | gemini-3-flash-preview | Vectara HHEM Leaderboard (`overall_answer_rate_pct`); Vals Finance Agent (`overall_accuracy_pct`) | 12.3% |
| #26 | Homer-v1.0-Qwen2.5-72B | Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`); Open LLM Leaderboard GPQA (`gpqa`) | 12.2% |
| #27 | Triangle104/Set-70b | Open LLM Leaderboard GPQA (`gpqa`); Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`) | 12.2% |
| #28 | gpt-4.1-mini-20250414 | Galileo Agent Leaderboard v2 (Avg AC); OpenVLM OCRBench Official (`ocrbench_score_pct`) | 12.2% |
| #29 | Tarek07/Progenitor-V1.1-LLaMa-70B | Open LLM Leaderboard GPQA (`gpqa`); Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`) | 12.1% |
| #30 | shuttle-3 | Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`); Open LLM Leaderboard GPQA (`gpqa`) | 12.1% |
| #31 | T3Q-qwen2.5-14b-v1.0-e3 | Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`); Open LLM Leaderboard GPQA (`gpqa`) | 12.0% |
Head-to-Head: #1 vs #2
#1 (Top Pick): qwen-2.5-72b-instruct
Strong on Open LLM Leaderboard IFEval (`ifeval`) and Open LLM Leaderboard MMLU-Pro (`mmlu_pro_accuracy_pct`)
Confidence: 28.7%

#2: gpt-4.1-20250414
Strong on Galileo Agent Leaderboard v2 (Avg AC) and Vectara HHEM Leaderboard (`overall_hallucination_error_pct`)
Confidence: 29.7%
Related Lookups
- **Best LLM for Code Generation**: Benchmark-backed ranking of models for generating correct, secure code from requirements.
- **Best LLM for Debugging**: Find the top-ranked models for localizing bugs and proposing fixes with explanations.
- **Best LLM for Unit Test Generation**: Ranked models for generating meaningful unit tests and edge cases from code.
- **Best LLM for Code Review**: Compare models for automated PR review covering correctness, security, and maintainability.
- **Best LLM for Autonomous Coding**: Benchmark-backed ranking of models for end-to-end autonomous software engineering and issue resolution.
- **Best LLM for Function Calling**: Compare models for reliable tool use, function selection, and multi-step API orchestration.