# Model Rankings
Models are ranked by a composite utility score across all benchmark-backed use cases. The score weights each use-case result by its confidence level, so models with broader, higher-confidence coverage rank higher.
## How the score works
Utility Score = sum of (use-case score × confidence) / sum of (confidence), taken over all scored use cases. A model that scores well on many use cases with high confidence therefore ranks above one that does well on only a few; coverage breadth serves as a tiebreaker.
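The weighting above is a standard confidence-weighted average. A minimal sketch in Python (the data shape and names here are illustrative, not the site's actual schema):

```python
def utility_score(results):
    """Confidence-weighted average of use-case scores.

    `results` is a list of (score, confidence) pairs, one per scored
    use case. Each use-case score is weighted by its confidence, then
    normalized by the total confidence.
    """
    total_confidence = sum(conf for _, conf in results)
    if total_confidence == 0:
        return 0.0  # no scored use cases: no utility signal
    weighted = sum(score * conf for score, conf in results)
    return weighted / total_confidence


# Example: two use cases, one scored with high confidence, one with low.
# (0.8 * 0.9 + 0.6 * 0.5) / (0.9 + 0.5) = 1.02 / 1.4 ≈ 0.729
print(utility_score([(0.8, 0.9), (0.6, 0.5)]))
```

Note that because the denominator normalizes by total confidence, breadth alone does not raise the score; that is why coverage breadth is applied separately as a tiebreaker.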
## Recent Changes
No use cases have changed their #1 model since the last scoring update. A total of 19,299 model-use-case pairings are currently scored.
Showing the top 100 models.
| Rank | Model | Utility Score |
|---|---|---|
| #1 | external/google/gemini-3-pro-preview | 24.8% |
| #2 | external/google/gemini-2-5-pro | 23.6% |
| #3 | external/openai/gpt-4-1-20250414 | 21.4% |
| #4 | external/anthropic/claude-sonnet-4-6 | 20.5% |
| #5 | external/xai/grok-4-0709 | 20.2% |
| #6 | katanemo/Arch-Agent-32B | 19.0% |
| #7 | external/openai/gpt-5-mini-2025-08-07 | 18.9% |
| #8 | external/google/gemini-3-1-pro-preview | 18.5% |
| #9 | external/openai/gpt-5-2025-08-07 | 18.5% |
| #10 | external/openai/gpt-5-4-2026-03-05 | 18.3% |
| #11 | external/anthropic/claude-sonnet-4-20250514 | 17.3% |
| #12 | external/google/gemini-2-5-flash | 17.2% |
| #13 | external/openai/gpt-5-1-2025-11-13 | 16.4% |
| #14 | external/anthropic/claude-opus-4-5-20251101 | 16.4% |
| #15 | external/google/gemini-3-flash-preview | 15.7% |
| #16 | external/openai/gpt-5-2-2025-12-11 | 15.6% |
| #17 | external/anthropic/claude-opus-4-6-thinking | 15.5% |
| #18 | external/google/gemini-3-1-flash-lite-preview | 15.3% |
| #19 | external/xai-org/grok-4-fast-reasoning | 15.2% |
| #20 | tiiuae/falcon-7b-instruct | 15.1% |
| #21 | meta-llama/Llama-2-7b-chat-hf | 15.0% |
| #22 | external/anthropic/claude-opus-4-5-20251101-thinking | 14.7% |
| #23 | external/openai/gpt-4o | 14.6% |
| #24 | external/xai-org/grok-4-1-fast-reasoning | 14.5% |
| #25 | external/kimi/kimi-k2-5-thinking | 14.0% |
| #26 | katanemo/Arch-Agent-3B | 13.6% |
| #27 | external/anthropic/claude-sonnet-4-5-20250929-thinking | 13.6% |
| #28 | external/qwen/qwen-2-5-72b-instruct | 13.4% |
| #29 | katanemo/Arch-Agent-1.5B | 13.2% |
| #30 | external/openai/gpt-4-1-mini-20250414 | 12.7% |
| #31 | HuggingFaceH4/zephyr-7b-beta | 12.6% |
| #32 | CohereLabs/c4ai-command-r-plus | 12.3% |
| #33 | EasyDeL/Kimi-VL-A3B-Instruct | 12.2% |
| #34 | external/kimi/kimi-k2-thinking | 12.1% |
| #35 | Laibaaaaa/GLM-5 | 12.0% |
| #36 | AIDC-AI/Ovis1.6-Gemma2-9B | 12.0% |
| #37 | external/alibaba/qwen3-5-flash | 12.0% |
| #38 | google/gemma-2b-it | 12.0% |
| #39 | google/gemma-7b-it | 11.8% |
| #40 | google/gemma-2-27b-it | 11.8% |
| #41 | openai/gpt-oss-120b | 11.7% |
| #42 | external/x-ai/grok-3 | 11.7% |
| #43 | external/anthropic/claude-haiku-4-5-20251001-thinking | 11.7% |
| #44 | external/z-ai/glm-4-7 | 11.5% |
| #45 | external/minimax/minimax-m2-1 | 11.4% |
| #46 | external/mistralai/mistral-large-2512 | 11.3% |
| #47 | external/openai/o3-20250416 | 11.3% |
| #48 | external/openai/gpt-4o-2024-08-06 | 11.2% |
| #49 | grimjim/mistralai-Mistral-Nemo-Instruct-2407 | 11.2% |
| #50 | contextboxai/Qwen3-1.7B-FC | 11.1% |
| #51 | Yura37/11 | 11.0% |
| #52 | external/openai/gpt-5 | 11.0% |
| #53 | maicomputer/alpaca-native | 11.0% |
| #54 | openai/gpt-oss-20b | 10.9% |
| #55 | external/xai-org/grok-4-1-fast-non-reasoning | 10.7% |
| #56 | meta-llama/Llama-3.3-70B-Instruct | 10.6% |
| #57 | EasyDeL/GLM-4.6V | 10.5% |
| #58 | external/anthropic/claude-opus-4-1-20250805 | 10.5% |
| #59 | meta-llama/Meta-Llama-3-8B-Instruct | 10.4% |
| #60 | CometAPI/grok4 | 10.4% |
| #61 | external/openai/gpt-4o-20241120 | 10.4% |
| #62 | external/openai/gpt-4o-2024-05-13 | 10.3% |
| #63 | unsloth/Kimi-K2-Instruct | 10.2% |
| #64 | external/deepseek/deepseek-r1 | 10.1% |
| #65 | external/qwen/qwen3-max | 10.1% |
| #66 | RedHatAI/Mistral-Small-24B-Instruct-2501 | 10.0% |
| #67 | mcrovero/gemma-3-27b-it | 9.9% |
| #68 | Qwen/Qwen2.5-32B-Instruct | 9.9% |
| #69 | CohereLabs/c4ai-command-r-plus-08-2024 | 9.8% |
| #70 | Qwen/Qwen2.5-Coder-7B | 9.8% |
| #71 | meta-llama/Llama-3.1-70B-Instruct | 9.7% |
| #72 | anyidea/Qwen3-Embedding-8B | 9.6% |
| #73 | google/gemma-2-9b-it | 9.5% |
| #74 | Open-Orca/Mistral-7B-OpenOrca | 9.5% |
| #75 | Qwen/Qwen3-Embedding-4B | 9.5% |
| #76 | anas125244235/GLM-4.5-Air | 9.4% |
| #77 | meta-llama/Meta-Llama-3-70B-Instruct | 9.3% |
| #78 | Qwen/Qwen3-32B | 9.1% |
| #79 | Mira190/Euler-Legal-Embedding-V1 | 9.1% |
| #80 | deepseek-ai/DeepSeek-V2.5 | 8.9% |
| #81 | external/openai/gpt-4o-mini-2024-07-18 | 8.9% |
| #82 | mistralai/Mistral-7B-Instruct-v0.2 | 8.7% |
| #83 | Qwen/QwQ-32B-Preview | 8.6% |
| #84 | microsoft/Phi-3-medium-128k-instruct | 8.6% |
| #85 | 1kxia/Qwen3-Embedding-0.6B | 8.6% |
| #86 | yokoe/baseline | 8.5% |
| #87 | external/xai-org/grok-4-fast-non-reasoning | 8.5% |
| #88 | microsoft/phi-4 | 8.4% |
| #89 | raydel-0307/Qwen3-2B | 8.3% |
| #90 | CometAPI/o3-pro | 8.2% |
| #91 | external/openai/o4-mini-20250416 | 8.1% |
| #92 | Qwen/Qwen-VL-Chat | 8.0% |
| #93 | ICT-TIME-and-Querit/BOOM_4B_v1 | 8.0% |
| #94 | Alibaba-NLP/gte-Qwen2-1.5B-instruct | 8.0% |
| #95 | unsloth/Nemotron-3-Nano-30B-A3B | 7.9% |
| #96 | GritLM/GritLM-7B | 7.9% |
| #97 | Qwen/Qwen2.5-14B-Instruct | 7.9% |
| #98 | GritLM/GritLM-8x7B | 7.8% |
| #99 | residuals/gemma-3-12b | 7.7% |
| #100 | moonshotai/Kimi-K2-Instruct-0905 | 7.6% |