What model should you use?
Benchmark-backed rankings for 151+ use cases across 5 intelligence dimensions, with confidence and evidence quality shown.
21/180 benchmark sources
151 use cases scored
Daily updates
5 Intelligence Dimensions
Every model is profiled across five axes:
Reasoning & problem-solving ability
Emotional intelligence & social understanding
Factual reliability & hallucination resistance
Creative expression & generative quality
Safety alignment & refusal calibration
Browse Use Cases
Find ranked models for any workflow
Model Rankings
Trusted overall ranking by utility, confidence, and profile completeness
Top Models
Homepage winners now use the stricter public ranking: a minimum of 4 scored dimensions and 25% average confidence, with full-profile models preferred.
| Rank | Model | Profile | Score |
|---|---|---|---|
| 🥇 | gemini-2.5-pro | Full | 25.4% |
| 🥈 | GLM-4.6 | Full | 31.1% |
| 🥉 | gpt-5-2025-08-07 | Full | 25.5% |
| 4 | Grok-4-0709 | Full | 25.5% |
| 5 | anthropic/claude-sonnet-4 | Full | 23.9% |
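The homepage eligibility rule (at least 4 scored dimensions, 25% average confidence, full-profile preference) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ModelProfile` fields, the `utility` score, and the tie-breaking order are hypothetical, and the site's actual ranking formula is not published on this page.

```python
from dataclasses import dataclass

# Thresholds taken from the homepage text.
MIN_SCORED_DIMENSIONS = 4
MIN_AVG_CONFIDENCE = 0.25
# The site profiles models across five dimensions.
TOTAL_DIMENSIONS = 5


@dataclass
class ModelProfile:
    """Hypothetical record; field names are assumptions, not the site's schema."""
    name: str
    utility: float                 # assumed overall utility score, higher is better
    confidences: dict[str, float]  # scored dimension -> confidence in [0, 1]

    @property
    def scored_dimensions(self) -> int:
        return len(self.confidences)

    @property
    def avg_confidence(self) -> float:
        if not self.confidences:
            return 0.0
        return sum(self.confidences.values()) / len(self.confidences)

    @property
    def full_profile(self) -> bool:
        return self.scored_dimensions == TOTAL_DIMENSIONS


def is_eligible(model: ModelProfile) -> bool:
    """Apply the two stated minimums."""
    return (
        model.scored_dimensions >= MIN_SCORED_DIMENSIONS
        and model.avg_confidence >= MIN_AVG_CONFIDENCE
    )


def rank_models(models: list[ModelProfile]) -> list[ModelProfile]:
    """Filter to eligible models, then put full profiles first.

    Sorting full profiles ahead of partial ones and then by utility is
    an assumed reading of "full-profile preference".
    """
    eligible = [m for m in models if is_eligible(m)]
    return sorted(eligible, key=lambda m: (not m.full_profile, -m.utility))


if __name__ == "__main__":
    dims = ["reasoning", "emotional", "factual", "creative", "safety"]
    candidates = [
        ModelProfile("model-a", 0.9, {d: 0.5 for d in dims[:4]}),
        ModelProfile("model-b", 0.6, {d: 0.3 for d in dims}),
        ModelProfile("model-c", 0.95, {d: 0.9 for d in dims[:3]}),
    ]
    for m in rank_models(candidates):
        print(m.name, m.full_profile, round(m.avg_confidence, 2))
```

Under these assumptions a model with only three scored dimensions is excluded no matter how high its utility, and a fully profiled model outranks a partially profiled one even when its utility is lower.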
Popular Use Cases
Stable winners are shown only when evidence quality is strong.
finance
Earnings call synthesis
Summarize earnings calls into key points, tone, and risks.
creative
NPC dialogue
Low-latency in-character dialogue suitable for games.
adult
Adult ERP roleplay (explicit)
Explicit adult roleplay with boundary adherence and persona memory.
devops_sre
Log triage
Interpret logs and propose safe diagnostic steps.
business_productivity
Knowledge base Q&A (with citations)
Answer questions grounded in an internal KB, with evidence.
business_productivity
Document summarization
Summarize long business documents into scannable outputs.
Quick Lookups
59 indexed
Best LLM for Code Generation
Benchmark-backed ranking of models for generating correct, secure code from requirements.
Best LLM for Debugging
Find the top-ranked models for localizing bugs and proposing fixes with explanations.
Best LLM for Unit Test Generation
Ranked models for generating meaningful unit tests and edge cases from code.
Best LLM for Code Review
Compare models for automated PR review covering correctness, security, and maintainability.
Best LLM for Autonomous Coding
Benchmark-backed ranking of models for end-to-end autonomous software engineering and issue resolution.
Best LLM for Function Calling
Compare models for reliable tool use, function selection, and multi-step API orchestration.