BasedAGI
Rankings live

Use-case routing, backed by benchmarks

Every ranking row is evidence-backed. We score models per use case using external benchmark metrics, then expose confidence and top contributors so you can verify why a model ranks where it does.
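
As a rough illustration of the idea above, here is a minimal sketch of how a per-use-case score, a confidence value, and a list of top contributors might be derived from external benchmark metrics. It is not the site's actual method: the `Metric` fields, the weighted-mean formula, the coverage-based confidence, and the benchmark names (`bench_a` and so on) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """One external benchmark result for a model (all fields are hypothetical)."""
    benchmark: str
    value: float   # assumed to be normalized to the range 0..1
    weight: float  # assumed relevance of this benchmark to the use case


def score_use_case(metrics, expected_benchmarks):
    """Return (score, confidence, top_contributors) for one model on one use case."""
    total_weight = sum(m.weight for m in metrics)
    if total_weight == 0 or not expected_benchmarks:
        return 0.0, 0.0, []
    # Weighted mean of the normalized benchmark values.
    score = sum(m.value * m.weight for m in metrics) / total_weight
    # Confidence is modeled as coverage: the share of expected benchmarks with data.
    covered = {m.benchmark for m in metrics} & set(expected_benchmarks)
    confidence = len(covered) / len(expected_benchmarks)
    # Contributors ranked by their weighted share of the final score.
    ranked = sorted(metrics, key=lambda m: m.value * m.weight, reverse=True)
    top = [(m.benchmark, m.value * m.weight / total_weight) for m in ranked[:3]]
    return score, confidence, top


metrics = [
    Metric("bench_a", 0.50, 2.0),
    Metric("bench_b", 0.25, 1.0),
]
score, confidence, top = score_use_case(
    metrics, ["bench_a", "bench_b", "bench_c", "bench_d"]
)
print(f"score={score:.1%} confidence={confidence:.0%} top={top[0][0]}")
```

In this sketch a model with data for only two of four expected benchmarks gets a confidence of 50%, so a reader can tell a thinly supported score from a well-supported one. The real rankings may define confidence and contributors differently.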

Featured Use Cases

15 curated

finance

Earnings call synthesis

Summarize earnings calls into key points, tone, and risks.

2 core tasks
gemini-3-pro-preview
40.9%
Evidence: sufficient 89%

devops_sre

Log triage

Interpret logs and propose safe diagnostic steps.

2 core tasks
gemini-3-pro-preview
40.4%
Evidence: insufficient 84%

business_productivity

Document summarization

Summarize long business documents into scannable outputs.

1 core task
gemini-3-pro-preview
37.6%
Evidence: insufficient 83%

business_productivity

Knowledge base Q&A (with citations)

Answer questions grounded in an internal KB, with evidence.

2 core tasks
gemini-3-pro-preview
40.8%
Evidence: insufficient 83%

legal

Contract term extraction

Extract key terms into structured fields with clause references.

1 core task
gemini-2.5-pro
31.7%
Evidence: insufficient 82%

customer_experience

Support bot (RAG grounded)

Support chatbot grounded in docs with optional citations and escalation.

2 core tasks
gemini-3-pro-preview
35.6%
Evidence: insufficient 82%

risk_eval

Prompt injection resistance (eval)

Measure resistance to prompt injection in RAG and tool settings.

1 core task
gemini-2.5-pro
25.6%
Evidence: insufficient 82%

developer_tools

Code generation

Generate correct, secure code from requirements.

2 core tasks
anthropic/claude-sonnet-4.6
19.7%
Evidence: insufficient 81%

developer_tools

Debugging assistant

Localize bugs and propose fixes with explanations.

2 core tasks
gpt-4o-2024-05-13
20.5%
Evidence: insufficient 80%

marketing_sales

Ad copy variants

Generate diverse headline/CTA variants under strict constraints.

2 core tasks
gpt-4o
26.8%
Evidence: insufficient 80%

data_analytics

Text-to-SQL analyst assistant

Convert questions into SQL and explain the query.

2 core tasks
gemini-3-pro-preview
19.6%
Evidence: insufficient 80%

healthcare

Clinical note drafting

Summarize encounters into structured notes for clinician review.

2 core tasks
gpt-4.1-20250414
19.6%
Evidence: insufficient 80%

creative

NPC dialogue

Low-latency in-character dialogue suitable for games.

2 core tasks
qwen-2.5-72b-instruct
20.9%
Evidence: insufficient 80%

adult

Adult ERP roleplay (explicit)

Explicit adult roleplay with boundary adherence and persona memory.

2 core tasks
Grok-4-0709
20.2%
Evidence: insufficient 79%

creative

Long-form story co-author

Generate and refine long-form fiction with continuity.

2 core tasks
qwen-2.5-72b-instruct
23.3%
Evidence: insufficient 79%

All Use Cases

100 indexed · 9 sufficient · 91 insufficient · 0 unscored
Use Case · Top Score
Thesis red teaming

use_case.fin.thesis_red_team

46.3%
Earnings call synthesis

use_case.fin.earnings_call_synthesis

40.9%
Transaction anomaly narrative

use_case.fin.transaction_anomaly_narrative

40.8%
AML alert triage

use_case.fin.aml_alert_triage

39.1%
KYC profile synthesis

use_case.fin.kyc_profile_synthesis

39.1%
Accounts payable invoice extraction (text)

use_case.fin.ap_invoice_extraction

35.1%
Filings summarization (10-K/10-Q)

use_case.fin.filings_summarization

35.6%
Component selection assistant

use_case.eng.component_selection

36.5%
Quant research code generation

use_case.fin.alpha_research_codegen

32.9%
Simulation setup assistant

use_case.eng.simulation_setup_assistant

26.7%
Cross-paper contradiction analysis

use_case.bio.paper_contradictions

33.2%
Literature synthesis with citations

use_case.bio.literature_synthesis

33.2%
Runbook step assistant

use_case.sre.runbook_steps

33.0%
Log triage

use_case.sre.log_triage

40.4%
Disinformation and manipulation resistance (eval)

use_case.security.disinformation_resistance_eval

32.2%
Litigation risk memo

use_case.ins.litigation_risk_memo

30.5%
Contract Q&A (RAG grounded)

use_case.legal.contract_qna

31.5%
Knowledge base Q&A (fast, no citations)

use_case.business.kb_qna_fast

35.2%
Regulatory summary

use_case.legal.regulatory_summary

30.9%
Document summarization

use_case.business.doc_summarization

37.6%
Political risk brief

use_case.geo.political_risk_brief

30.4%
Decision memo

use_case.business.decision_memo

28.0%
Executive briefing

use_case.business.exec_briefing

28.0%
Search query rewriting

use_case.business.search_query_rewrite

28.0%
Contract redline summary

use_case.legal.contract_redline_summary

32.7%
Knowledge base Q&A (with citations)

use_case.business.kb_qna_with_citations

40.8%
HR policy Q&A

use_case.hr.hr_policy_qna

30.0%
Clause playbook check

use_case.legal.playbook_clause_check

31.7%
Contract term extraction

use_case.legal.contract_term_extraction

31.7%
Agent-assist reply suggestions

use_case.cx.agent_assist_replies

30.5%
Policy wording comparison

use_case.ins.policy_wording_compare

30.9%
Codebase onboarding brief

use_case.dev.codebase_onboarding

26.2%
Patient education bot (RAG grounded)

use_case.health.patient_education_bot

26.5%
SQL debugging

use_case.data.sql_debugging

23.5%
Support dialogue agent

use_case.cx.support_dialogue_agent

30.9%
Social listening brief

use_case.mkt.social_listening_brief

27.1%
Crisis escalation protocol (eval)

use_case.safety.crisis_escalation_protocol

25.6%
Jailbreak resistance (eval)

use_case.security.jailbreak_resistance_eval

25.6%
Overrefusal (eval)

use_case.security.overrefusal_eval

25.6%
Refusal profile (eval)

use_case.security.refusal_profile_eval

25.6%
Scam and social engineering resistance (eval)

use_case.security.scam_social_engineering_resistance_eval

25.6%
Support bot (RAG grounded)

use_case.cx.support_rag_bot

35.6%
Prompt injection resistance (eval)

use_case.security.prompt_injection_resistance_eval

25.6%
Narrative tracking

use_case.geo.narrative_tracking

26.9%
Medical coding support (suggestions)

use_case.health.medical_coding_suggest

24.4%
PR review agent

use_case.dev.pr_review_agent

18.3%
Operator support chat

use_case.ops.operator_support_chat

29.3%
Code generation

use_case.dev.code_generation

19.7%
Agentic bug fixing

use_case.dev.agentic_bug_fixing

18.8%
Agentic incident response

use_case.sre.agentic_incident_response

23.4%
Title document search assistant (RAG grounded)

use_case.re.title_search_assistant

25.5%
Maintenance RCA memo

use_case.ops.maintenance_rca

27.2%
Manuals Q&A (RAG grounded)

use_case.ops.manuals_qna

27.2%
Support FAQ bot

use_case.cx.support_faq_bot

26.2%
Metric definition workshop

use_case.data.metric_definition_workshop

25.9%
Archaic and historical translation

use_case.history.archaic_translation

32.9%
Disruption monitoring brief

use_case.sc.disruption_monitoring_brief

27.1%
Supplier risk monitoring

use_case.sc.supplier_risk_monitoring

27.1%
Customer feedback theme mining

use_case.cx.feedback_theme_mining

27.6%
Security incident triage

use_case.cyber.incident_triage

28.5%
Campaign brief

use_case.mkt.campaign_brief

24.1%
Product positioning and messaging

use_case.mkt.product_positioning

24.1%
Social post generation

use_case.mkt.social_post_generation

24.1%
Malware analysis report (defensive)

use_case.cyber.malware_analysis_report

28.0%
Fraud signal summary

use_case.ins.fraud_signal_summary

25.7%
PR crisis response draft

use_case.mkt.pr_crisis_response

26.3%
Landing page copy

use_case.mkt.landing_page_copy

27.7%
Refactoring assistant

use_case.dev.refactoring

20.7%
Config debugging

use_case.sre.config_debugging

24.1%
Kubernetes manifest generation

use_case.sre.iac_k8s

24.1%
Terraform generation

use_case.sre.iac_terraform

24.1%
Dashboard narratives

use_case.data.dashboard_narratives

23.3%
Verilog/VHDL generation

use_case.eda.verilog_generation

20.7%
Cross-lingual summary

use_case.business.cross_lingual_summary

23.7%
Invoice and receipt extraction (text)

use_case.business.invoice_receipt_extraction

21.7%
Record dedupe and normalization

use_case.business.record_dedupe_normalization

21.7%
Grammar and writing coach

use_case.lang.grammar_coach

19.9%
Debugging assistant

use_case.dev.debugging

20.5%
Patient-friendly explanations

use_case.health.patient_friendly_summaries

28.1%
Document expansion

use_case.business.doc_expansion

24.0%
Ad copy variants

use_case.mkt.ad_copy_variants

26.8%
Personalized sales outreach

use_case.mkt.sales_outreach_personalized

26.8%
Brand voice localization

use_case.mkt.brand_voice_localization

23.8%
Casual chat companion

use_case.companion.casual_chat

22.4%
Empathetic support chat

use_case.companion.empathy_support_chat

22.4%
Life coaching and goal planning

use_case.companion.life_coaching

22.4%
Adult erotica (long-form, explicit)

use_case.adult.erotica_longform

17.4%
Safety and policy gating

use_case.cx.safety_gating

23.3%
Spam filtering and classification

use_case.cx.spam_filtering

23.3%
Toxicity moderation routing

use_case.cx.toxicity_moderation

23.3%
Text tagging and routing

use_case.business.text_tagging

23.1%
Legal translation

use_case.legal.legal_translation

29.8%
Vendor contract summary (procurement)

use_case.proc.vendor_contract_summary

27.2%
Mindfulness and meditation scripts

use_case.wellness.mindfulness_scripts

20.7%
Text-to-SQL analyst assistant

use_case.data.text_to_sql

19.6%
Historical document summarization

use_case.history.historical_doc_summarization

24.1%
Protocol structuring

use_case.bio.protocol_structuring

19.8%
Clinical note drafting

use_case.health.clinical_note_drafting

19.6%
Medical chart summary

use_case.health.medical_chart_summary

19.6%
Tarot-style reading

use_case.spiritual.tarot_reading

21.4%