LLM Use Cases — Find the Right Model for Any Task

Use Case	Description	Vertical	Best Score	Top Model	Confidence
Thesis red teaming	Stress-test an investment thesis with counterarguments and risk.	Finance	48.9%	gemini-3.1-pro-preview	90%
Earnings call synthesis	Summarize earnings calls into key points, tone, and risks.	Finance	44.1%	gemini-3.1-pro-preview	89%
Transaction anomaly narrative	Summarize anomalies into hypotheses, evidence, and follow-up actions.	Finance	43.3%	gemini-3.1-pro-preview	89%
Casual chat companion	Engaging conversation with consistent tone and context.	Companion	50.1%	gemini-3-pro-preview	89%
Life coaching and goal planning	Goal setting, habit planning, and accountability check-ins.	Companion	50.1%	gemini-3-pro-preview	89%
Tarot-style reading	Symbolic, personalized readings with consistent persona.	Companion	50.1%	gemini-3-pro-preview	89%
Mindfulness and meditation scripts	Generate calming scripts and exercises tailored to a user's context.	Companion	48.5%	gemini-3-pro-preview	88%
Empathetic support chat	Supportive conversation with strong boundaries and safe escalation.	Companion	48.7%	gemini-3-pro-preview	88%
Accounts payable invoice extraction (text)	Extract structured fields from invoices/receipts for AP workflows.	Finance	38.8%	gemini-3.1-pro-preview	88%
AML alert triage	Triage AML alerts into severity, rationale, and next actions.	Finance	41.5%	gemini-3.1-pro-preview	88%
KYC profile synthesis	Turn identity docs and notes into a structured KYC profile.	Finance	41.5%	gemini-3.1-pro-preview	88%
Filings summarization (10-K/10-Q)	Summarize filings with conservative factuality and risk highlights.	Finance	38.5%	gemini-3.1-pro-preview	87%
Adult erotica (long-form, explicit)	Long-form explicit erotica with controllable style and strict boundaries.	Adult	43.6%	gemini-3-pro-preview	87%
Component selection assistant	Recommend components under constraints with evidence and tradeoffs.	Engineering	35.5%	gemini-3.1-pro-preview	85%
Quant research code generation	Generate backtest or analysis code from trading hypotheses.	Finance	34.2%	gpt-5-2025-08-07	85%
Interactive fiction / DM	Run interactive fiction with state tracking and user agency.	Creative	44.2%	gemini-3-pro-preview	85%
NPC dialogue	Low-latency in-character dialogue suitable for games.	Creative	44.2%	gemini-3-pro-preview	85%
SFW roleplay and simulation	Roleplay/simulations for learning or entertainment with state tracking.	Creative	45.9%	gemini-3-pro-preview	85%
Adult ERP roleplay (explicit)	Explicit adult roleplay with boundary adherence and persona memory.	Adult	49.5%	Grok-4-0709	85%
Cross-paper contradiction analysis	Identify contradictions and uncertainty across papers with citations.	Biomedical	Provisional	No stable winner	85%
Literature synthesis with citations	Synthesize papers and guidelines with citations and uncertainty.	Biomedical	Provisional	No stable winner	85%
Knowledge base Q&A (fast, no citations)	Answer KB questions grounded in retrieved text without citations.	Productivity	Provisional	No stable winner	84%
Runbook step assistant	Suggest safe runbook steps and escalation points grounded in docs.	DevOps	Provisional	No stable winner	84%
Contract Drafting & Redlining	Drafting, reviewing, and suggesting edits to legal contracts and agreements.	Legal	Provisional	No stable winner	84%
Litigation risk memo	Summarize a claim into litigation risk drivers and mitigation steps.	Insurance	Provisional	No stable winner	84%
Simulation setup assistant	Turn design requirements into simulation setup checklists and boundary notes.	Engineering	Provisional	No stable winner	84%
Log triage	Interpret logs and propose safe diagnostic steps.	DevOps	Provisional	No stable winner	84%
Contract Q&A (RAG grounded)	Answer contract questions grounded in the actual contract text.	Legal	Provisional	No stable winner	84%
Knowledge base Q&A (with citations)	Answer questions grounded in an internal KB, with evidence.	Productivity	Provisional	No stable winner	84%
Regulatory summary	Summarize and compare regulatory text with conservative interpretation.	Legal	Provisional	No stable winner	84%
HR policy Q&A	Answer HR policy questions grounded in authoritative text.	HR	Provisional	No stable winner	84%
Disinformation and manipulation resistance (eval)	Measure refusal and safe handling of deceptive content generation requests.	Risk & Eval	Provisional	No stable winner	83%
Document summarization	Summarize long business documents into scannable outputs.	Productivity	Provisional	No stable winner	83%
Political risk brief	Summarize key developments into risks, scenarios, and actions.	Geopolitics	Provisional	No stable winner	83%
Agent-assist reply suggestions	Draft replies for human agents with tone and policy constraints.	CX	Provisional	No stable winner	83%
Decision memo	Recommend a decision with options, constraints, and risks.	Productivity	Provisional	No stable winner	83%
Executive briefing	Turn raw notes into a short executive brief with risks and actions.	Productivity	Provisional	No stable winner	83%
Search query rewriting	Rewrite queries into higher-recall search queries and filters.	Productivity	Provisional	No stable winner	83%
Contract redline summary	Summarize material changes between contract versions with clause refs.	Legal	Provisional	No stable winner	83%
Support dialogue agent	Multi-turn support conversations with escalation and policy awareness.	CX	Provisional	No stable winner	83%
Clause playbook check	Check extracted terms against a playbook and flag deviations.	Legal	Provisional	No stable winner	83%
Contract term extraction	Extract key terms into structured fields with clause references.	Legal	Provisional	No stable winner	83%
Support bot (RAG grounded)	Support chatbot grounded in docs with optional citations and escalation.	CX	Provisional	No stable winner	83%
SQL debugging	Diagnose and fix SQL queries for correctness and performance.	Data	Provisional	No stable winner	83%
Policy wording comparison	Compare policy wording against a standard and flag material differences.	Insurance	Provisional	No stable winner	83%
Operator support chat	Real-time operator assistant with grounded troubleshooting and escalation.	Industrial	Provisional	No stable winner	83%
Maintenance RCA memo	Turn logs and notes into a maintenance root cause analysis.	Industrial	Provisional	No stable winner	83%
Manuals Q&A (RAG grounded)	Answer operator questions grounded in technical manuals and runbooks.	Industrial	Provisional	No stable winner	83%
Social listening brief	Summarize social chatter into themes, risks, and recommendations.	Marketing	Provisional	No stable winner	83%
Codebase onboarding brief	Summarize a repository's architecture, modules, and conventions.	Developer	Provisional	No stable winner	83%
Patient education bot (RAG grounded)	Answer patient FAQ using trusted sources with cautious wording.	Healthcare	Provisional	No stable winner	83%
Disruption monitoring brief	Summarize disruptions into risk, options, and recommendations.	Supply Chain	Provisional	No stable winner	82%
Supplier risk monitoring	Track supplier risk signals from multi-source text and summarize actions.	Supply Chain	Provisional	No stable winner	82%
Narrative tracking	Track narratives across multi-lingual sources and flag contradictions.	Geopolitics	Provisional	No stable winner	82%
Campaign brief	Draft a campaign brief with positioning, audience, and channels.	Marketing	Provisional	No stable winner	82%
Product positioning and messaging	Develop positioning, value props, and message pillars with tradeoffs.	Marketing	Provisional	No stable winner	82%
Social post generation	Generate short channel-specific social posts and variations.	Marketing	Provisional	No stable winner	82%
Landing page copy	Draft landing pages with clear positioning and structure.	Marketing	Provisional	No stable winner	82%
Autonomous Coding Agent	End-to-end autonomous software engineering: reading issues, writing code, running tests, submitting PRs.	Developer	Provisional	No stable winner	82%
Language conversation partner	Conversational practice with gentle corrections and explanations.	Education	Provisional	No stable winner	82%
Medical coding support (suggestions)	Extract coding-relevant facts and suggest codes for human review.	Healthcare	Provisional	No stable winner	82%
Poetry and lyrics	Generate poems and lyrics with style control and variation.	Creative	Provisional	No stable winner	82%
Screenplay scene writing	Write screenplay scenes with formatting, pacing, and strong dialogue.	Creative	Provisional	No stable winner	82%
Code generation	Generate correct, secure code from requirements.	Developer	Provisional	No stable winner	82%
CAD scripting helper	Generate and debug CAD automation scripts and parametric geometry code.	Engineering	Provisional	No stable winner	82%
PR crisis response draft	Draft a conservative public statement and internal talking points.	Marketing	Provisional	No stable winner	82%
Customer feedback theme mining	Extract themes and trends from reviews, tickets, and surveys.	CX	Provisional	No stable winner	82%
Title document search assistant (RAG grounded)	Navigate and answer questions across a corpus of property documents.	Real Estate	Provisional	No stable winner	82%
Config debugging	Diagnose and patch YAML/JSON/TOML configs with minimal diffs.	DevOps	Provisional	No stable winner	82%
Kubernetes manifest generation	Generate K8s manifests with safe defaults and probes.	DevOps	Provisional	No stable winner	82%
Terraform generation	Generate Terraform IaC with correct resources and safe defaults.	DevOps	Provisional	No stable winner	82%
Metric definition workshop	Turn ambiguous KPI definitions into precise, measurable specs.	Data	Provisional	No stable winner	82%
Archaic and historical translation	Translate older or domain-specific language into modern equivalents.	History	Provisional	No stable winner	82%
Refactoring assistant	Refactor code safely while preserving behavior and improving clarity.	Developer	Provisional	No stable winner	82%
Ad copy variants	Generate diverse headline/CTA variants under strict constraints.	Marketing	Provisional	No stable winner	82%
Personalized sales outreach	Draft outbound emails/DMs personalized to a prospect persona.	Marketing	Provisional	No stable winner	82%
Dashboard narratives	Generate weekly KPI narratives and investigation suggestions.	Data	Provisional	No stable winner	82%
Grammar and writing coach	Correct grammar and explain fixes at the learner's level.	Education	Provisional	No stable winner	82%
Security incident triage	Triage security incidents from alerts/logs into impact and next steps.	Security	Provisional	No stable winner	82%
Support FAQ bot	Answer common support questions with safe troubleshooting steps.	CX	Provisional	No stable winner	82%
IDE code completion	Fast local-context code completion and small snippet generation.	Developer	Provisional	No stable winner	82%
Vendor contract summary (procurement)	Summarize vendor contracts into key terms, risks, and deviations.	Supply Chain	Provisional	No stable winner	82%
Long-form story co-author	Generate and refine long-form fiction with continuity.	Creative	Provisional	No stable winner	82%
Verilog/VHDL generation	Generate RTL code and testbenches from functional specs.	Engineering	Provisional	No stable winner	81%
Fraud signal summary	Summarize potential fraud indicators with conservative evidence framing.	Insurance	Provisional	No stable winner	81%
Malware analysis report (defensive)	Explain suspicious code and produce a defensive analysis report.	Security	Provisional	No stable winner	81%
PR review agent	Review diffs for correctness, security, and maintainability.	Developer	Provisional	No stable winner	81%
Crisis escalation protocol (eval)	Measure safe crisis escalation behavior under the selected policy.	Risk & Eval	Provisional	No stable winner	81%
Jailbreak resistance (eval)	Measure robustness to adversarial prompts that attempt to bypass policy.	Risk & Eval	Provisional	No stable winner	81%
Overrefusal (eval)	Measure how often benign requests are incorrectly refused.	Risk & Eval	Provisional	No stable winner	81%
Refusal profile (eval)	Measure refusal/overrefusal rates across predefined categories.	Risk & Eval	Provisional	No stable winner	81%
Scam and social engineering resistance (eval)	Measure refusal and safe handling of deception/scam requests.	Risk & Eval	Provisional	No stable winner	81%
Agentic bug fixing	Agentic loop that reproduces, fixes, and validates with tests.	Developer	Provisional	No stable winner	81%
Debugging assistant	Localize bugs and propose fixes with explanations.	Developer	Provisional	No stable winner	81%
Cross-lingual summary	Summarize a document in one language into another language.	Productivity	Provisional	No stable winner	81%
Prompt injection resistance (eval)	Measure resistance to prompt injection in RAG and tool settings.	Risk & Eval	Provisional	No stable winner	81%
Spam filtering and classification	Detect spam and low-quality messages for routing and moderation.	CX	Provisional	No stable winner	81%
Toxicity moderation routing	Classify abusive content for moderation and escalation.	CX	Provisional	No stable winner	81%
Legal translation	Translate legal text with terminology consistency and format safety.	Legal	Provisional	No stable winner	81%
Agentic incident response	Agentic tool-using workflow for incident triage and remediation planning.	DevOps	Provisional	No stable winner	81%