BasedAGI

devops_sre

Best LLM for Incident Summary

Compare models for summarizing incidents into impact, timeline, and next actions.

#1 Recommendation

gpt-4.1-20250414

Strong on Galileo Agent Leaderboard v2 Avg AC (100%) and MMLongBench-Doc Leaderboard acc_score_pct (75%)

external/openai/gpt-4-1-20250414

Score: 25.3%
Confidence: 32.9%
Evidence: 18
Ranked Models: 30
Evidence Quality: 79%
Scoring: Benchmark-backed
Top Signal: Galileo Agent Leaderboard v2: Avg AC

All Ranked Models

Rank | Model | Score | Notes
#1 | gpt-4.1-20250414 | 25.3% | Strong on Galileo Agent Leaderboard v2 Avg AC (100%) and MMLongBench-Doc Leaderboard acc_score_pct (75%)
#2 | claude-sonnet-4-20250514 | 17.6% | Strong on Galileo Agent Leaderboard v2 Avg AC (85%) and Galileo Agent Leaderboard v2 Avg TSQ (95%)
#3 | qwen-2.5-72b-instruct | 17.4% | Strong on Galileo Agent Leaderboard v2 Avg AC (76%) and DuckDB NSQL Leaderboard all_execution_accuracy (83%)
#4 | gemini-2.5-pro | 16.8% |
#5 | Grok-4-0709 | 16.7% |
#6 | gpt-4o | 16.4% |
#7 | gemini-3-pro-preview | 14.7% |
#8 | kimi/kimi-k2.5-thinking | 14.2% |
#9 | gpt-4o-20241120 | 13.9% |
#10 | gpt-4.1-mini-20250414 | 13.6% |
#11 | google/gemini-3.1-pro-preview | 13.3% |
#12 | gemini-2.5-flash | 12.9% |
#13 | gpt-5-2025-08-07 | 12.3% |
#14 | openai/gpt-5.4-2026-03-05 | 12.1% |
#15 | gpt-5.1-2025-11-13 | 11.7% |
#16 | anthropic/claude-sonnet-4.6 | 11.6% |
#17 | claude-opus-4-5-20251101 | 11.5% |
#18 | gpt-5-mini-2025-08-07 | 11.3% |
#19 | anthropic/claude-opus-4-6-thinking | 11.0% |
#20 | gemini-3-flash-preview | 11.0% |
#21 | gpt-5.2-2025-12-11 | 10.8% |
#22 | openai/gpt-4o-mini-2024-07-18 | 10.8% |
#23 | anthropic/claude-opus-4-5-20251101-thinking | 10.7% |
#26 | anthropic/claude-sonnet-4-5-20250929-thinking | 9.7% |
#27 | xai-org/grok-4-fast-reasoning | 9.6% |
#29 | gpt-4o-2024-08-06 | 9.2% |
#30 | o3-20250416 | 9.2% |
#31 | google/gemini-3.1-flash-lite-preview | 9.1% |
#33 | xai-org/grok-4-1-fast-reasoning | 9.1% |
#35 | Kimi-K2-Instruct | 8.5% |

Head-to-Head: #1 vs #2

#1 (Top Pick): gpt-4.1-20250414
Strong on Galileo Agent Leaderboard v2 Avg AC (100%) and MMLongBench-Doc Leaderboard acc_score_pct (75%)
Score: 25.3%
Confidence: 32.9%

#2: claude-sonnet-4-20250514
Strong on Galileo Agent Leaderboard v2 Avg AC (85%) and Galileo Agent Leaderboard v2 Avg TSQ (95%)
Score: 17.6%
Confidence: 24.1%
