BasedAGI

devops_sre

Best LLM for Kubernetes

Compare models for generating K8s manifests with safe defaults and readiness probes.
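
For readers unsure what "safe defaults and readiness probes" means in practice, the following is a minimal sketch of the kind of manifest this category is about, written as a Python dict and serialized to JSON (which `kubectl apply -f` also accepts). The helper names `build_deployment` and `missing_safeguards`, the image `example.com/web:1.0.0`, the `/healthz` path, and the specific resource values are all assumptions for illustration. They are not taken from the ranking methodology or from any benchmark listed here.

```python
import json


def build_deployment(name: str, image: str, port: int = 8080) -> dict:
    """Return an apps/v1 Deployment as a plain dict (hypothetical helper)."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Readiness probe: the pod only receives traffic once this succeeds.
        # The /healthz path is an assumed endpoint, not a Kubernetes default.
        "readinessProbe": {
            "httpGet": {"path": "/healthz", "port": port},
            "initialDelaySeconds": 5,
            "periodSeconds": 10,
        },
        # Explicit requests and limits; the numbers are placeholders.
        "resources": {
            "requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "500m", "memory": "256Mi"},
        },
        # Conservative container security settings.
        "securityContext": {
            "runAsNonRoot": True,
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }


def missing_safeguards(manifest: dict) -> list:
    """List containers that lack a readiness probe, limits, or runAsNonRoot."""
    problems = []
    for c in manifest["spec"]["template"]["spec"]["containers"]:
        if "readinessProbe" not in c:
            problems.append(f"{c['name']}: no readinessProbe")
        if "limits" not in c.get("resources", {}):
            problems.append(f"{c['name']}: no resource limits")
        if not c.get("securityContext", {}).get("runAsNonRoot"):
            problems.append(f"{c['name']}: runAsNonRoot not set")
    return problems


if __name__ == "__main__":
    manifest = build_deployment("web", "example.com/web:1.0.0")
    print(json.dumps(manifest, indent=2))
    print(missing_safeguards(manifest))
```

A check like `missing_safeguards` is one simple way to spot-check model-generated manifests by hand; it is far narrower than what the benchmarks behind these scores measure.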

#1 Recommendation

gpt-4.1-20250414

Strong on Galileo Agent Leaderboard v2 Avg AC (100%) and Galileo Agent Leaderboard v2 Avg TSQ (64%)

external/openai/gpt-4-1-20250414

Score: 27.0%
Confidence: 34.8%
Evidence: 18
Ranked Models: 30
Evidence Quality: 81%
Scoring: Benchmark-backed
Top Signal: Galileo Agent Leaderboard v2: Avg AC

All Ranked Models

Rank | Model | Score
#1 | gpt-4.1-20250414 | 27.0%
    Strong on Galileo Agent Leaderboard v2 Avg AC (100%) and Galileo Agent Leaderboard v2 Avg TSQ (64%)
#2 | claude-sonnet-4-20250514 | 24.7%
    Strong on Galileo Agent Leaderboard v2 Avg AC (85%) and Galileo Agent Leaderboard v2 Avg TSQ (95%)
#3 | gemini-2.5-pro | 23.6%
    Strong on Galileo Agent Leaderboard v2 Avg AC (59%) and Galileo Agent Leaderboard v2 Avg TSQ (79%)
#4 | Grok-4-0709 | 23.4%
#5 | gemini-3-pro-preview | 20.6%
#6 | kimi/kimi-k2.5-thinking | 20.0%
#7 | gpt-4.1-mini-20250414 | 19.1%
#8 | google/gemini-3.1-pro-preview | 18.7%
#9 | gemini-2.5-flash | 18.1%
#10 | gpt-5-2025-08-07 | 17.2%
#11 | openai/gpt-5.4-2026-03-05 | 16.9%
#12 | gpt-5.1-2025-11-13 | 16.5%
#13 | anthropic/claude-sonnet-4.6 | 16.3%
#14 | claude-opus-4-5-20251101 | 16.2%
#15 | gpt-5-mini-2025-08-07 | 15.9%
#16 | qwen-2.5-72b-instruct | 15.6%
#17 | anthropic/claude-opus-4-6-thinking | 15.5%
#18 | gemini-3-flash-preview | 15.5%
#19 | gpt-5.2-2025-12-11 | 15.2%
#20 | anthropic/claude-opus-4-5-20251101-thinking | 15.0%
#22 | anthropic/claude-sonnet-4-5-20250929-thinking | 13.7%
#23 | xai-org/grok-4-fast-reasoning | 13.5%
#24 | gpt-4o | 13.3%
#26 | o3-20250416 | 12.9%
#27 | google/gemini-3.1-flash-lite-preview | 12.9%
#29 | xai-org/grok-4-1-fast-reasoning | 12.8%
#31 | openai/gpt-4o-mini-2024-07-18 | 12.5%
#32 | Kimi-K2-Instruct | 12.0%
#34 | grok/grok-4.20-beta-0309-reasoning | 11.7%
#36 | Llama-2-7b-chat-hf | 11.5%

Head-to-Head: #1 vs #2

#1

Top Pick

gpt-4.1-20250414

Strong on Galileo Agent Leaderboard v2 Avg AC (100%) and Galileo Agent Leaderboard v2 Avg TSQ (64%)

27.0%

Conf 34.8%

#2

claude-sonnet-4-20250514

Strong on Galileo Agent Leaderboard v2 Avg AC (85%) and Galileo Agent Leaderboard v2 Avg TSQ (95%)

24.7%

Conf 33.9%
