BasedAGI

Best LLM for Terraform

Ranked models for generating Terraform IaC with correct resources and safe defaults.

#1 Recommendation

anthropic/claude-sonnet-4

Strong on Galileo Agent Leaderboard v2 Avg AC and SWE-bench Verified Leaderboard swe_verified_resolved_pct

external/anthropic/claude-sonnet-4

Score: 25.7%
Confidence: 35.0%
Evidence: 23
$6.00 per 1M tokens

Ranked Models: 30
Evidence Quality: 96%
Evidence Points: 23
Top Signal: Galileo Agent Leaderboard v2: Avg AC
Benchmark Sources: 33
Last Updated: 19h ago

All Ranked Models

30 of 30 models
Rank | Model | Score | Strong on
🥇 | claude-sonnet-4 | 25.7% | Galileo Agent Leaderboard v2 Avg AC and SWE-bench Verified Leaderboard swe_verified_resolved_pct
🥈 | qwen-2.5-72b-instruct | 24.0% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Galileo Agent Leaderboard v2 Avg AC
🥉 | gemini-2.5-pro | 22.5% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Galileo Agent Leaderboard v2 Avg AC
#4 | gpt-4.1-20250414 | 22.2% | Galileo Agent Leaderboard v2 Avg AC and SWE-bench Verified Leaderboard swe_verified_resolved_pct
#5 | gemini-3-pro-preview | 21.1% | Berkeley Function Calling Leaderboard (Overall) Overall Acc and SWE-bench Verified Leaderboard swe_verified_resolved_pct
#6 | o3-20250416 | 20.4% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Berkeley Function Calling Leaderboard (Overall) Overall Acc
#7 | gpt-5-2025-08-07 | 20.4% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Aider Polyglot Leaderboard percent_correct_pct
#8 | Grok-4-0709 | 20.4% | Berkeley Function Calling Leaderboard (Overall) Overall Acc and Galileo Agent Leaderboard v2 Avg AC
#9 | gpt-5.2-2025-12-11 | 19.2% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Berkeley Function Calling Leaderboard (Overall) Overall Acc
#10 | Steelskull/L3.3-MS-Nevoria-70b | 17.8% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#11 | MaziyarPanahi/calme-3.2-instruct-78b | 17.7% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#12 | Steelskull/L3.3-Nevoria-R1-70b | 17.5% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#13 | Mistral-Large-Instruct-2411 | 17.5% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#14 | claude-opus-4-5-20251101 | 17.3% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Berkeley Function Calling Leaderboard (Overall) Overall Acc
#15 | MaziyarPanahi/calme-2.4-rys-78b | 17.2% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#16 | MaziyarPanahi/calme-3.1-instruct-78b | 17.2% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#17 | Tarek07/Progenitor-V1.1-LLaMa-70B | 17.1% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#18 | CalmeRys-78B-Orpo-v0.1 | 17.1% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#19 | phi-4 | 16.7% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#20 | Apollo-70B | 16.6% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#21 | Triangle104/Set-70b | 16.6% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#22 | Sao10K/70B-L3.3-Cirrus-x1 | 16.6% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#23 | gpt-5-mini-2025-08-07 | 16.6% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals MedQA overall_accuracy_pct
#24 | Homer-v1.0-Qwen2.5-72B | 16.5% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#25 | Tarek07/Thalassic-Alpha-LLaMa-70B | 16.5% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct
#26 | Sakalti/ultiima-72B-v1.5 | 16.2% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#27 | T3Q-qwen2.5-14b-v1.0-e3 | 16.1% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#28 | JungZoona/T3Q-Qwen2.5-14B-Instruct-1M-e3 | 16.1% | Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Open LLM Leaderboard GPQA gpqa
#29 | gemini-2.5-flash | 16.1% | Berkeley Function Calling Leaderboard (Overall) Overall Acc and Galileo Agent Leaderboard v2 Avg AC
#30 | Llama3.3-70B-CogniLink | 16.1% | Open LLM Leaderboard GPQA gpqa and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct

Head-to-Head: #1 vs #2

#1 (Top Pick): anthropic/claude-sonnet-4
Strong on Galileo Agent Leaderboard v2 Avg AC and SWE-bench Verified Leaderboard swe_verified_resolved_pct
Score: 25.7%
Confidence: 35.0%

#2: qwen-2.5-72b-instruct
Strong on Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct and Galileo Agent Leaderboard v2 Avg AC
Score: 24.0%
Confidence: 35.6%
