BasedAGI

developer_tools

Best LLM for Function Calling

Compare models for reliable tool use, function selection, and multi-step API orchestration.

#1 Recommendation

anthropic/claude-sonnet-4.6

Strong on OpenHands Issue Resolution (issue_resolution_score_pct: 72%) and Vals SWE-bench (overall_accuracy_pct: 95%).

external/anthropic/claude-sonnet-4-6

Score: 16.5%
Confidence: 29.6%
Evidence: 26

Ranked Models: 25
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: OpenHands Issue Resolution (issue_resolution_score_pct)

All Ranked Models

Showing 25 of 25 models.

Rank  Model                            Score
#5    anthropic/claude-sonnet-4.6      16.5%
#8    kimi/kimi-k2.5-thinking          14.8%
#11   GLM-5                            13.8%
#12   gpt-4o                           13.4%
#13   gemini-3-pro-preview             12.8%
#15   Kimi K2 Thinking                 12.6%
#16   gpt-4.1-20250414                 12.0%
#18   gemini-2.5-pro                   11.1%
#19   Grok-4-0709                      11.0%
#21   claude-sonnet-4-20250514         11.0%
#22   minimax/minimax-m2.1             10.9%
#23   qwen-2.5-72b-instruct            10.6%
#24   claude-opus-4-5-20251101         10.2%
#25   gpt-5.2-2025-12-11               9.8%
#26   gpt-4.1-mini-20250414            8.8%
#29   gpt-4o-2024-08-06                8.7%
#30   z-ai/glm-4.7                     8.2%
#31   gpt-5-2025-08-07                 8.2%
#33   deepseek/deepseek-r1             7.8%
#34   gpt-4o-20241120                  7.6%
#35   o3-20250416                      7.5%
#36   gpt-4o-2024-05-13                6.9%
#37   GLM-4.7                          6.8%
#41   GPT-4.1-nano-2025-04-14          4.0%
#42   openai/gpt-4o-mini-2024-07-18    3.8%

Head-to-Head: #1 vs #2

#5 (Top Pick): anthropic/claude-sonnet-4.6
Strong on OpenHands Issue Resolution (issue_resolution_score_pct: 72%) and Vals SWE-bench (overall_accuracy_pct: 95%).
Score: 16.5%
Confidence: 29.6%

#8: kimi/kimi-k2.5-thinking
Strong on Vals LiveCodeBench (overall_accuracy_pct: 94%) and Vals SWE-bench (overall_accuracy_pct: 83%).
Score: 14.8%
Confidence: 30.4%
