basedagi.org | benchmark evidence | live, refreshed weekly

BFCL Live

BFCL Live: function-calling accuracy.

Winner on BFCL Live: Mistral: Mistral Small 4 (79.0)
Direct benchmark result, not a broad vertical composite | source row dated and scored on 2026-05-15
Latest mapped results | top 20 (17 models matched)
# | Model | Provider | Score | Evidence | Tested
1 | Mistral: Mistral Small 4 | Mistralai | 79.0 | model-only, independent_benchmark | 2026-05-15
2 | OpenAI: GPT-4.1 | Openai | 78.9 | model-only, independent_benchmark | 2026-05-15
3 | MoonshotAI: Kimi K2 Thinking | Moonshotai | 78.7 | model-only, independent_benchmark | 2026-05-15
4 | Amazon: Nova Pro 1.0 | Amazon | 78.5 | model-only, independent_benchmark | 2026-05-15
5 | Google: Gemini 2.5 Flash | Google | 78.2 | model-only, independent_benchmark | 2026-05-15
6 | Meta: Llama 3.3 70B Instruct | Meta Llama | 76.6 | model-only, independent_benchmark | 2026-05-15
7 | Anthropic: Claude Opus 4.5 | Anthropic | 76.0 | model-only, independent_benchmark | 2026-05-15
8 | Meta: Llama 4 Scout | Meta Llama | 74.7 | model-only, independent_benchmark | 2026-05-15
9 | Google: Gemma 3 27B | Google | 74.5 | model-only, independent_benchmark | 2026-05-15
10 | Meta: Llama 4 Maverick | Meta Llama | 73.7 | model-only, independent_benchmark | 2026-05-15
11 | OpenAI: o4 Mini | Openai | 70.8 | model-only, independent_benchmark | 2026-05-15
12 | Mistral Large | Mistralai | 68.1 | model-only, independent_benchmark | 2026-05-15
13 | OpenAI: GPT-5.2 | Openai | 67.1 | model-only, independent_benchmark | 2026-05-15
14 | OpenAI: o3 | Openai | 66.2 | model-only, independent_benchmark | 2026-05-15
15 | DeepSeek: DeepSeek V3.2 | Deepseek | 53.7 | model-only, independent_benchmark | 2026-05-15
16 | Anthropic: Claude Haiku 4.5 | Anthropic | 52.5 | model-only, independent_benchmark | 2026-05-15
17 | Anthropic: Claude Sonnet 4.5 | Anthropic | 46.6 | model-only, independent_benchmark | 2026-05-15
what this result means

BFCL Live contributes direct public evidence of function-calling accuracy. Check its scope before generalizing the result.

A win here is a win on BFCL Live only, and a narrow one: the top five models fall within 0.8 points of each other (79.0 to 78.2). Broad task pages require independent corroboration before naming a general winner.
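
To make the size of the lead concrete, here is a minimal sketch that takes the top five scores, copied by hand from the table above, and reports the winner's margin over the runner-up plus the models within a chosen distance of the best score. The helper names `winner_and_margin` and `within` are hypothetical and are not part of basedagi.org or any BFCL tooling.

```python
# Top-five BFCL Live scores, copied by hand from the table above.
scores = {
    "Mistral: Mistral Small 4": 79.0,
    "OpenAI: GPT-4.1": 78.9,
    "MoonshotAI: Kimi K2 Thinking": 78.7,
    "Amazon: Nova Pro 1.0": 78.5,
    "Google: Gemini 2.5 Flash": 78.2,
}


def winner_and_margin(scores: dict[str, float]) -> tuple[str, float]:
    """Return the top model and its lead over the runner-up (higher is better)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_score), (_, second_score) = ranked[0], ranked[1]
    # Round to one decimal to match the table's precision and hide float noise.
    return top, round(top_score - second_score, 1)


def within(scores: dict[str, float], threshold: float) -> list[str]:
    """List models whose score is within `threshold` points of the best score."""
    best = max(scores.values())
    return [model for model, s in scores.items() if best - s <= threshold]


print(winner_and_margin(scores))  # ('Mistral: Mistral Small 4', 0.1)
print(len(within(scores, 1.0)))   # 5
```

A 0.1-point margin says the top of this leaderboard is effectively a cluster, which is why a single-benchmark win is not read as a general result.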

source record
category: structured_output
metric: accuracy
matched models: 17
latest source date: 2026-05-15
direction: higher is better
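
The `direction` field decides how scores turn into ranks. Below is a minimal sketch, assuming a hypothetical `SourceRecord` type that mirrors the fields above (it is not a published basedagi.org schema), with `direction: higher is better` expressed as a boolean. Ties are not handled specially: equal scores keep their input order and receive consecutive ranks.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    # Hypothetical mirror of the "source record" fields listed above.
    category: str
    metric: str
    higher_is_better: bool


def rank(results: dict[str, float], record: SourceRecord) -> list[tuple[int, str, float]]:
    """Assign 1-based ranks, sorting in the direction the record specifies."""
    ordered = sorted(results.items(), key=lambda kv: kv[1], reverse=record.higher_is_better)
    return [(i, model, score) for i, (model, score) in enumerate(ordered, start=1)]


record = SourceRecord(category="structured_output", metric="accuracy", higher_is_better=True)
results = {
    "Anthropic: Claude Sonnet 4.5": 46.6,
    "Mistral: Mistral Small 4": 79.0,
    "DeepSeek: DeepSeek V3.2": 53.7,
}
for row in rank(results, record):
    print(*row, sep=" | ")
# 1 | Mistral: Mistral Small 4 | 79.0
# 2 | DeepSeek: DeepSeek V3.2 | 53.7
# 3 | Anthropic: Claude Sonnet 4.5 | 46.6
```

Flipping `higher_is_better` to `False` would suit an error-rate metric; for accuracy, as here, the highest score ranks first.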