benchmark evidence

BFCL Live

Function-calling accuracy on BFCL Live.

winner on BFCL Live: Mistral: Mistral Small 4 (79.0)

direct benchmark result, not a broad vertical composite | source row dated 2026-05-15

scored on 2026-05-15 · stale source data (60d)

latest mapped results | top 20

#	Model	Provider	Score	Evidence	Tested
1	Mistral: Mistral Small 4	Mistralai	79.0	model-only independent_benchmark	2026-05-15
2	OpenAI: GPT-4.1	Openai	78.9	model-only independent_benchmark	2026-05-15
3	MoonshotAI: Kimi K2 Thinking	Moonshotai	78.7	model-only independent_benchmark	2026-05-15
4	Amazon: Nova Pro 1.0	Amazon	78.5	model-only independent_benchmark	2026-05-15
5	Google: Gemini 2.5 Flash	Google	78.2	model-only independent_benchmark	2026-05-15
6	Meta: Llama 3.3 70B Instruct	Meta Llama	76.6	model-only independent_benchmark	2026-05-15
7	Anthropic: Claude Opus 4.5	Anthropic	76.0	model-only independent_benchmark	2026-05-15
8	Meta: Llama 4 Scout	Meta Llama	74.7	model-only independent_benchmark	2026-05-15
9	Google: Gemma 3 27B	Google	74.5	model-only independent_benchmark	2026-05-15
10	Meta: Llama 4 Maverick	Meta Llama	73.7	model-only independent_benchmark	2026-05-15
11	OpenAI: o4 Mini	Openai	70.8	model-only independent_benchmark	2026-05-15
12	Mistral Large	Mistralai	68.1	model-only independent_benchmark	2026-05-15
13	OpenAI: GPT-5.2	Openai	67.1	model-only independent_benchmark	2026-05-15
14	OpenAI: o3	Openai	66.2	model-only independent_benchmark	2026-05-15
15	DeepSeek: DeepSeek V3.2	Deepseek	53.7	model-only independent_benchmark	2026-05-15
16	Anthropic: Claude Haiku 4.5	Anthropic	52.5	model-only independent_benchmark	2026-05-15
17	Anthropic: Claude Sonnet 4.5	Anthropic	46.6	model-only independent_benchmark	2026-05-15

what this result means

This benchmark contributes direct public evidence. Read its scope before generalizing the result.

A win here is a win on BFCL Live. Broad task pages require independent corroboration before naming a general winner.
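
One way to judge how much weight a win on this board can carry is to look at the gaps between adjacent ranks. The sketch below is illustrative only: the `top5` list is copied by hand from the Score column of the table above (model labels abbreviated), and the helper name `margins` is hypothetical.

```python
# Scores for ranks 1-5, copied by hand from the table above.
top5 = [
    ("Mistral: Mistral Small 4", 79.0),
    ("OpenAI: GPT-4.1", 78.9),
    ("MoonshotAI: Kimi K2 Thinking", 78.7),
    ("Amazon: Nova Pro 1.0", 78.5),
    ("Google: Gemini 2.5 Flash", 78.2),
]


def margins(rows):
    """Return the score gap between each row and the row ranked just below it."""
    # Round to one decimal place to match the table's precision and
    # avoid floating-point noise such as 0.09999999999999432.
    return [round(upper[1] - lower[1], 1) for upper, lower in zip(rows, rows[1:])]


gaps = margins(top5)                              # gap between each adjacent pair
spread = round(top5[0][1] - top5[-1][1], 1)       # rank 1 minus rank 5

print("adjacent gaps:", gaps)
print("rank 1 to rank 5 spread:", spread)
```

The top five entries sit within a single point of each other, so the rank order among them is a narrow result on this one benchmark.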

source record

category: structured_output

metric: accuracy

matched models: 17

latest source date: 2026-05-15

direction: higher is better
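
The `direction` field is what determines rank order. A minimal sketch of re-ranking rows from it follows, assuming the record is held as a plain dict and rows as `(model, score)` tuples. The dict layout, the `rank` helper, and the `"lower is better"` value are assumptions for illustration, not this page's actual schema.

```python
# Source record fields as listed above; the dict layout is an assumption.
record = {
    "category": "structured_output",
    "metric": "accuracy",
    "direction": "higher is better",
}

# A few rows from the table, deliberately out of order.
rows = [
    ("OpenAI: o3", 66.2),
    ("Mistral: Mistral Small 4", 79.0),
    ("Anthropic: Claude Opus 4.5", 76.0),
]


def rank(rows, direction):
    """Sort rows best-first according to the record's direction field."""
    # "lower is better" is a hypothetical counterpart value; only
    # "higher is better" appears in the source record.
    if direction not in ("higher is better", "lower is better"):
        raise ValueError(f"unknown direction: {direction!r}")
    best_first = direction == "higher is better"
    ordered = sorted(rows, key=lambda row: row[1], reverse=best_first)
    return [(position + 1, model, score)
            for position, (model, score) in enumerate(ordered)]


ranked = rank(rows, record["direction"])
for position, model, score in ranked:
    print(position, model, score)
```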