benchmark evidence

BFCL Multi-Turn

Function-calling accuracy across multi-turn conversations on BFCL (the Berkeley Function-Calling Leaderboard).

winner on BFCL Multi-Turn: Kimi K2 Thinking (MoonshotAI), score 50.6

direct benchmark result, not a broad vertical composite | source row dated 2026-05-15

scored on 2026-05-15 · stale source data (60d)
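The staleness flag above can be read as an age check on the source date. The sketch below is one plausible way to compute it, not the page's actual logic. It assumes "60d" is the age of the source row in days; the viewing date and the 30-day threshold are hypothetical values chosen for illustration.

```python
from datetime import date


def age_in_days(source_date: date, as_of: date) -> int:
    """Number of whole days between the source row date and the viewing date."""
    return (as_of - source_date).days


def is_stale(source_date: date, as_of: date, max_age_days: int) -> bool:
    """True when the source row is older than the allowed age."""
    return age_in_days(source_date, as_of) > max_age_days


source_date = date(2026, 5, 15)   # source row date from this page
as_of = date(2026, 7, 14)         # hypothetical viewing date
age = age_in_days(source_date, as_of)
stale = is_stale(source_date, as_of, max_age_days=30)  # hypothetical threshold
print(f"{age}d old, stale={stale}")
```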

latest mapped results | top 20 (17 models matched)

#	Model	Provider	Score	Evidence	Tested
1	Kimi K2 Thinking	MoonshotAI	50.6	model-only independent_benchmark	2026-05-15
2	GPT-5.2	OpenAI	43.8	model-only independent_benchmark	2026-05-15
3	DeepSeek V3.2	DeepSeek	37.4	model-only independent_benchmark	2026-05-15
4	Llama 3.3 70B Instruct	Meta	21.5	model-only independent_benchmark	2026-05-15
5	Llama 4 Maverick	Meta	20.3	model-only independent_benchmark	2026-05-15
6	Gemini 2.5 Flash	Google	16.8	model-only independent_benchmark	2026-05-15
7	o4 Mini	OpenAI	16.6	model-only independent_benchmark	2026-05-15
8	Claude Opus 4.5	Anthropic	16.1	model-only independent_benchmark	2026-05-15
9	Mistral Small 4	Mistral	14.8	model-only independent_benchmark	2026-05-15
10	o3	OpenAI	14.8	model-only independent_benchmark	2026-05-15
11	Mistral Large	Mistral	13.8	model-only independent_benchmark	2026-05-15
12	Gemma 3 27B	Google	10.8	model-only independent_benchmark	2026-05-15
13	GPT-4.1	OpenAI	9.8	model-only independent_benchmark	2026-05-15
14	Llama 4 Scout	Meta	9.0	model-only independent_benchmark	2026-05-15
15	Nova Pro 1.0	Amazon	1.9	model-only independent_benchmark	2026-05-15
16	Claude Haiku 4.5	Anthropic	1.8	model-only independent_benchmark	2026-05-15
17	Claude Sonnet 4.5	Anthropic	1.6	model-only independent_benchmark	2026-05-15

what this result means

This benchmark contributes direct public evidence. Read its scope before generalizing the result.

A win here is a win on BFCL Multi-Turn. Broad task pages require independent corroboration before naming a general winner.
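The corroboration rule above can be sketched as a small check. This is an illustrative sketch under stated assumptions, not the site's actual method: the function name `general_winner`, the two-benchmark threshold, and the placeholder names `model_a` and `hypothetical_benchmark_b` are all hypothetical.

```python
from collections import Counter
from typing import Optional


def general_winner(winners_by_benchmark: dict[str, str],
                   min_benchmarks: int = 2) -> Optional[str]:
    """Name a general winner only if enough independent benchmarks agree.

    winners_by_benchmark maps a benchmark name to its winning model.
    min_benchmarks is a hypothetical corroboration threshold.
    """
    if not winners_by_benchmark:
        return None
    ranked = Counter(winners_by_benchmark.values()).most_common(2)
    model, wins = ranked[0]
    # A tie between the two leading models is not corroboration for either.
    if len(ranked) > 1 and ranked[1][1] == wins:
        return None
    return model if wins >= min_benchmarks else None


# A single benchmark win, as on this page, does not name a general winner.
single = general_winner({"BFCL Multi-Turn": "Kimi K2 Thinking"})

# Placeholder data: two independent benchmarks agreeing on the same model.
corroborated = general_winner({
    "BFCL Multi-Turn": "model_a",
    "hypothetical_benchmark_b": "model_a",
})
print(single, corroborated)
```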

source record

category: structured_output

metric: accuracy

matched models: 17

latest source date: 2026-05-15

direction: higher is better
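The source record above is enough to rank scores consistently. The following is a minimal sketch, assuming the record is loaded into a simple data class; `SourceRecord` and `rank` are hypothetical names, and only the three top scores from the table are used as sample input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    category: str
    metric: str
    matched_models: int
    latest_source_date: str
    higher_is_better: bool


def rank(scores: dict[str, float], record: SourceRecord) -> list[tuple[int, str, float]]:
    """Order models by score, honoring the record's direction field."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1],
                     reverse=record.higher_is_better)
    return [(pos, name, score) for pos, (name, score) in enumerate(ordered, start=1)]


record = SourceRecord(
    category="structured_output",
    metric="accuracy",
    matched_models=17,
    latest_source_date="2026-05-15",
    higher_is_better=True,
)

# Sample input: the top three rows of the table on this page.
scores = {"GPT-5.2": 43.8, "DeepSeek V3.2": 37.4, "Kimi K2 Thinking": 50.6}
ranking = rank(scores, record)
for pos, name, score in ranking:
    print(pos, name, score)
```

Because Python's sort is stable, models with equal scores keep their input order, so tied rows are not reshuffled.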