
Best Text-to-SQL Model

Ranked text-to-SQL models for converting natural language questions into accurate SQL queries.


#1 Recommendation

gpt-5-2025-08-07

Strong on Spider2.0 Snow Text-to-SQL snow_text_to_sql_score_pct and LiveSQLBench success_rate_pct

external/openai/gpt-5-2025-08-07

Score: 21.5%

Confidence: 29.3%

Runners-up: #2 gpt-4o (20.6%), #3 qwen-2.5-72b-instruct (20.3%), #4 o3-20250416 (19.0%)

Ranked Models

Evidence Quality: 93%

Evidence Points

Top Signal: Spider2.0 Snow Text-to-SQL (snow_text_to_sql_score_pct)

Benchmark Sources

Last Updated: 21h ago

All Ranked Models

30 of 30 models

Rank	Model	Strong on	Score	Confidence	Price / 1M	Evidence sources
🥇	gpt-5-2025-08-07	Spider2.0 Snow Text-to-SQL snow_text_to_sql_score_pct and LiveSQLBench success_rate_pct	21.5%	29%	n/a	Spider2.0 Snow Text-to-SQL · Apr 29, 2026; LiveSQLBench · Apr 29, 2026
🥈	gpt-4o	DuckDB NSQL Leaderboard all_execution_accuracy and JSONSchemaBench Leaderboard medium_schema_compliance_pct	20.6%	38%	$0.26	DuckDB NSQL Leaderboard · Apr 29, 2026; JSONSchemaBench Leaderboard · Apr 29, 2026
🥉	qwen-2.5-72b-instruct	DuckDB NSQL Leaderboard all_execution_accuracy and JSONSchemaBench Leaderboard medium_schema_compliance_pct	20.3%	32%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; JSONSchemaBench Leaderboard · Apr 29, 2026
#4	o3-20250416	Spider2.0 Snow Text-to-SQL snow_text_to_sql_score_pct and LiveSQLBench success_rate_pct	19.0%	30%	$3.50	Spider2.0 Snow Text-to-SQL · Apr 29, 2026; LiveSQLBench · Apr 29, 2026
#5	deepseek-r1	DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy	18.1%	32%	$0.88	DuckDB NSQL Leaderboard · Apr 29, 2026; DuckDB NSQL Leaderboard · Apr 29, 2026
#6	claude-sonnet-4	LiveSQLBench success_rate_pct and Spider2.0 Lite Text-to-SQL lite_text_to_sql_score_pct	17.2%	32%	$6.00	LiveSQLBench · Apr 29, 2026; Spider2.0 Lite Text-to-SQL · Apr 29, 2026
#9	gemini-3.1-pro-preview	FACTS Benchmark Suite facts_search_score_pct and FACTS Benchmark Suite facts_grounding_score_pct	16.2%	19%	$4.50	FACTS Benchmark Suite · Apr 29, 2026; FACTS Benchmark Suite · Apr 29, 2026
#11	gemini-2.5-pro	FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	15.0%	23%	$3.44	FACTS Benchmark Suite · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#13	Grok-4-0709	Vals CorpFin v2 overall_accuracy_pct and Berkeley Function Calling Leaderboard (Overall) Overall Acc	14.5%	20%	n/a	Vals CorpFin v2 · Apr 29, 2026; Berkeley Function Calling Leaderboard (Overall) · Apr 29, 2026
#14	gpt-5.2-2025-12-11	FACTS Benchmark Suite facts_grounding_score_pct and Vals CorpFin v2 overall_accuracy_pct	14.4%	18%	n/a	FACTS Benchmark Suite · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#15	qwen-2.5-coder7b-instruct	DuckDB NSQL Leaderboard all_execution_accuracy and JSONSchemaBench Leaderboard medium_schema_compliance_pct	14.1%	25%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; JSONSchemaBench Leaderboard · Apr 29, 2026
#16	phi-4	DuckDB NSQL Leaderboard all_execution_accuracy and Open LLM Leaderboard GPQA gpqa	14.1%	24%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; Open LLM Leaderboard GPQA · Apr 29, 2026
#17	gpt-5-mini-2025-08-07	Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	14.1%	22%	n/a	Vals Finance Agent · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#18	gemini-3-pro-preview	Berkeley Function Calling Leaderboard (Overall) Overall Acc and Vals CorpFin v2 overall_accuracy_pct	14.1%	19%	$4.50	Berkeley Function Calling Leaderboard (Overall) · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#19	gemini-3-flash-preview	Vals CorpFin v2 overall_accuracy_pct and FACTS Benchmark Suite facts_grounding_score_pct	14.0%	19%	$1.13	Vals CorpFin v2 · Apr 29, 2026; FACTS Benchmark Suite · Apr 29, 2026
#20	gpt-4.1-20250414	Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals CorpFin v2 overall_accuracy_pct	12.9%	19%	n/a	Vectara HHEM Leaderboard · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#21	claude-sonnet-4.6	Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	12.5%	16%	$6.00	Vals Finance Agent · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#22	gemini-3.1-flash-lite-preview	FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	12.5%	18%	$0.56	FACTS Benchmark Suite · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#23	claude-opus-4-5-20251101	FACTS Benchmark Suite facts_grounding_score_pct and Vals CorpFin v2 overall_accuracy_pct	12.2%	18%	n/a	FACTS Benchmark Suite · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#24	gpt-4o-2024-08-06	DuckDB NSQL Leaderboard all_execution_accuracy and Vectara HHEM Leaderboard overall_hallucination_error_pct	12.1%	24%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#25	Llama-3.3-70B-Instruct	DuckDB NSQL Leaderboard all_execution_accuracy and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct	12.0%	20%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; Open LLM Leaderboard MMLU-Pro · Apr 29, 2026
#26	o4-mini	LiveSQLBench success_rate_pct and Vals CorpFin v2 overall_accuracy_pct	11.5%	21%	$1.93	LiveSQLBench · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#27	grok-4-1-fast-reasoning	Berkeley Function Calling Leaderboard (Overall) Overall Acc and Vals CorpFin v2 overall_accuracy_pct	11.5%	16%	$0.28	Berkeley Function Calling Leaderboard (Overall) · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#28	gemini-2.5-flash	FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	11.4%	17%	$0.17	FACTS Benchmark Suite · Apr 29, 2026; Vectara HHEM Leaderboard · Apr 29, 2026
#30	gemma-2-27b-it	DuckDB NSQL Leaderboard all_execution_accuracy and Open LLM Leaderboard GPQA gpqa	11.3%	21%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; Open LLM Leaderboard GPQA · Apr 29, 2026
#32	gpt-4o-mini-2024-07-18	DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy	11.2%	19%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; DuckDB NSQL Leaderboard · Apr 29, 2026
#33	qwen-2.5-coder32b-instruct	DuckDB NSQL Leaderboard all_execution_accuracy and Open LLM Leaderboard MMLU-Pro mmlu_pro_accuracy_pct	11.2%	33%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; Open LLM Leaderboard MMLU-Pro · Apr 29, 2026
#34	gpt-5.4-2026-03-05	Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals CorpFin v2 overall_accuracy_pct	11.1%	13%	n/a	Vectara HHEM Leaderboard · Apr 29, 2026; Vals CorpFin v2 · Apr 29, 2026
#39	Phi-3-medium-128k-instruct	DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy	10.3%	20%	n/a	DuckDB NSQL Leaderboard · Apr 29, 2026; DuckDB NSQL Leaderboard · Apr 29, 2026
#41	grok-4-fast-reasoning	Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	10.2%	18%	$0.28	Vals CorpFin v2 · Apr 29, 2026; Vals Finance Agent · Apr 29, 2026
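
Several rows above cite execution-accuracy metrics (for example, all_execution_accuracy on the DuckDB NSQL Leaderboard). As a rough illustration of what a metric of this kind usually measures, the sketch below runs a predicted query and a reference ("gold") query against the same database and counts a hit when the result rows match. It is not the harness used by any benchmark listed here: the function names, the orders table, and the order-insensitive comparison are all assumptions made for the example, and it uses Python's built-in sqlite3 module rather than DuckDB or Snowflake.

```python
import sqlite3

def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and yield the same rows (order-insensitive)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to execute counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    # Sort by repr so rows containing mixed types or NULLs can still be ordered.
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)

def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)

# Hypothetical toy database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'east', 10.0), (2, 'west', 25.0), (3, 'east', 5.0);
""")

pairs = [
    # Same result, written differently: counts as a hit.
    ("SELECT region, SUM(amount) FROM orders GROUP BY region",
     "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY region"),
    # Wrong aggregate: the query runs, but the rows differ.
    ("SELECT COUNT(*) FROM orders WHERE region = 'east'",
     "SELECT SUM(amount) FROM orders WHERE region = 'east'"),
    # Invalid SQL: fails to execute.
    ("SELECT amount FROM missing_table", "SELECT amount FROM orders"),
]
accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.1%}")
```

Comparing result rows rather than SQL text means two differently written but equivalent queries both count as correct, which is why scores of this type can differ from exact-match scores on the same questions.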

Head-to-Head: #1 vs #2

Top Pick: gpt-5-2025-08-07

Strong on Spider2.0 Snow Text-to-SQL snow_text_to_sql_score_pct and LiveSQLBench success_rate_pct

Score: 21.5%, Confidence: 29.3%

Runner-up: gpt-4o

Strong on DuckDB NSQL Leaderboard all_execution_accuracy and JSONSchemaBench Leaderboard medium_schema_compliance_pct

Score: 20.6%, Confidence: 38.0%

