BasedAGI


Insight mining from text corpora

Extract themes and actions from large text datasets.

#1 Recommendation

qwen-2.5-72b-instruct

Strong on DuckDB NSQL Leaderboard all_execution_accuracy (83%) and JSONSchemaBench Leaderboard medium_schema_compliance_pct (90%)

external/qwen/qwen-2-5-72b-instruct

Score: 21.9%
Confidence: 31.9%

Limited benchmark evidence for this use case.

35 ranked models with average evidence of 9.5 points. Rankings may shift as more benchmark data is ingested.

Ranked Models: 30
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: DuckDB NSQL Leaderboard: all_execution_accuracy

All Ranked Models

Showing 30 of 30 models
Rank · Model · Score
#1 qwen-2.5-72b-instruct · 21.9%
    Strong on DuckDB NSQL Leaderboard all_execution_accuracy (83%) and JSONSchemaBench Leaderboard medium_schema_compliance_pct (90%)
#2 gpt-4o · 20.6%
    Strong on DuckDB NSQL Leaderboard all_execution_accuracy (77%) and JSONSchemaBench Leaderboard medium_schema_compliance_pct (100%)
#3 gpt-4o-20241120 · 18.5%
    Strong on DuckDB NSQL Leaderboard all_execution_accuracy (96%) and DuckDB NSQL Leaderboard hard_execution_accuracy (75%)
#5 deepseek/deepseek-r1 · 16.3%
#10 openai/gpt-4o-mini-2024-07-18 · 13.2%
#12 gpt-4o-2024-08-06 · 12.6%
#15 gpt-4.1-20250414 · 11.4%
#16 gemini-3-pro-preview · 11.1%
#22 gemini-2.5-pro · 10.5%
#23 google/gemini-2.0-flash-001 · 10.4%
#25 Llama-3.3-70B-Instruct · 10.1%
#33 Grok-4-0709 · 9.3%
#35 gemma-2-27b-it · 9.1%
#36 google/gemini-3.1-pro-preview · 8.9%
#37 claude-sonnet-4-20250514 · 8.9%
#38 phi-4 · 8.8%
#39 Qwen3-30B-A3B · 8.7%
#41 Qwen2.5-Coder-7B · 8.5%
#42 gemini-2.5-flash · 8.5%
#45 Qwen3-32B · 8.1%
#49 QwQ-32B-Preview · 7.6%
#50 gpt-5-mini-2025-08-07 · 7.5%
#54 Phi-3-medium-128k-instruct · 7.2%
#56 Llama-3.1-70B-Instruct · 6.9%
#59 gpt-4o-2024-05-13 · 6.4%
#62 Meta-Llama-3-8B-Instruct · 6.2%
#63 Meta-Llama-3.1-8B · 6.1%
#64 Phi-3-mini-128k-instruct · 6.1%
#72 minimax/minimax-m2.1 · 5.4%
#73 deepseek-v3 · 5.2%

Compare Models

Model A leads by +1.4%


Model A

qwen-2.5-72b-instruct

external/qwen/qwen-2-5-72b-instruct

21.9%

Rank #1

Confidence 31.9% · 12 evidence pts

DuckDB NSQL Leaderboard: all_execution_accuracy

Value 82.7% · Conf 100.0% · Weight 5.9%

duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)

JSONSchemaBench Leaderboard: medium_schema_compliance_pct

Value 90.1% · Conf 100.0% · Weight 3.4%

jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)

JSONSchemaBench Leaderboard: hard_schema_compliance_pct

Value 74.4% · Conf 100.0% · Weight 1.9%

jsonschemabench_leaderboard.hard_schema_compliance_pct (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg AC

Value 76.1% · Conf 100.0% · Weight 1.2%

galileo_agent_v2.avg_ac (Mar 12, 2026)

Model B

gpt-4o

external/openai/gpt-4o

20.6%

Rank #2

Confidence 32.1% · 14 evidence pts

DuckDB NSQL Leaderboard: all_execution_accuracy

Value 76.9% · Conf 100.0% · Weight 5.5%

duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)

JSONSchemaBench Leaderboard: medium_schema_compliance_pct

Value 100.0% · Conf 100.0% · Weight 3.7%

jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)

JSONSchemaBench Leaderboard: hard_schema_compliance_pct

Value 100.0% · Conf 100.0% · Weight 2.5%

jsonschemabench_leaderboard.hard_schema_compliance_pct (Mar 12, 2026)

DuckDB NSQL Leaderboard: hard_execution_accuracy

Value 50.0% · Conf 100.0% · Weight 1.1%

duckdb_nsql_leaderboard.hard_execution_accuracy (Mar 12, 2026)
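
The per-item Value · Conf · Weight breakdown above suggests a multiplicative aggregation, but the page does not state the actual scoring formula. As a purely illustrative sketch (the `Evidence` type and `weighted_score` function are assumptions, not the site's implementation), summing value × confidence × weight over Model A's four listed items yields roughly 10.3 points, well under its 21.9% total score, so additional unlisted evidence presumably contributes too:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    value: float   # benchmark metric, expressed as a fraction (0-1)
    conf: float    # source confidence, as a fraction (0-1)
    weight: float  # relevance weight for this use case, as a fraction (0-1)

def weighted_score(evidence: list[Evidence]) -> float:
    """Hypothetical aggregation: sum of value * conf * weight per evidence item."""
    return sum(e.value * e.conf * e.weight for e in evidence)

# The four evidence rows shown for qwen-2.5-72b-instruct (Model A)
qwen_evidence = [
    Evidence(0.827, 1.0, 0.059),  # DuckDB NSQL: all_execution_accuracy
    Evidence(0.901, 1.0, 0.034),  # JSONSchemaBench: medium_schema_compliance_pct
    Evidence(0.744, 1.0, 0.019),  # JSONSchemaBench: hard_schema_compliance_pct
    Evidence(0.761, 1.0, 0.012),  # Galileo Agent v2: Avg AC
]
print(f"{weighted_score(qwen_evidence):.3f}")  # ~0.103, i.e. ~10.3 points
```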

Ranking Diagnostics & Missing Models

Source Lift

Ranked: 35
Sources: 8
Quality: Insufficient

DuckDB NSQL Leaderboard (duckdb_nsql_leaderboard): 23 rows · 2.6% avg lift
Vals Legal Bench (vals_legal_bench): 12 rows · 0.4% avg lift
Vals Tax Eval v2 (vals_tax_eval_v2): 11 rows · 0.4% avg lift
Vals CorpFin v2 (vals_corp_fin_v2): 11 rows · 0.3% avg lift

Missing Strong Models

anthropic/claude-sonnet-4.6 (external/anthropic/claude-sonnet-4-6) · Rank #4 · 21.1% · Thin evidence after weighting
gpt-5-2025-08-07 (external/openai/gpt-5-2025-08-07) · Rank #9 · 19.2% · Thin evidence after weighting
openai/gpt-5.4-2026-03-05 (external/openai/gpt-5-4-2026-03-05) · Rank #10 · 18.9% · Thin evidence after weighting
claude-opus-4-5-20251101 (external/anthropic/claude-opus-4-5-20251101) · Rank #13 · 17.0% · Thin evidence after weighting

Taxonomy Details

Core Tasks: task.insight_mining
Required Modes: mode.long_context
Domains: domain.data_analytics_bi

Related Use Cases