BasedAGI

finance

Filings summarization (10-K/10-Q)

Summarize 10-K and 10-Q filings conservatively, staying close to the stated facts and highlighting key risks.

#1 Recommendation

gemini-3-pro-preview

Strong on Vals Finance Agent overall_accuracy_pct (87%) and Vals CorpFin v2 overall_accuracy_pct (87%)

external/google/gemini-3-pro-preview

Score: 36.7% · Confidence: 47.2%

Ranked Models: 30 · Evidence Quality: 88% · Scoring: Benchmark-backed

Top Signal: Vals Finance Agent overall_accuracy_pct

All Ranked Models

Showing 30 of 30 models

Rank · Model · Score
#1 · gemini-3-pro-preview · 36.7%
Strong on Vals Finance Agent overall_accuracy_pct (87%) and Vals CorpFin v2 overall_accuracy_pct (87%)

#2 · gemini-2.5-pro · 32.9%
Strong on FACTS Benchmark Suite facts_grounding_score_pct (100%) and Vals CorpFin v2 overall_accuracy_pct (78%)

#3 · anthropic/claude-sonnet-4.6 · 31.9%
Strong on Vals Finance Agent overall_accuracy_pct (100%) and Vals CorpFin v2 overall_accuracy_pct (91%)

#4 · Grok-4-0709 · 31.6%
#5 · gpt-5-mini-2025-08-07 · 30.6%
#6 · gpt-4.1-20250414 · 30.5%
#7 · gpt-5-2025-08-07 · 29.5%
#8 · openai/gpt-5.4-2026-03-05 · 29.4%
#9 · google/gemini-3.1-pro-preview · 29.3%
#10 · gpt-5.1-2025-11-13 · 26.1%
#11 · gpt-5.2-2025-12-11 · 25.7%
#12 · anthropic/claude-opus-4-6-thinking · 25.2%
#13 · xai-org/grok-4-fast-reasoning · 25.1%
#14 · xai-org/grok-4-1-fast-reasoning · 24.6%
#15 · gemini-3-flash-preview · 24.5%
#16 · google/gemini-3.1-flash-lite-preview · 24.4%
#17 · claude-sonnet-4-20250514 · 24.0%
#18 · anthropic/claude-opus-4-5-20251101-thinking · 23.9%
#19 · kimi/kimi-k2.5-thinking · 23.5%
#20 · claude-opus-4-5-20251101 · 22.7%
#21 · anthropic/claude-sonnet-4-5-20250929-thinking · 22.3%
(Rank #22, gpt-4o, is not shown here; it appears under Missing Strong Models.)
#23 · alibaba/qwen3.5-flash · 20.5%
#24 · zai/glm-5-thinking · 20.5%
#25 · anthropic/claude-haiku-4-5-20251001-thinking · 19.8%
#26 · mistralai/mistral-large-2512 · 17.5%
#27 · xai-org/grok-4-1-fast-non-reasoning · 17.5%
#28 · z-ai/glm-4.7 · 17.0%
#29 · qwen/qwen3-max · 16.8%
#30 · Kimi K2 Thinking · 16.6%
#31 · gpt-4.1-mini-20250414 · 16.4%
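Because the Score column is itself a percentage, the gap between two models is a difference in percentage points rather than a relative percentage. A minimal sketch, using a few rows copied from the ranked list above; the `RANKED` list and the `lead` helper are illustrative names, not part of the site:

```python
# A few (rank, model, score %) rows copied from the ranked list.
RANKED = [
    (1, "gemini-3-pro-preview", 36.7),
    (2, "gemini-2.5-pro", 32.9),
    (3, "anthropic/claude-sonnet-4.6", 31.9),
    (4, "Grok-4-0709", 31.6),
    (5, "gpt-5-mini-2025-08-07", 30.6),
]


def lead(model_a: str, model_b: str, rows=RANKED) -> float:
    """Return how many percentage points model_a's score exceeds model_b's."""
    scores = {model: score for _, model, score in rows}
    # Round to one decimal place to match the precision shown on the page.
    return round(scores[model_a] - scores[model_b], 1)


if __name__ == "__main__":
    print(f"{lead('gemini-3-pro-preview', 'gemini-2.5-pro'):+.1f} points")
```

The rounding step matters because raw float subtraction (36.7 - 32.9) is not exactly 3.8.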

Compare Models

Model A leads by 3.8 percentage points (36.7% vs. 32.9%).


Model A

gemini-3-pro-preview

external/google/gemini-3-pro-preview

Score 36.7% · Rank #1

Confidence 47.2% · 29 evidence points

Vals Finance Agent: overall_accuracy_pct

Value 87.0% · Conf 100.0% · Weight 2.9%

vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)

Vals CorpFin v2: overall_accuracy_pct

Value 86.7% · Conf 100.0% · Weight 2.8%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)

FACTS Benchmark Suite: facts_grounding_score_pct

Value 88.3% · Conf 100.0% · Weight 2.4%

facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)

FACTS Benchmark Suite: facts_search_score_pct

Value 100.0% · Conf 100.0% · Weight 2.1%

facts_benchmark_suite.facts_search_score_pct (Mar 12, 2026)

Model B

gemini-2.5-pro

external/google/gemini-2-5-pro

Score 32.9% · Rank #2

Confidence 48.0% · 32 evidence points

FACTS Benchmark Suite: facts_grounding_score_pct

Value 100.0% · Conf 100.0% · Weight 2.7%

facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)

Vals CorpFin v2: overall_accuracy_pct

Value 78.4% · Conf 100.0% · Weight 2.5%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)

Vals Finance Agent: overall_accuracy_pct

Value 65.5% · Conf 100.0% · Weight 2.2%

vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)

FACTS Benchmark Suite: average_score_pct

Value 78.3% · Conf 100.0% · Weight 1.6%

facts_benchmark_suite.average_score_pct (Mar 12, 2026)
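Each evidence point lists a value, a confidence, and a weight. The page does not say how these combine into the overall score, so the following is only a sketch under an assumed linear model, contribution = value × confidence × weight, applied to the four top evidence points shown for each model. The `EvidencePoint` type and the `contribution` function are hypothetical. These four points are a small subset of each model's 29 or 32 evidence points, so the totals are not expected to reproduce the published scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidencePoint:
    """One benchmark signal as displayed on the page (all fields in percent)."""
    signal: str
    value_pct: float
    conf_pct: float
    weight_pct: float


def contribution(p: EvidencePoint) -> float:
    # ASSUMPTION: a simple linear weighting, value × confidence × weight.
    # The site's actual aggregation formula is not documented on this page.
    return p.value_pct * (p.conf_pct / 100) * (p.weight_pct / 100)


MODEL_A = [  # gemini-3-pro-preview
    EvidencePoint("vals_finance_agent.overall_accuracy_pct", 87.0, 100.0, 2.9),
    EvidencePoint("vals_corp_fin_v2.overall_accuracy_pct", 86.7, 100.0, 2.8),
    EvidencePoint("facts_benchmark_suite.facts_grounding_score_pct", 88.3, 100.0, 2.4),
    EvidencePoint("facts_benchmark_suite.facts_search_score_pct", 100.0, 100.0, 2.1),
]

MODEL_B = [  # gemini-2.5-pro
    EvidencePoint("facts_benchmark_suite.facts_grounding_score_pct", 100.0, 100.0, 2.7),
    EvidencePoint("vals_corp_fin_v2.overall_accuracy_pct", 78.4, 100.0, 2.5),
    EvidencePoint("vals_finance_agent.overall_accuracy_pct", 65.5, 100.0, 2.2),
    EvidencePoint("facts_benchmark_suite.average_score_pct", 78.3, 100.0, 1.6),
]

if __name__ == "__main__":
    for name, points in (("Model A", MODEL_A), ("Model B", MODEL_B)):
        total = sum(contribution(p) for p in points)
        print(f"{name}: {total:.2f} score points from the top four signals")
```

Under this assumption the four signals contribute about 9.17 points for Model A and 7.35 for Model B; the remaining evidence points would account for the rest of each score.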

Ranking Diagnostics & Missing Models

Source Lift

Ranked: 46 · Sources: 8 · Quality: Sufficient

Vals CorpFin v2 (vals_corp_fin_v2) · 42 rows · 1.5% avg lift

Vals Tax Eval v2 (vals_tax_eval_v2) · 42 rows · 1.5% avg lift

Vals GPQA (vals_gpqa) · 36 rows · 0.7% avg lift

Vals Mortgage Tax (vals_mortgage_tax) · 30 rows · 1.1% avg lift

Missing Strong Models

gpt-4o (external/openai/gpt-4o) · Rank #22 · 15.2%
Thin evidence after weighting

gpt-4o-2024-05-13 (external/openai/gpt-4o-2024-05-13) · Rank #51 · 10.5%
Thin evidence after weighting

deepseek/deepseek-r1 (external/deepseek/deepseek-r1) · Rank #54 · 10.5%
Thin evidence after weighting

Taxonomy Details

Core Tasks

task.summarize_doc, task.entity_extraction

Required Modes

mode.long_context

Domains

domain.finance_equity_research

Related Use Cases