BasedAGI

finance

Transaction anomaly narrative

Summarize anomalies into hypotheses, evidence, and follow-up actions.

#1 Recommendation

gemini-3-pro-preview

Strong on Vals Finance Agent overall_accuracy_pct (87%) and Vals CorpFin v2 overall_accuracy_pct (87%)

external/google/gemini-3-pro-preview

Score

42.3%

Confidence

54.4%

Ranked Models

30

Evidence Quality

89%

Scoring

Benchmark-backed

Top Signal

Vals Finance Agent: overall_accuracy_pct

All Ranked Models

30 of 30
Rank · Model · Score
#1 · gemini-3-pro-preview · 42.3%
  Strong on Vals Finance Agent overall_accuracy_pct (87%) and Vals CorpFin v2 overall_accuracy_pct (87%)
#2 · gemini-2.5-pro · 38.0%
  Strong on FACTS Benchmark Suite facts_grounding_score_pct (100%) and Vals CorpFin v2 overall_accuracy_pct (78%)
#3 · anthropic/claude-sonnet-4.6 · 36.1%
  Strong on Vals Finance Agent overall_accuracy_pct (100%) and Vals CorpFin v2 overall_accuracy_pct (91%)
#4 · Grok-4-0709 · 35.7%
#5 · gpt-5-mini-2025-08-07 · 34.6%
#6 · gpt-5-2025-08-07 · 33.3%
#7 · openai/gpt-5.4-2026-03-05 · 33.2%
#8 · google/gemini-3.1-pro-preview · 33.1%
#9 · gpt-4.1-20250414 · 32.0%
#10 · gpt-5.1-2025-11-13 · 29.5%
#11 · gpt-5.2-2025-12-11 · 29.1%
#12 · anthropic/claude-opus-4-6-thinking · 28.5%
#13 · xai-org/grok-4-fast-reasoning · 28.4%
#14 · xai-org/grok-4-1-fast-reasoning · 27.9%
#15 · gemini-3-flash-preview · 27.7%
#16 · google/gemini-3.1-flash-lite-preview · 27.6%
#17 · claude-sonnet-4-20250514 · 27.2%
#18 · anthropic/claude-opus-4-5-20251101-thinking · 27.1%
#19 · kimi/kimi-k2.5-thinking · 26.6%
#20 · claude-opus-4-5-20251101 · 25.7%
#21 · anthropic/claude-sonnet-4-5-20250929-thinking · 25.2%
#23 · alibaba/qwen3.5-flash · 23.2%
#24 · zai/glm-5-thinking · 23.2%
#25 · anthropic/claude-haiku-4-5-20251001-thinking · 22.4%
#26 · mistralai/mistral-large-2512 · 19.8%
#27 · xai-org/grok-4-1-fast-non-reasoning · 19.8%
#28 · z-ai/glm-4.7 · 19.2%
#29 · qwen/qwen3-max · 19.0%
#30 · Kimi K2 Thinking · 18.7%
#31 · gpt-4.1-mini-20250414 · 18.6%

Compare Models

Model A leads by +4.3%
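
The lead shown in the comparison header appears to be the plain difference between the two models' scores (42.3% for Model A against 38.0% for Model B), so it reads most naturally as percentage points. A minimal sketch of that arithmetic follows; the function name `describe_lead` and the tie wording are illustrative assumptions, not something the page defines.

```python
def describe_lead(score_a: float, score_b: float) -> str:
    """Format the score gap between two models (hypothetical helper).

    Scores are percentages as shown on the page, e.g. 42.3 for 42.3%.
    """
    # Round to one decimal place to match the precision shown on the page
    # and to absorb floating-point noise in the subtraction.
    diff = round(score_a - score_b, 1)
    if diff == 0:
        return "Models are tied"
    leader = "Model A" if diff > 0 else "Model B"
    return f"{leader} leads by +{abs(diff):.1f}%"


print(describe_lead(42.3, 38.0))  # Model A leads by +4.3%
```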


Model A

gemini-3-pro-preview

external/google/gemini-3-pro-preview

42.3%

Rank #1

Confidence 54.4% · 29 evidence pts

Vals Finance Agent: overall_accuracy_pct

Value 87.0% · Conf 100.0% · Weight 3.3%

vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)

Vals CorpFin v2: overall_accuracy_pct

Value 86.7% · Conf 100.0% · Weight 3.1%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)

FACTS Benchmark Suite: facts_grounding_score_pct

Value 88.3% · Conf 100.0% · Weight 2.6%

facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)

FACTS Benchmark Suite: facts_search_score_pct

Value 100.0% · Conf 100.0% · Weight 2.3%

facts_benchmark_suite.facts_search_score_pct (Mar 12, 2026)

Model B

gemini-2.5-pro

external/google/gemini-2-5-pro

38.0%

Rank #2

Confidence 55.3% · 32 evidence pts

FACTS Benchmark Suite: facts_grounding_score_pct

Value 100.0% · Conf 100.0% · Weight 3.0%

facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)

Vals CorpFin v2: overall_accuracy_pct

Value 78.4% · Conf 100.0% · Weight 2.8%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)

Vals Finance Agent: overall_accuracy_pct

Value 65.5% · Conf 100.0% · Weight 2.5%

vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)

FACTS Benchmark Suite: average_score_pct

Value 78.3% · Conf 100.0% · Weight 1.7%

facts_benchmark_suite.average_score_pct (Mar 12, 2026)
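
Each evidence point in the comparison cards follows the same `Value … · Conf … · Weight …` layout, so the lines can be read into structured records for further analysis. The sketch below is one way to do that; the `Evidence` class, its field names, and `parse_evidence` are hypothetical, and the page does not state how these three numbers combine into the final score, so no scoring formula is assumed here.

```python
import re
from dataclasses import dataclass


@dataclass
class Evidence:
    """One evidence point; field names are illustrative assumptions."""
    value_pct: float
    conf_pct: float
    weight_pct: float


# Matches lines such as "Value 87.0% · Conf 100.0% · Weight 3.3%".
EVIDENCE_RE = re.compile(
    r"Value\s+([\d.]+)%\s*·\s*Conf\s+([\d.]+)%\s*·\s*Weight\s+([\d.]+)%"
)


def parse_evidence(line: str) -> Evidence:
    """Parse a single evidence line into an Evidence record."""
    match = EVIDENCE_RE.search(line)
    if match is None:
        raise ValueError(f"not an evidence line: {line!r}")
    value, conf, weight = (float(g) for g in match.groups())
    return Evidence(value, conf, weight)


point = parse_evidence("Value 87.0% · Conf 100.0% · Weight 3.3%")
print(point.value_pct, point.conf_pct, point.weight_pct)
```

Keeping the three fields separate, rather than collapsing them into one number, avoids guessing at the site's weighting scheme.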

Ranking Diagnostics & Missing Models

Source Lift

Ranked

49

Sources

8

Quality

Sufficient

Vals CorpFin v2

vals_corp_fin_v2

42 rows

1.7% avg lift

Vals Tax Eval v2

vals_tax_eval_v2

42 rows

1.7% avg lift

Vals GPQA

vals_gpqa

36 rows

0.7% avg lift

Vals Mortgage Tax

vals_mortgage_tax

30 rows

1.3% avg lift

Missing Strong Models

gpt-4o

external/openai/gpt-4o

Rank #22

15.2%

Thin evidence after weighting

Taxonomy Details

Core Tasks

task.multi_doc_synthesis

task.risk_assessment

Required Modes

mode.long_context

Domains

domain.finance_compliance_aml

Related Use Cases