finance
Transaction anomaly narrative
Summarize anomalies into hypotheses, evidence, and follow-up actions.
#1 Recommendation
gemini-3-pro-preview
Strong on Vals Finance Agent overall_accuracy_pct (87.0%) and Vals CorpFin v2 overall_accuracy_pct (86.7%)
external/google/gemini-3-pro-preview
Score: 42.3%
Confidence: 54.4%
Ranked Models: 30
Evidence Quality: 89%
Scoring: Benchmark-backed
Top Signal: Vals Finance Agent: overall_accuracy_pct
Compare Models
Model A leads by +4.3%
Model A
gemini-3-pro-preview
external/google/gemini-3-pro-preview
Rank #1
Vals Finance Agent: overall_accuracy_pct
Value 87.0% · Conf 100.0% · Weight 3.3%
vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)
Vals CorpFin v2: overall_accuracy_pct
Value 86.7% · Conf 100.0% · Weight 3.1%
vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
FACTS Benchmark Suite: facts_grounding_score_pct
Value 88.3% · Conf 100.0% · Weight 2.6%
facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)
FACTS Benchmark Suite: facts_search_score_pct
Value 100.0% · Conf 100.0% · Weight 2.3%
facts_benchmark_suite.facts_search_score_pct (Mar 12, 2026)
Model B
gemini-2.5-pro
external/google/gemini-2-5-pro
Rank #2
FACTS Benchmark Suite: facts_grounding_score_pct
Value 100.0% · Conf 100.0% · Weight 3.0%
facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)
Vals CorpFin v2: overall_accuracy_pct
Value 78.4% · Conf 100.0% · Weight 2.8%
vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
Vals Finance Agent: overall_accuracy_pct
Value 65.5% · Conf 100.0% · Weight 2.5%
vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)
FACTS Benchmark Suite: average_score_pct
Value 78.3% · Conf 100.0% · Weight 1.7%
facts_benchmark_suite.average_score_pct (Mar 12, 2026)
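The per-signal lines above share one layout ("Value … · Conf … · Weight …"), so the head-to-head gap on each benchmark both models report can be worked out directly. The following is a minimal sketch, assuming that layout holds for every signal; the names `Signal`, `parse_signal`, and `value_gaps` are hypothetical helpers introduced for illustration, not part of the dashboard. It compares raw benchmark values only and does not attempt to reproduce the overall score or the "+4.3%" lead, since the page does not state how those are computed.

```python
import re
from typing import Dict, NamedTuple


class Signal(NamedTuple):
    """One benchmark signal as shown on the page (all fields in percent)."""
    value: float
    conf: float
    weight: float


# Matches lines such as "Value 87.0% · Conf 100.0% · Weight 3.3%".
_SIGNAL_RE = re.compile(
    r"Value\s+([\d.]+)%\s*·\s*Conf\s+([\d.]+)%\s*·\s*Weight\s+([\d.]+)%"
)


def parse_signal(line: str) -> Signal:
    """Parse one 'Value · Conf · Weight' line into a Signal."""
    match = _SIGNAL_RE.search(line)
    if match is None:
        raise ValueError(f"not a signal line: {line!r}")
    return Signal(*(float(group) for group in match.groups()))


def value_gaps(a: Dict[str, Signal], b: Dict[str, Signal]) -> Dict[str, float]:
    """Return a.value - b.value (percentage points) for signals both models have."""
    return {key: round(a[key].value - b[key].value, 1) for key in a if key in b}


# Signal lines copied from the Model A and Model B cards above.
model_a = {
    "vals_finance_agent.overall_accuracy_pct": parse_signal("Value 87.0% · Conf 100.0% · Weight 3.3%"),
    "vals_corp_fin_v2.overall_accuracy_pct": parse_signal("Value 86.7% · Conf 100.0% · Weight 3.1%"),
    "facts_benchmark_suite.facts_grounding_score_pct": parse_signal("Value 88.3% · Conf 100.0% · Weight 2.6%"),
    "facts_benchmark_suite.facts_search_score_pct": parse_signal("Value 100.0% · Conf 100.0% · Weight 2.3%"),
}
model_b = {
    "facts_benchmark_suite.facts_grounding_score_pct": parse_signal("Value 100.0% · Conf 100.0% · Weight 3.0%"),
    "vals_corp_fin_v2.overall_accuracy_pct": parse_signal("Value 78.4% · Conf 100.0% · Weight 2.8%"),
    "vals_finance_agent.overall_accuracy_pct": parse_signal("Value 65.5% · Conf 100.0% · Weight 2.5%"),
    "facts_benchmark_suite.average_score_pct": parse_signal("Value 78.3% · Conf 100.0% · Weight 1.7%"),
}

gaps = value_gaps(model_a, model_b)
for key, gap in gaps.items():
    print(f"{key}: {gap:+.1f} points")
```

On the figures listed, Model A is ahead by 21.5 points on Vals Finance Agent and 8.3 points on Vals CorpFin v2, while Model B is ahead by 11.7 points on FACTS grounding. Signals reported for only one model (facts_search_score_pct, average_score_pct) are skipped.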
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 49
Sources: 8
Quality: Sufficient
Vals CorpFin v2
vals_corp_fin_v2
42 rows
1.7% avg lift
Vals Tax Eval v2
vals_tax_eval_v2
42 rows
1.7% avg lift
Vals GPQA
vals_gpqa
36 rows
0.7% avg lift
Vals Mortgage Tax
vals_mortgage_tax
30 rows
1.3% avg lift
Missing Strong Models
gpt-4o
external/openai/gpt-4o
Rank #22
15.2%
Related Use Cases
finance
Thesis red teaming
Stress-test an investment thesis with counterarguments and risk.
Top: gemini-3-pro-preview
finance
Earnings call synthesis
Summarize earnings calls into key points, tone, and risks.
Top: gemini-3-pro-preview
finance
AML alert triage
Triage AML alerts into severity, rationale, and next actions.
Top: gemini-3-pro-preview
finance
KYC profile synthesis
Turn identity docs and notes into a structured KYC profile.
Top: gemini-3-pro-preview