finance
Accounts payable invoice extraction (text)
Extract structured fields from invoices/receipts for AP workflows.
#1 Recommendation
gemini-3-pro-preview
Strong on Vals Finance Agent overall_accuracy_pct (87.0%) and Vals CorpFin v2 overall_accuracy_pct (86.7%)
external/google/gemini-3-pro-preview
Score: 36.0%
Confidence: 45.8%
Ranked Models: 30
Evidence Quality: 88%
Scoring: Benchmark-backed
Top Signal: Vals Finance Agent: overall_accuracy_pct
Compare Models
Model A leads by +0.3%
Model A
gemini-3-pro-preview
external/google/gemini-3-pro-preview
Rank #1
Vals Finance Agent: overall_accuracy_pct
Value 87.0% · Conf 100.0% · Weight 2.7%
vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)
Vals CorpFin v2: overall_accuracy_pct
Value 86.7% · Conf 100.0% · Weight 2.4%
vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
Vals Tax Eval v2: overall_accuracy_pct
Value 87.1% · Conf 100.0% · Weight 2.2%
vals_tax_eval_v2.overall_accuracy_pct (Mar 12, 2026)
Vals Mortgage Tax: overall_accuracy_pct
Value 99.3% · Conf 100.0% · Weight 1.8%
vals_mortgage_tax.overall_accuracy_pct (Mar 12, 2026)
Model B
Grok-4-0709
external/xai/grok-4-0709
Rank #2
Vals CorpFin v2: overall_accuracy_pct
Value 93.6% · Conf 100.0% · Weight 2.6%
vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
Vals Finance Agent: overall_accuracy_pct
Value 84.4% · Conf 100.0% · Weight 2.6%
vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)
Vals Tax Eval v2: overall_accuracy_pct
Value 65.9% · Conf 100.0% · Weight 1.7%
vals_tax_eval_v2.overall_accuracy_pct (Mar 12, 2026)
Vals Finance Agent: numerical_reasoning_accuracy_pct
Value 94.9% · Conf 100.0% · Weight 1.6%
vals_finance_agent.numerical_reasoning_accuracy_pct (Mar 12, 2026)
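The two signal lists only partly overlap, so a side-by-side reading is easier with the shared benchmarks lined up. The sketch below is plain arithmetic on the values listed above: it computes the Model A minus Model B difference, in percentage points, for each signal both models report. It does not reproduce the page's own weighted score (the "+0.3%" lead), whose formula is not given here; the function name `shared_deltas` is a hypothetical helper introduced for illustration.

```python
# Accuracy values (percent) copied from the Model A and Model B signal lists.
model_a = {
    "vals_finance_agent.overall_accuracy_pct": 87.0,
    "vals_corp_fin_v2.overall_accuracy_pct": 86.7,
    "vals_tax_eval_v2.overall_accuracy_pct": 87.1,
    "vals_mortgage_tax.overall_accuracy_pct": 99.3,
}
model_b = {
    "vals_corp_fin_v2.overall_accuracy_pct": 93.6,
    "vals_finance_agent.overall_accuracy_pct": 84.4,
    "vals_tax_eval_v2.overall_accuracy_pct": 65.9,
    "vals_finance_agent.numerical_reasoning_accuracy_pct": 94.9,
}


def shared_deltas(a, b):
    """Return {signal: a - b} in percentage points for signals present in both dicts."""
    return {key: round(a[key] - b[key], 1) for key in sorted(a.keys() & b.keys())}


deltas = shared_deltas(model_a, model_b)
for signal, delta in deltas.items():
    # Positive means Model A is ahead on that benchmark.
    print(f"{signal}: {delta:+.1f} pts")
```

Signals reported for only one model (Vals Mortgage Tax for Model A, numerical reasoning for Model B) are left out of the comparison rather than treated as zero.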
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 51
Sources: 8
Quality: Sufficient
Vals CorpFin v2 (vals_corp_fin_v2): 42 rows, 1.3% avg lift
Vals Tax Eval v2 (vals_tax_eval_v2): 42 rows, 2.1% avg lift
Vals GPQA (vals_gpqa): 40 rows, 0.9% avg lift
Vals Mortgage Tax (vals_mortgage_tax): 33 rows, 1.5% avg lift
Missing Strong Models
gpt-4o-2024-05-13
external/openai/gpt-4o-2024-05-13
Rank #51
10.5%
Related Use Cases
finance
Thesis red teaming
Stress-test an investment thesis with counterarguments and risk.
Top: gemini-3-pro-preview
finance
Earnings call synthesis
Summarize earnings calls into key points, tone, and risks.
Top: gemini-3-pro-preview
finance
AML alert triage
Triage AML alerts into severity, rationale, and next actions.
Top: gemini-3-pro-preview
finance
KYC profile synthesis
Turn identity docs and notes into a structured KYC profile.
Top: gemini-3-pro-preview