BasedAGI

finance

Accounts payable invoice extraction (text)

Extract structured fields from invoices and receipts for accounts payable (AP) workflows.

#1 Recommendation

gemini-3-pro-preview

Strong on Vals Finance Agent overall_accuracy_pct (87%) and Vals CorpFin v2 overall_accuracy_pct (87%)

external/google/gemini-3-pro-preview

Score: 36.0%
Confidence: 45.8%
Ranked Models: 30
Evidence Quality: 88%
Scoring: Benchmark-backed
Top Signal: Vals Finance Agent: overall_accuracy_pct

All Ranked Models

30 of 30

Rank · Model · Score
#1 · gemini-3-pro-preview · 36.0%
  Strong on Vals Finance Agent overall_accuracy_pct (87%) and Vals CorpFin v2 overall_accuracy_pct (87%)
#2 · Grok-4-0709 · 35.7%
  Strong on Vals CorpFin v2 overall_accuracy_pct (94%) and Vals Finance Agent overall_accuracy_pct (84%)
#3 · anthropic/claude-sonnet-4.6 · 33.9%
  Strong on Vals Finance Agent overall_accuracy_pct (100%) and Vals CorpFin v2 overall_accuracy_pct (91%)
#4 · gemini-2.5-pro · 33.3%
#5 · google/gemini-3.1-pro-preview · 32.5%
#6 · openai/gpt-5.4-2026-03-05 · 32.3%
#7 · gpt-5-mini-2025-08-07 · 32.2%
#8 · gpt-4.1-20250414 · 32.0%
#9 · gpt-5-2025-08-07 · 31.3%
#10 · gpt-5.2-2025-12-11 · 30.3%
#11 · anthropic/claude-opus-4-6-thinking · 29.5%
#12 · gpt-5.1-2025-11-13 · 29.3%
#13 · anthropic/claude-opus-4-5-20251101-thinking · 28.5%
#14 · xai-org/grok-4-fast-reasoning · 27.9%
#15 · xai-org/grok-4-1-fast-reasoning · 27.6%
#16 · kimi/kimi-k2.5-thinking · 27.3%
#17 · claude-sonnet-4-20250514 · 27.1%
#18 · gemini-3-flash-preview · 26.8%
#19 · anthropic/claude-sonnet-4-5-20250929-thinking · 26.4%
#20 · google/gemini-3.1-flash-lite-preview · 26.4%
#21 · alibaba/qwen3.5-flash · 24.1%
#23 · anthropic/claude-haiku-4-5-20251001-thinking · 23.1%
#24 · zai/glm-5-thinking · 22.9%
#25 · claude-opus-4-5-20251101 · 22.6%
#26 · gpt-4.1-mini-20250414 · 21.5%
#27 · z-ai/glm-4.7 · 20.2%
#28 · qwen/qwen3-max · 19.8%
#29 · Kimi K2 Thinking · 19.1%
#30 · mistralai/mistral-large-2512 · 19.0%
#31 · xai-org/grok-4-1-fast-non-reasoning · 18.9%

Compare Models

Model A leads by 0.3 percentage points (36.0% vs. 35.7%).


Model A

gemini-3-pro-preview

external/google/gemini-3-pro-preview

36.0%

Rank #1

Confidence 45.8% · 29 evidence pts

Vals Finance Agent: overall_accuracy_pct

Value 87.0% · Conf 100.0% · Weight 2.7%

vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)

Vals CorpFin v2: overall_accuracy_pct

Value 86.7% · Conf 100.0% · Weight 2.4%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)

Vals Tax Eval v2: overall_accuracy_pct

Value 87.1% · Conf 100.0% · Weight 2.2%

vals_tax_eval_v2.overall_accuracy_pct (Mar 12, 2026)

Vals Mortgage Tax: overall_accuracy_pct

Value 99.3% · Conf 100.0% · Weight 1.8%

vals_mortgage_tax.overall_accuracy_pct (Mar 12, 2026)

Model B

Grok-4-0709

external/xai/grok-4-0709

35.7%

Rank #2

Confidence 49.2% · 27 evidence pts

Vals CorpFin v2: overall_accuracy_pct

Value 93.6% · Conf 100.0% · Weight 2.6%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)

Vals Finance Agent: overall_accuracy_pct

Value 84.4% · Conf 100.0% · Weight 2.6%

vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)

Vals Tax Eval v2: overall_accuracy_pct

Value 65.9% · Conf 100.0% · Weight 1.7%

vals_tax_eval_v2.overall_accuracy_pct (Mar 12, 2026)

Vals Finance Agent: numerical_reasoning_accuracy_pct

Value 94.9% · Conf 100.0% · Weight 1.6%

vals_finance_agent.numerical_reasoning_accuracy_pct (Mar 12, 2026)
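Each evidence point in the comparison is reported as a Value, a Conf, and a Weight. The page does not say how these are combined into the overall Score, so what follows is only a hypothetical sketch: it assumes each point contributes value × conf × weight and that signals a model has no evidence for contribute nothing. The EvidencePoint class and weighted_contribution function are illustrative names, not part of any BasedAGI API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EvidencePoint:
    """One benchmark signal as displayed on the page, with percentages as fractions."""
    signal: str
    value: float   # e.g. 0.870 for "Value 87.0%"
    conf: float    # e.g. 1.0 for "Conf 100.0%"
    weight: float  # e.g. 0.027 for "Weight 2.7%"


def weighted_contribution(points: list[EvidencePoint]) -> float:
    """Hypothetical scoring rule (an assumption, not the site's formula).

    Sums value * conf * weight over the points a model has. Signals with no
    evidence add nothing, so the total is capped by the weight actually covered.
    """
    return sum(p.value * p.conf * p.weight for p in points)


# The four evidence points listed above for gemini-3-pro-preview (Model A).
model_a_top4 = [
    EvidencePoint("vals_finance_agent.overall_accuracy_pct", 0.870, 1.0, 0.027),
    EvidencePoint("vals_corp_fin_v2.overall_accuracy_pct", 0.867, 1.0, 0.024),
    EvidencePoint("vals_tax_eval_v2.overall_accuracy_pct", 0.871, 1.0, 0.022),
    EvidencePoint("vals_mortgage_tax.overall_accuracy_pct", 0.993, 1.0, 0.018),
]

partial = weighted_contribution(model_a_top4)
print(f"Contribution from these 4 points: {partial:.2%}")  # prints 8.13%
```

Under this assumed rule, the four listed points account for only about 8.1 percentage points of Model A's 36.0% Score, consistent with the remaining 25 of its 29 evidence points not being shown. It would also mean a model with few evidence points cannot accumulate much weight, which is one possible reading of the "Thin evidence after weighting" note below; the page does not confirm this.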

Ranking Diagnostics & Missing Models

Source Lift

Ranked: 51
Sources: 8
Quality: Sufficient

Vals CorpFin v2 (vals_corp_fin_v2) · 42 rows · 1.3% avg lift
Vals Tax Eval v2 (vals_tax_eval_v2) · 42 rows · 2.1% avg lift
Vals GPQA (vals_gpqa) · 40 rows · 0.9% avg lift
Vals Mortgage Tax (vals_mortgage_tax) · 33 rows · 1.5% avg lift

Missing Strong Models

gpt-4o-2024-05-13 (external/openai/gpt-4o-2024-05-13) · Rank #51 · Score 10.5%
Thin evidence after weighting

Taxonomy Details

Core Tasks

task.invoice_receipt_extraction, task.json_schema_filling

Required Modes

mode.json_schema

Domains

domain.finance_compliance_aml
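The taxonomy pairs invoice extraction with task.json_schema_filling and requires mode.json_schema, which suggests the extracted fields are expected back as JSON conforming to a schema. A minimal stdlib sketch of checking one model response follows; the field list in INVOICE_SCHEMA, the validate_invoice helper, and the sample invoice are all hypothetical, and a production AP pipeline would more likely rely on a full JSON Schema validator.

```python
from __future__ import annotations

import json

# Hypothetical AP invoice fields: name -> accepted Python type(s) after json.loads.
INVOICE_SCHEMA = {
    "vendor_name": str,
    "invoice_number": str,
    "invoice_date": str,           # assumed ISO 8601 date string, e.g. "2026-03-12"
    "currency": str,               # assumed ISO 4217 code, e.g. "USD"
    "total_amount": (int, float),  # JSON numbers decode to int or float
}


def validate_invoice(raw_output: str) -> list[str]:
    """Return a list of problems with a model's JSON output; empty means it passed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    errors = []
    for field, expected in INVOICE_SCHEMA.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[field], bool) or not isinstance(data[field], expected):
            errors.append(f"wrong type for {field}: {type(data[field]).__name__}")
    return errors


# Hypothetical model outputs for illustration.
good = json.dumps({
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-1001",
    "invoice_date": "2026-03-12",
    "currency": "USD",
    "total_amount": 1250.50,
})
bad = '{"vendor_name": "Acme Corp", "total_amount": "1,250.50"}'

print(validate_invoice(good))  # []
print(validate_invoice(bad))
```

The second call reports the three missing fields and flags total_amount, which came back as a formatted string rather than a number; catching that kind of drift is the point of requiring a schema for AP extraction.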

Related Use Cases