Arch-Agent-32B vs Grok-4-0709

For Casual chat companion

Model A wins by +5.5 percentage points

Model A

Winner

Arch-Agent-32B

katanemo/Arch-Agent-32B

23.4%

Rank #30

Confidence

42.2%

Evidence

4 pts

BFCL Multi-turn Official: Multi Turn Acc

Value 70.1% · Conf 100.0% · Weight 7.3%

bfcl_multiturn_official.multi_turn_acc (Mar 12, 2026)

BFCL Relevance Detection Official: Relevance Detection

Value 81.3% · Conf 100.0% · Weight 6.5%

bfcl_relevance_detection_official.relevance_detection (Mar 12, 2026)

BFCL Relevance Detection Official: Irrelevance Detection

Value 81.0% · Conf 100.0% · Weight 2.6%

bfcl_relevance_detection_official.irrelevance_detection (Mar 12, 2026)

BFCL Memory Official: Memory Acc

Value 19.8% · Conf 100.0% · Weight 2.4%

bfcl_memory_official.memory_acc (Mar 12, 2026)

Model B

Grok-4-0709

external/xai/grok-4-0709

17.9%

Rank #58

Confidence

22.8%

Evidence

20 pts

UGI Leaderboard: Entertainment

Value 100.0% · Conf 100.0% · Weight 2.8%

ugi_main.entertainment (Mar 12, 2026)

UGI Leaderboard: Writing ✍️

Value 99.2% · Conf 100.0% · Weight 2.8%

ugi_main.writing (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg TSQ

Value 84.6% · Conf 100.0% · Weight 1.2%

galileo_agent_v2.avg_tsq (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg AC

Value 56.5% · Conf 100.0% · Weight 1.1%

galileo_agent_v2.avg_ac (Mar 12, 2026)

Vals CorpFin v2: overall_accuracy_pct

Value 93.6% · Conf 100.0% · Weight 0.6%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
