Arch-Agent-32B vs Grok-4-0709

For Mindfulness and meditation scripts

Model A wins by +5.1%

Model A

Winner

Arch-Agent-32B

katanemo/Arch-Agent-32B

21.6%

Rank #30

Confidence

38.9%

Evidence

4 pts

BFCL Multi-turn Official: Multi Turn Acc

Value 70.1% · Conf 100.0% · Weight 6.8%

bfcl_multiturn_official.multi_turn_acc (Mar 12, 2026)

BFCL Relevance Detection Official: Relevance Detection

Value 81.3% · Conf 100.0% · Weight 6.1%

bfcl_relevance_detection_official.relevance_detection (Mar 12, 2026)

BFCL Relevance Detection Official: Irrelevance Detection

Value 81.0% · Conf 100.0% · Weight 2.4%

bfcl_relevance_detection_official.irrelevance_detection (Mar 12, 2026)

BFCL Memory Official: Memory Acc

Value 19.8% · Conf 100.0% · Weight 2.3%

bfcl_memory_official.memory_acc (Mar 12, 2026)

Model B

Grok-4-0709

external/xai/grok-4-0709

16.5%

Rank #59

Confidence

21.1%

Evidence

20 pts

UGI Leaderboard: Entertainment

Value 100.0% · Conf 100.0% · Weight 2.6%

ugi_main.entertainment (Mar 12, 2026)

UGI Leaderboard: Writing ✍️

Value 99.2% · Conf 100.0% · Weight 2.6%

ugi_main.writing (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg TSQ

Value 84.6% · Conf 100.0% · Weight 1.1%

galileo_agent_v2.avg_tsq (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg AC

Value 56.5% · Conf 100.0% · Weight 1.1%

galileo_agent_v2.avg_ac (Mar 12, 2026)

Vals CorpFin v2: Overall Accuracy

Value 93.6% · Conf 100.0% · Weight 0.5%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
