BasedAGI

creative

Lore bible generator

Create consistent lore references (timelines, factions, glossaries).

#1 Recommendation

qwen-2.5-72b-instruct

Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (78%) and Judgemark Official (EQ-Bench Slice) judgemark_score (56%)

external/qwen/qwen-2-5-72b-instruct

Score: 22.7%

Confidence: 34.9%

Limited benchmark evidence for this use case.

27 ranked models with average evidence of 12.6 points. Rankings may shift as more benchmark data is ingested.

Ranked Models: 27

Evidence Quality: 79%

Scoring: Benchmark-backed

Top Signal: Creative Writing Official (EQ-Bench Slice): creative_writing_score

All Ranked Models

Rank · Model · Score
#1 · qwen-2.5-72b-instruct · 22.7%
    Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (78%) and Judgemark Official (EQ-Bench Slice) judgemark_score (56%)
#2 · gpt-4o · 22.3%
    Strong on Creative Writing Official (EQ-Bench Slice) creative_writing_score (84%) and Judgemark Official (EQ-Bench Slice) judgemark_score (74%)
#6 · gpt-4.1-20250414 · 17.9%
#10 · gemini-2.5-pro · 15.8%
#11 · Grok-4-0709 · 14.6%
#14 · gemma-2-27b-it · 12.7%
#25 · xai-org/grok-4-fast-reasoning · 10.3%
#31 · xai-org/grok-4-1-fast-reasoning · 9.8%
#32 · gemini-3-pro-preview · 9.7%
#38 · gemini-3-flash-preview · 9.1%
#40 · x-ai/grok-3 · 9.0%
#41 · gpt-4o-20241120 · 9.0%
#43 · claude-sonnet-4-20250514 · 8.8%
#45 · gemini-2.5-flash · 8.7%
#54 · gpt-5-mini-2025-08-07 · 7.5%
#55 · xai-org/grok-4-1-fast-non-reasoning · 7.4%
#61 · xai-org/grok-4-fast-non-reasoning · 7.0%
#62 · qwen/qwen3-max · 6.8%
#64 · DeepSeek-V2.5 · 6.7%
#65 · Meta-Llama-3-70B-Instruct · 6.7%
#66 · deepseek-v3 · 6.5%
#67 · gpt-4o-2024-08-06 · 6.1%
#72 · Llama-3.1-70B-Instruct · 4.7%
#73 · Llama-4-Scout-17B-16E-Instruct · 4.4%
#74 · Llama-2-7b-chat-hf · 4.3%
#75 · gemma-7b-it · 4.3%
#77 · gemma-2b-it · 3.2%

Compare Models

Model A leads by +0.4%

Model A

qwen-2.5-72b-instruct

external/qwen/qwen-2-5-72b-instruct

22.7%

Rank #1

Confidence 34.9% · 15 evidence pts

Creative Writing Official (EQ-Bench Slice): creative_writing_score

Value 78.4% · Conf 100.0% · Weight 5.0%

artificialanalysis_creative_writing_official.creative_writing_score (Mar 12, 2026)

Judgemark Official (EQ-Bench Slice): judgemark_score

Value 55.6% · Conf 100.0% · Weight 2.6%

artificialanalysis_judgemark_official.judgemark_score (Mar 12, 2026)

DuckDB NSQL Leaderboard: all_execution_accuracy

Value 82.7% · Conf 100.0% · Weight 1.6%

duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)

JSONSchemaBench Leaderboard: medium_schema_compliance_pct

Value 90.1% · Conf 100.0% · Weight 1.4%

jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)

Model B

gpt-4o

external/openai/gpt-4o

22.3%

Rank #2

Confidence 27.7% · 14 evidence pts

Creative Writing Official (EQ-Bench Slice): creative_writing_score

Value 84.4% · Conf 100.0% · Weight 5.3%

artificialanalysis_creative_writing_official.creative_writing_score (Mar 12, 2026)

Judgemark Official (EQ-Bench Slice): judgemark_score

Value 74.3% · Conf 100.0% · Weight 3.5%

artificialanalysis_judgemark_official.judgemark_score (Mar 12, 2026)

EQ-Bench Leaderboard: judgemark_score

Value 74.3% · Conf 100.0% · Weight 1.6%

eq_bench.judgemark_score (Mar 12, 2026)

JSONSchemaBench Leaderboard: medium_schema_compliance_pct

Value 100.0% · Conf 100.0% · Weight 1.6%

jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)
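
The page does not state how the per-signal Value, Conf, and Weight figures combine into a model's score. The sketch below assumes, purely for illustration, that each signal contributes value × confidence × weight. The `Signal` class and the function names are hypothetical and are not part of BasedAGI. Only the four signals listed per model are included, while Model A has 15 evidence points and Model B has 14, so the partial sums here are expected to fall well short of the displayed scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One benchmark signal as shown in the comparison card (hypothetical type)."""
    name: str
    value: float       # benchmark value as a fraction (0.784 means 78.4%)
    confidence: float  # per-signal confidence as a fraction
    weight: float      # per-signal weight as a fraction


def contribution(signal: Signal) -> float:
    # ASSUMPTION: contribution = value * confidence * weight.
    # The source page does not confirm this formula.
    return signal.value * signal.confidence * signal.weight


def partial_score(signals: list[Signal]) -> float:
    # Sum of contributions over the listed signals only.
    return sum(contribution(s) for s in signals)


def lead(score_a: float, score_b: float) -> float:
    # Difference between two displayed scores, in fractional points.
    return round(score_a - score_b, 3)


# Figures copied from the Model A and Model B cards above.
MODEL_A = [
    Signal("creative_writing_score", 0.784, 1.0, 0.050),
    Signal("judgemark_score", 0.556, 1.0, 0.026),
    Signal("all_execution_accuracy", 0.827, 1.0, 0.016),
    Signal("medium_schema_compliance_pct", 0.901, 1.0, 0.014),
]
MODEL_B = [
    Signal("creative_writing_score", 0.844, 1.0, 0.053),
    Signal("judgemark_score", 0.743, 1.0, 0.035),
    Signal("eq_bench.judgemark_score", 0.743, 1.0, 0.016),
    Signal("medium_schema_compliance_pct", 1.000, 1.0, 0.016),
]

if __name__ == "__main__":
    print(f"Model A, listed signals only: {partial_score(MODEL_A):.2%}")
    print(f"Model B, listed signals only: {partial_score(MODEL_B):.2%}")
    print(f"Lead from displayed scores:   {lead(0.227, 0.223):+.1%}")
```

Under this assumed formula the four listed signals favor Model B, so Model A's overall lead of +0.4% would have to come from evidence points that the comparison card does not display.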

Ranking Diagnostics & Missing Models

Source Lift

Ranked: 27

Sources: 8

Quality: Insufficient

Vals Legal Bench (vals_legal_bench) · 16 rows · 0.4% avg lift

Vals CorpFin v2 (vals_corp_fin_v2) · 16 rows · 0.3% avg lift

UGI Leaderboard (ugi_main) · 14 rows · 1.3% avg lift

Vectara HHEM Leaderboard (vectara_hhem_leaderboard) · 14 rows · 0.3% avg lift

Missing Strong Models

anthropic/claude-sonnet-4.6 (external/anthropic/claude-sonnet-4-6) · Rank #4 · 21.1% · Thin evidence after weighting

google/gemini-3.1-pro-preview (external/google/gemini-3-1-pro-preview) · Rank #8 · 19.3% · Thin evidence after weighting

gpt-5-2025-08-07 (external/openai/gpt-5-2025-08-07) · Rank #9 · 19.2% · Thin evidence after weighting

openai/gpt-5.4-2026-03-05 (external/openai/gpt-5-4-2026-03-05) · Rank #10 · 18.9% · Thin evidence after weighting
Taxonomy Details

Core Tasks

task.worldbuilding_lore_bible · task.json_schema_filling

Required Modes

mode.long_context · mode.json_schema

Domains

domain.creative_writing
