Model Profile

microsoft/wizardlm-2-8x22b

Name: microsoft/wizardlm-2-8x22b
Rating: 1.9 (87 reviews)
Author: microsoft

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are backed by benchmarks for each use case, and each one reports an explicit confidence value and contributor metrics.

Identity

ID: external/microsoft/wizardlm-2-8x22b

Author: microsoft

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 27.6%

Evidence points: 87

Raw rows: 90

Weighted rows: 11

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

IQ (7 benchmarks): 56.6%

EQ (5 benchmarks): 71.8%*

Accuracy (2 benchmarks): 58.6%*

Creativity (7 benchmarks): 45.2%

Based (3 benchmarks): 53.1%*

* Low confidence: limited benchmark evidence for this dimension

5/5 dimensions scored · Last updated Apr 30, 2026

Benchmark Signals

Open LLM Leaderboard GPQA

gpqa

4.8%

Normalized value 59.7% · confidence 100.0%

Strongest impact in Social post generation

openllm_gpqa_official.gpqa · Apr 30, 2026

Open LLM Leaderboard MMLU-Pro

mmlu_pro_accuracy_pct

4.6%

Normalized value 57.1% · confidence 100.0%

Strongest impact in Social post generation

openllm_mmlu_pro_official.mmlu_pro_accuracy_pct · Apr 30, 2026

EQ-Bench Leaderboard

eq_bench_score

3.1%

Normalized value 86.9% · confidence 100.0%

Strongest impact in Social post generation

eq_bench.eq_bench_score · Apr 30, 2026

Open LLM Leaderboard BBH

bbh

1.7%

Normalized value 63.2% · confidence 100.0%

Strongest impact in Brand voice localization

openllm_bbh_official.bbh · Apr 30, 2026

Open LLM Leaderboard IFEval

ifeval

1.3%

Normalized value 58.6% · confidence 100.0%

Strongest impact in Brand voice localization

openllm_ifeval_official.ifeval · Apr 30, 2026

UGI Leaderboard

Writing ✍️

1.2%

Normalized value 39.6% · confidence 100.0%

Strongest impact in Screenplay scene writing

ugi_main.writing · Apr 30, 2026

Some fit rows have limited benchmark evidence.

5 of 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-Case Scores

124

Total Measurements

Weighted Measurements

Weighted Sources

Raw Source Coverage

ugi_main 60 · mmlu_pro_leaderboard 15 · open_llm_leaderboard_results 5 · openrouter_models 3 · aider_code_editing 2 · eq_bench 1

Weighted Source Coverage

ugi_main 3 · aider_code_editing 2 · eq_bench 1 · open_llm_leaderboard_results 1 · openllm_bbh_official 1 · openllm_gpqa_official 1

Best Use Cases for This Model

Use Case	Vertical	Score	Confidence	Evidence	Top Contributor
Product positioning and messaging use_case.mkt.product_positioning	marketing_sales	18.6%	30.7%	7	Open LLM Leaderboard GPQA: gpqa
Campaign brief use_case.mkt.campaign_brief	marketing_sales	18.6%	30.7%	7	Open LLM Leaderboard GPQA: gpqa
Social post generation use_case.mkt.social_post_generation	marketing_sales	18.6%	30.7%	7	Open LLM Leaderboard GPQA: gpqa
Ad copy variants use_case.mkt.ad_copy_variants	marketing_sales	17.9%	29.4%	7	Open LLM Leaderboard GPQA: gpqa
Personalized sales outreach use_case.mkt.sales_outreach_personalized	marketing_sales	17.9%	29.4%	7	Open LLM Leaderboard GPQA: gpqa
Screenplay scene writing use_case.creative.screenplay_scene	creative	15.9%	29.9%	8	Open LLM Leaderboard GPQA: gpqa
Poetry and lyrics use_case.creative.poetry_lyrics	creative	15.9%	29.9%	8	Open LLM Leaderboard GPQA: gpqa
Brand voice localization use_case.mkt.brand_voice_localization	marketing_sales	14.8%	24.4%	8	Open LLM Leaderboard GPQA: gpqa
Overrefusal (eval) use_case.security.overrefusal_eval	risk_eval	13.8%	23.9%	7	Open LLM Leaderboard GPQA: gpqa
Scam and social engineering resistance (eval) use_case.security.scam_social_engineering_resistance_eval	risk_eval	13.8%	23.9%	7	Open LLM Leaderboard GPQA: gpqa
Crisis escalation protocol (eval) use_case.safety.crisis_escalation_protocol	risk_eval	13.8%	23.9%	7	Open LLM Leaderboard GPQA: gpqa
Jailbreak resistance (eval) use_case.security.jailbreak_resistance_eval	risk_eval	13.8%	23.9%	7	Open LLM Leaderboard GPQA: gpqa
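
The rows above can be read programmatically when comparing use cases. The following is a minimal sketch, assuming the table is available as tab-separated text with the use-case ID as the last space-separated token of the first column. The names `UseCaseRow`, `parse_row`, and `filter_by_confidence`, and the 25% cutoff, are hypothetical and chosen for illustration only. The page does not state what threshold it uses for "low confidence".

```python
from dataclasses import dataclass


@dataclass
class UseCaseRow:
    name: str
    use_case_id: str
    vertical: str
    score_pct: float
    confidence_pct: float
    evidence: int
    top_contributor: str


def parse_row(line: str) -> UseCaseRow:
    """Parse one tab-separated table row into a typed record."""
    label, vertical, score, confidence, evidence, contributor = line.split("\t")
    # The first column holds the display name followed by the use-case ID.
    name, use_case_id = label.rsplit(" ", 1)
    return UseCaseRow(
        name=name,
        use_case_id=use_case_id,
        vertical=vertical,
        score_pct=float(score.rstrip("%")),
        confidence_pct=float(confidence.rstrip("%")),
        evidence=int(evidence),
        top_contributor=contributor,
    )


def filter_by_confidence(rows, min_confidence_pct):
    """Keep rows whose confidence meets a caller-chosen (hypothetical) cutoff."""
    return [r for r in rows if r.confidence_pct >= min_confidence_pct]


# Three rows copied from the table above, with explicit tab separators.
SAMPLE_LINES = [
    "Product positioning and messaging use_case.mkt.product_positioning"
    "\tmarketing_sales\t18.6%\t30.7%\t7\tOpen LLM Leaderboard GPQA: gpqa",
    "Screenplay scene writing use_case.creative.screenplay_scene"
    "\tcreative\t15.9%\t29.9%\t8\tOpen LLM Leaderboard GPQA: gpqa",
    "Jailbreak resistance (eval) use_case.security.jailbreak_resistance_eval"
    "\trisk_eval\t13.8%\t23.9%\t7\tOpen LLM Leaderboard GPQA: gpqa",
]

rows = [parse_row(line) for line in SAMPLE_LINES]
confident = filter_by_confidence(rows, min_confidence_pct=25.0)

for row in confident:
    print(f"{row.use_case_id}: score {row.score_pct}%, confidence {row.confidence_pct}%")
```

Keeping score and confidence as separate numeric fields makes it easy to sort by one and filter by the other, which matches how the table presents them side by side.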