Model Profile

Steelskull/L3.3-MS-Nevoria-70b

Rating: 2.1 (63 reviews)
Author: steelskull

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are backed by benchmarks and broken down by use case, each with an explicit confidence value and contributor metrics.

Identity

ID: external/steelskull/l3-3-ms-nevoria-70b

Author: steelskull

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 24.5%

Evidence points: 63

Raw rows: 69

Weighted rows: 8

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

IQ (6 benchmarks): 82.4%*

EQ (4 benchmarks): 75.9%*

Accuracy (2 benchmarks): 77.4%*

Creativity (6 benchmarks): 59.4%

Based (3 benchmarks): 72.4%*

* Low confidence — limited benchmark evidence for this dimension

5/5 dimensions scored · Last updated Apr 30, 2026

Benchmark Signals


Open LLM Leaderboard GPQA

gpqa

8.1%

Normalized value 100.0% · confidence 100.0%

Strongest impact in Social post generation

openllm_gpqa_official.gpqa · Apr 30, 2026

Open LLM Leaderboard MMLU-Pro

mmlu_pro_accuracy_pct

5.8%

Normalized value 72.0% · confidence 100.0%

Strongest impact in Social post generation

openllm_mmlu_pro_official.mmlu_pro_accuracy_pct · Apr 30, 2026

UGI Leaderboard

Writing ✍️

1.6%

Normalized value 53.6% · confidence 100.0%

Strongest impact in Poetry and lyrics

ugi_main.writing · Apr 30, 2026

UGI Leaderboard

Entertainment

1.1%

Normalized value 42.7% · confidence 100.0%

Strongest impact in Poetry and lyrics

ugi_main.entertainment · Apr 30, 2026

UGI Leaderboard

Hazardous

0.9%

Normalized value 47.0% · confidence 100.0%

Strongest impact in Crisis escalation protocol (eval)

ugi_main.hazardous · Apr 30, 2026

Open LLM Leaderboard BBH

bbh

0.7%

Normalized value 73.7% · confidence 100.0%

Strongest impact in Social post generation

openllm_bbh_official.bbh · Apr 30, 2026

Some fit rows have limited benchmark evidence.

7 of 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-Case Scores

111

Total Measurements

Weighted Measurements

Weighted Sources

Raw Source Coverage

ugi_main 60 · open_llm_leaderboard_results 5 · openllm_bbh_official 1 · openllm_gpqa_official 1 · openllm_ifeval_official 1 · openllm_mmlu_pro_official 1

Weighted Source Coverage

ugi_main 3 · open_llm_leaderboard_results 1 · openllm_bbh_official 1 · openllm_gpqa_official 1 · openllm_ifeval_official 1 · openllm_mmlu_pro_official 1
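
The per-source counts above can be cross-checked against the Raw rows (69) and Weighted rows (8) figures listed under Benchmark Coverage. The following is a minimal Python sketch with the counts transcribed by hand from this page; the variable and function names are illustrative and not part of any published API.

```python
# Per-source row counts, transcribed from the Raw and Weighted Source Coverage lines.
raw_sources = {
    "ugi_main": 60,
    "open_llm_leaderboard_results": 5,
    "openllm_bbh_official": 1,
    "openllm_gpqa_official": 1,
    "openllm_ifeval_official": 1,
    "openllm_mmlu_pro_official": 1,
}
weighted_sources = {
    "ugi_main": 3,
    "open_llm_leaderboard_results": 1,
    "openllm_bbh_official": 1,
    "openllm_gpqa_official": 1,
    "openllm_ifeval_official": 1,
    "openllm_mmlu_pro_official": 1,
}


def share_pct(sources: dict, name: str) -> float:
    """Return the percentage of all rows contributed by one source."""
    return 100.0 * sources[name] / sum(sources.values())


raw_total = sum(raw_sources.values())            # compare with "Raw rows"
weighted_total = sum(weighted_sources.values())  # compare with "Weighted rows"

print(f"raw rows: {raw_total}, weighted rows: {weighted_total}")
print(f"ugi_main share of raw rows: {share_pct(raw_sources, 'ugi_main'):.1f}%")
print(f"ugi_main share of weighted rows: {share_pct(weighted_sources, 'ugi_main'):.1f}%")
```

On these figures the totals agree with the page (69 and 8), and ugi_main accounts for roughly 87% of raw rows but only 37.5% of weighted rows.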

Best Use Cases for This Model

Use Case	ID	Vertical	Score	Confidence	Evidence	Top Contributor
Poetry and lyrics	use_case.creative.poetry_lyrics	creative	21.4%	28.2%	6	Open LLM Leaderboard GPQA: gpqa
Screenplay scene writing	use_case.creative.screenplay_scene	creative	21.4%	28.2%	6	Open LLM Leaderboard GPQA: gpqa
Social post generation	use_case.mkt.social_post_generation	marketing_sales	20.6%	25.2%	5	Open LLM Leaderboard GPQA: gpqa
Campaign brief	use_case.mkt.campaign_brief	marketing_sales	20.6%	25.2%	5	Open LLM Leaderboard GPQA: gpqa
Product positioning and messaging	use_case.mkt.product_positioning	marketing_sales	20.6%	25.2%	5	Open LLM Leaderboard GPQA: gpqa
Ad copy variants	use_case.mkt.ad_copy_variants	marketing_sales	19.8%	24.1%	5	Open LLM Leaderboard GPQA: gpqa
Personalized sales outreach	use_case.mkt.sales_outreach_personalized	marketing_sales	19.8%	24.1%	5	Open LLM Leaderboard GPQA: gpqa
Long-form story co-author	use_case.creative.longform_story	creative	18.5%	24.5%	6	Open LLM Leaderboard GPQA: gpqa
Crisis escalation protocol (eval)	use_case.safety.crisis_escalation_protocol	risk_eval	18.0%	22.3%	5	Open LLM Leaderboard GPQA: gpqa
Refusal profile (eval)	use_case.security.refusal_profile_eval	risk_eval	18.0%	22.3%	5	Open LLM Leaderboard GPQA: gpqa
Jailbreak resistance (eval)	use_case.security.jailbreak_resistance_eval	risk_eval	18.0%	22.3%	5	Open LLM Leaderboard GPQA: gpqa
Overrefusal (eval)	use_case.security.overrefusal_eval	risk_eval	18.0%	22.3%	5	Open LLM Leaderboard GPQA: gpqa
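
The summary figures under Benchmark Coverage (12 scored use cases, 24.5% average confidence, 63 evidence points) can be reproduced from the table above. The sketch below transcribes the Confidence and Evidence columns by hand. The 25% low-confidence cutoff is an assumption: the page does not state its threshold or how "thin contributor coverage" is measured, and this value is used only because it happens to reproduce the "7 of 12" count reported earlier on the page.

```python
# (use case ID, confidence %, evidence count), transcribed from the table.
rows = [
    ("use_case.creative.poetry_lyrics", 28.2, 6),
    ("use_case.creative.screenplay_scene", 28.2, 6),
    ("use_case.mkt.social_post_generation", 25.2, 5),
    ("use_case.mkt.campaign_brief", 25.2, 5),
    ("use_case.mkt.product_positioning", 25.2, 5),
    ("use_case.mkt.ad_copy_variants", 24.1, 5),
    ("use_case.mkt.sales_outreach_personalized", 24.1, 5),
    ("use_case.creative.longform_story", 24.5, 6),
    ("use_case.safety.crisis_escalation_protocol", 22.3, 5),
    ("use_case.security.refusal_profile_eval", 22.3, 5),
    ("use_case.security.jailbreak_resistance_eval", 22.3, 5),
    ("use_case.security.overrefusal_eval", 22.3, 5),
]

# ASSUMPTION: hypothetical cutoff; the page does not publish its own threshold.
ASSUMED_LOW_CONFIDENCE_CUTOFF = 25.0

scored_use_cases = len(rows)
avg_confidence = round(sum(conf for _, conf, _ in rows) / scored_use_cases, 1)
evidence_points = sum(evidence for _, _, evidence in rows)
low_confidence = [uc for uc, conf, _ in rows if conf < ASSUMED_LOW_CONFIDENCE_CUTOFF]

print(f"scored use cases: {scored_use_cases}")
print(f"average confidence: {avg_confidence}%")
print(f"evidence points: {evidence_points}")
print(f"below assumed cutoff: {len(low_confidence)} of {scored_use_cases}")
```

The first three values match the page's reported figures. The last one matches only under the assumed cutoff, so treat it as a plausibility check and not as the page's actual rule.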