Model Profile

dfurman/CalmeRys-78B-Orpo-v0.1

Name: dfurman/CalmeRys-78B-Orpo-v0.1
Rating: 1.9 (49 reviews)
Author: dfurman

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The use-case rankings below are derived from benchmarks, and each one lists its confidence, evidence count, and top contributing benchmark.

Identity

ID: external/dfurman/calmerys-78b-orpo-v0-1

Author: dfurman

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 21.5%

Evidence points: 49

Raw rows: 9

Weighted rows: 5

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

IQ (6 benchmarks): 81.0%*

EQ (4 benchmarks): 86.5%*

Accuracy (2 benchmarks): 90.7%*

Creativity (4 benchmarks): 87.4%*

Based (2 benchmarks): 68.1%*

* Low confidence — limited benchmark evidence for this dimension

5/5 dimensions scored · Last updated Apr 30, 2026

Benchmark Signals

Open LLM Leaderboard MMLU-Pro

mmlu_pro_accuracy_pct

7.7%

Normalized value 95.4% · confidence 100.0%

Strongest impact in Social post generation

openllm_mmlu_pro_official.mmlu_pro_accuracy_pct · Apr 30, 2026

Open LLM Leaderboard GPQA

gpqa

5.5%

Normalized value 68.1% · confidence 100.0%

Strongest impact in Social post generation

openllm_gpqa_official.gpqa · Apr 30, 2026

Open LLM Leaderboard IFEval

ifeval

3.8%

Normalized value 90.7% · confidence 100.0%

Strongest impact in Job description drafting

openllm_ifeval_official.ifeval · Apr 30, 2026

Open LLM Leaderboard BBH

bbh

2.1%

Normalized value 80.7% · confidence 100.0%

Strongest impact in Job description drafting

openllm_bbh_official.bbh · Apr 30, 2026

Open LLM Leaderboard Results

ifeval

0.4%

Normalized value 90.7% · confidence 100.0%

Strongest impact in Social post generation

open_llm_leaderboard_results.ifeval · Apr 30, 2026

All fit rows have limited benchmark evidence: 12 of 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-Case Scores

110

Total Measurements

Weighted Measurements

Weighted Sources

Raw Source Coverage

open_llm_leaderboard_results: 5 · openllm_bbh_official: 1 · openllm_gpqa_official: 1 · openllm_ifeval_official: 1 · openllm_mmlu_pro_official: 1

Weighted Source Coverage

open_llm_leaderboard_results: 1 · openllm_bbh_official: 1 · openllm_gpqa_official: 1 · openllm_ifeval_official: 1 · openllm_mmlu_pro_official: 1
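
The per-source counts above can be cross-checked against the "Raw rows" and "Weighted rows" figures under Benchmark Coverage. The sketch below is a minimal check with the counts copied by hand from this page. The variable names are illustrative. The reading that weighting keeps one row per source is an inference from the numbers, not something the page states.

```python
# Per-source row counts, copied from the two coverage lists on this page.
raw_source_coverage = {
    "open_llm_leaderboard_results": 5,
    "openllm_bbh_official": 1,
    "openllm_gpqa_official": 1,
    "openllm_ifeval_official": 1,
    "openllm_mmlu_pro_official": 1,
}
weighted_source_coverage = {
    "open_llm_leaderboard_results": 1,
    "openllm_bbh_official": 1,
    "openllm_gpqa_official": 1,
    "openllm_ifeval_official": 1,
    "openllm_mmlu_pro_official": 1,
}

raw_total = sum(raw_source_coverage.values())
weighted_total = sum(weighted_source_coverage.values())

# Rows present in the raw data but not counted after weighting, per source.
# Assumption: a lower weighted count means rows were collapsed or dropped.
unweighted = {
    source: raw_source_coverage[source] - weighted_source_coverage.get(source, 0)
    for source in raw_source_coverage
}

print(raw_total)       # 9, matches "Raw rows: 9"
print(weighted_total)  # 5, matches "Weighted rows: 5"
print(unweighted["open_llm_leaderboard_results"])  # 4
```

Both totals agree with the Benchmark Coverage section, and the only source that loses rows under weighting is open_llm_leaderboard_results.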

Best Use Cases for This Model

Use Case	Use Case ID	Vertical	Score	Confidence	Evidence	Top Contributor
Social post generation	use_case.mkt.social_post_generation	marketing_sales	18.6%	22.7%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Product positioning and messaging	use_case.mkt.product_positioning	marketing_sales	18.6%	22.7%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Campaign brief	use_case.mkt.campaign_brief	marketing_sales	18.6%	22.7%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Job description drafting	use_case.hr.job_description_drafting	hr_recruiting	18.1%	21.6%	5	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Ad copy variants	use_case.mkt.ad_copy_variants	marketing_sales	17.8%	21.8%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Personalized sales outreach	use_case.mkt.sales_outreach_personalized	marketing_sales	17.8%	21.8%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Screenplay scene writing	use_case.creative.screenplay_scene	creative	17.1%	20.9%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Poetry and lyrics	use_case.creative.poetry_lyrics	creative	17.1%	20.9%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Config debugging	use_case.sre.config_debugging	devops_sre	17.1%	20.9%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Terraform generation	use_case.sre.iac_terraform	devops_sre	17.1%	20.9%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Kubernetes manifest generation	use_case.sre.iac_k8s	devops_sre	17.1%	20.9%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
Crisis escalation protocol (eval)	use_case.safety.crisis_escalation_protocol	risk_eval	16.1%	19.7%	4	Open LLM Leaderboard MMLU-Pro: mmlu_pro_accuracy_pct
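
The summary figures under Benchmark Coverage ("Scored use cases: 12", "Avg confidence: 21.5%", "Evidence points: 49") appear to be plain aggregates of the table above. The sketch below recomputes them from the table values, copied by hand. It assumes an unweighted mean for confidence and a simple sum for evidence, which the page does not confirm but which reproduces the published numbers.

```python
from statistics import mean

# (use case, score %, confidence %, evidence points), copied from the table.
rows = [
    ("Social post generation", 18.6, 22.7, 4),
    ("Product positioning and messaging", 18.6, 22.7, 4),
    ("Campaign brief", 18.6, 22.7, 4),
    ("Job description drafting", 18.1, 21.6, 5),
    ("Ad copy variants", 17.8, 21.8, 4),
    ("Personalized sales outreach", 17.8, 21.8, 4),
    ("Screenplay scene writing", 17.1, 20.9, 4),
    ("Poetry and lyrics", 17.1, 20.9, 4),
    ("Config debugging", 17.1, 20.9, 4),
    ("Terraform generation", 17.1, 20.9, 4),
    ("Kubernetes manifest generation", 17.1, 20.9, 4),
    ("Crisis escalation protocol (eval)", 16.1, 19.7, 4),
]

scored_use_cases = len(rows)
# Assumption: "Avg confidence" is an unweighted mean, rounded to one decimal.
avg_confidence = round(mean(conf for _, _, conf, _ in rows), 1)
# Assumption: "Evidence points" is the sum of the Evidence column.
total_evidence = sum(evidence for _, _, _, evidence in rows)

print(scored_use_cases)  # 12
print(avg_confidence)    # 21.5
print(total_evidence)    # 49
```

All three values match the Benchmark Coverage section, which suggests the table and the summary are drawn from the same data.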