Model Profile

gpt-oss-20b

Name: gpt-oss-20b
Rating: 0.4 (10 reviews)
Author: openai

4,096 ctx · Open weights

Use this page to decide where this model is a strong fit. The rankings below are backed by benchmarks, grouped by use case, and shown with explicit confidence and contributor metrics.

Identity

ID: openai/gpt-oss-20b

Author: openai

Origin: huggingface_catalog

Arch: unknown

Benchmark Coverage

Scored use cases: 2

Avg confidence: 10.5%

Evidence points: 10

Raw rows: 22

Weighted rows: 6

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 5,541,163

Intelligence Profile

Dimension Breakdown

IQ · 1 benchmark

34.6%*

EQ · 0 benchmarks

No EQ benchmarks found

Insufficient data

Accuracy · 0 benchmarks

No accuracy benchmarks found

Insufficient data

Creativity · 0 benchmarks

No creativity benchmarks found

Insufficient data

Based · 0 benchmarks

No Based benchmarks found

Insufficient data

* Low confidence: limited benchmark evidence for this dimension.

1/5 dimensions scored · Last updated Apr 25, 2026

Benchmark Signals

Benchmark sources behind this model profile:

LEXam Leaderboard

average_score_pct

1.7%

Normalized value 34.6% · confidence 100.0%

Strongest impact in Contract Drafting & Redlining

lexam_leaderboard.average_score_pct · Mar 31, 2026

LEXam Leaderboard

open_question_judge_score_pct

0.6%

Normalized value 31.2% · confidence 100.0%

Strongest impact in Contract redline summary

lexam_leaderboard.open_question_judge_score_pct · Mar 31, 2026

LEXam Leaderboard

mcq_accuracy_pct

0.4%

Normalized value 44.4% · confidence 100.0%

Strongest impact in Contract redline summary

lexam_leaderboard.mcq_accuracy_pct · Mar 31, 2026

SciArena Leaderboard

rating_elo

0.3%

Normalized value 27.1% · confidence 100.0%

Strongest impact in Contract redline summary

sciarena_leaderboard.rating_elo · Apr 1, 2026

BRIDGE Medical Leaderboard

average_performance_pct

0.1%

Normalized value 40.8% · confidence 100.0%

Strongest impact in Contract redline summary

bridge_medical_leaderboard.average_performance_pct · Apr 1, 2026

Some use-case fit rows have limited benchmark evidence.

2 of 2 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

Status: actively scored

Raw Source Coverage

bridge_medical_leaderboard: 9
sciarena_leaderboard: 7
lexam_leaderboard: 3
openrouter_models: 3

Weighted Source Coverage

lexam_leaderboard: 3
bridge_medical_leaderboard: 2
sciarena_leaderboard: 1
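The per-source counts in the two coverage lists appear to add up to the Benchmark Coverage totals (22 raw rows, 6 weighted rows). The Python sketch below checks that, assuming the totals are plain sums; the page does not state how they are computed, and the variable names are illustrative only. It also shows that openrouter_models contributes raw rows but no weighted rows.

```python
# Per-source row counts copied from "Raw Source Coverage" and
# "Weighted Source Coverage" on this page.
raw_source_coverage = {
    "bridge_medical_leaderboard": 9,
    "sciarena_leaderboard": 7,
    "lexam_leaderboard": 3,
    "openrouter_models": 3,
}
weighted_source_coverage = {
    "lexam_leaderboard": 3,
    "bridge_medical_leaderboard": 2,
    "sciarena_leaderboard": 1,
}

# Assumption: the Benchmark Coverage totals are simple sums of these counts.
raw_rows = sum(raw_source_coverage.values())
weighted_rows = sum(weighted_source_coverage.values())

# Sources that supply raw rows but carry no weight in the scored use cases.
unweighted_sources = sorted(set(raw_source_coverage) - set(weighted_source_coverage))

print(f"Raw rows: {raw_rows}")
print(f"Weighted rows: {weighted_rows}")
print(f"Unweighted sources: {unweighted_sources}")
```

Under that assumption, the sums match the "Raw rows: 22" and "Weighted rows: 6" figures reported in Benchmark Coverage.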

Best Use Cases for This Model

Use Case	ID	Vertical	Score	Confidence	Evidence	Top Contributor
Contract Drafting & Redlining	use_case.legal.contract_drafting	legal	3.7%	10.9%	5	LEXam Leaderboard: average_score_pct
Contract redline summary	use_case.legal.contract_redline_summary	legal	3.4%	10.1%	5	LEXam Leaderboard: average_score_pct
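The two table rows also appear consistent with the Benchmark Coverage summary: the mean of their confidence values is 10.5% and their evidence counts total 10. The Python sketch below reproduces those figures, assuming "Avg confidence" is an unweighted mean and "Evidence points" is a plain sum over scored use cases; the page does not document either formula, and the `UseCaseRow` type is a hypothetical container, not part of any published API.

```python
from dataclasses import dataclass


@dataclass
class UseCaseRow:
    """Hypothetical record for one row of the Best Use Cases table."""
    use_case_id: str
    score_pct: float
    confidence_pct: float
    evidence: int


# Values copied from the Best Use Cases table on this page.
rows = [
    UseCaseRow("use_case.legal.contract_drafting", 3.7, 10.9, 5),
    UseCaseRow("use_case.legal.contract_redline_summary", 3.4, 10.1, 5),
]

# Assumption: summary figures are an unweighted mean and a plain sum.
scored_use_cases = len(rows)
avg_confidence = round(sum(r.confidence_pct for r in rows) / scored_use_cases, 1)
evidence_points = sum(r.evidence for r in rows)

print(f"Scored use cases: {scored_use_cases}")
print(f"Avg confidence: {avg_confidence}%")
print(f"Evidence points: {evidence_points}")
```

Rounding to one decimal place matches the precision the page uses for percentages.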