Model Profile

openai/gpt-4.1

Name: openai/gpt-4.1
Rating: 2.7 (176 reviews)
Author: openai

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are grouped by use case and backed by benchmark results; each row reports its confidence and its top contributing benchmark.

Identity

ID: external/openai/gpt-4-1

Author: openai

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 23.5%

Evidence points: 176

Raw rows: 104

Weighted rows: 24

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

70.8%*

91.8%*

Accuracy

66.0%*

Creativity

Insufficient data

Based

Insufficient data

* Low confidence — limited benchmark evidence for this dimension

3/5 dimensions scored · Last updated Mar 17, 2026

Some fit rows have limited benchmark evidence.

10 of 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-Case Scores

105

Total Measurements

104

Weighted Measurements

Weighted Sources

Raw Source Coverage

duckdb_nsql_leaderboard 12 · mws_vision_bench 12 · languagebench 10 · artificialanalysis_llm_performance 9 · baxbench_leaderboard 9 · sciarena_leaderboard 7

Weighted Source Coverage

languagebench 3 · languagebench_translation_official 3 · lexam_leaderboard 3 · vader_leaderboard 3 · aider_polyglot 2 · duckdb_nsql_leaderboard 2

Best Use Cases for This Model

Use Case	Use Case ID	Vertical	Score	Confidence	Evidence	Top Contributor
Archaic and historical translation	use_case.history.archaic_translation	history_linguistics	26.8%	31.4%	16	LanguageBench Translation Official (Split): translation_to:bleu
Legal translation	use_case.legal.legal_translation	legal	25.1%	29.9%	17	LanguageBench Translation Official (Split): translation_to:bleu
Verilog/VHDL generation	use_case.eda.verilog_generation	engineering	21.9%	24.9%	13	SWE-bench Verified Leaderboard: swe_verified_resolved_pct
Historical document summarization	use_case.history.historical_doc_summarization	history_linguistics	21.0%	24.7%	15	LanguageBench: overall:mean
Brand voice localization	use_case.mkt.brand_voice_localization	marketing_sales	20.9%	24.5%	14	LanguageBench Translation Official (Split): translation_to:bleu
Integration test generation	use_case.dev.integration_tests	developer_tools	18.5%	21.1%	13	SWE-bench Verified Leaderboard: swe_verified_resolved_pct
Metric definition workshop	use_case.data.metric_definition_workshop	data_analytics	17.8%	24.1%	13	DuckDB NSQL Leaderboard: all_execution_accuracy
Simulation setup assistant	use_case.eng.simulation_setup_assistant	engineering	17.1%	20.7%	12	SWE-bench Verified Leaderboard: swe_verified_resolved_pct
Grammar and writing coach	use_case.lang.grammar_coach	education	16.9%	19.8%	15	LanguageBench Translation Official (Split): translation_to:bleu
Documentation from code	use_case.dev.docstrings_and_docs	developer_tools	16.5%	19.7%	16	SWE-bench Verified Leaderboard: swe_verified_resolved_pct
Contract term extraction	use_case.legal.contract_term_extraction	legal	16.4%	20.7%	16	LEXam Leaderboard: average_score_pct
Clause playbook check	use_case.legal.playbook_clause_check	legal	16.4%	20.7%	16	LEXam Leaderboard: average_score_pct
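
The summary figures under Benchmark Coverage can be cross-checked against the table above. The page does not say how those aggregates are computed, so the sketch below rests on an assumption: the average confidence is a plain unweighted mean of the Confidence column and the evidence-point total is a plain sum of the Evidence column. The `rows` list and variable names are illustrative; the values are copied from the table.

```python
from statistics import mean

# (use case ID, confidence %, evidence count), copied from the table above.
rows = [
    ("use_case.history.archaic_translation", 31.4, 16),
    ("use_case.legal.legal_translation", 29.9, 17),
    ("use_case.eda.verilog_generation", 24.9, 13),
    ("use_case.history.historical_doc_summarization", 24.7, 15),
    ("use_case.mkt.brand_voice_localization", 24.5, 14),
    ("use_case.dev.integration_tests", 21.1, 13),
    ("use_case.data.metric_definition_workshop", 24.1, 13),
    ("use_case.eng.simulation_setup_assistant", 20.7, 12),
    ("use_case.lang.grammar_coach", 19.8, 15),
    ("use_case.dev.docstrings_and_docs", 19.7, 16),
    ("use_case.legal.contract_term_extraction", 20.7, 16),
    ("use_case.legal.playbook_clause_check", 20.7, 16),
]

# Assumption: unweighted mean and plain sum (the page does not document its method).
avg_confidence = mean(conf for _, conf, _ in rows)
total_evidence = sum(evidence for _, _, evidence in rows)

print(f"Scored use cases: {len(rows)}")             # Scored use cases: 12
print(f"Avg confidence: {avg_confidence:.1f}%")     # Avg confidence: 23.5%
print(f"Evidence points: {total_evidence}")         # Evidence points: 176
```

Under that assumption the three results agree with the Benchmark Coverage figures (12 scored use cases, 23.5% average confidence, 176 evidence points), which suggests the table and the summary are drawn from the same rows.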