Model Profile

Llama 3.3 70B Instruct

Name: Llama 3.3 70B Instruct
Rating: 2.2 (141 reviews)
Author: meta

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are based on benchmark results for each use case. Each ranking reports a confidence value and the benchmark that contributes most to it.

Identity

ID: external/meta/llama-3-3-70b-instruct

Author: meta

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 48.5%

Evidence points: 141

Raw rows: 227

Weighted rows: 28

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

IQ (2 benchmarks): 34.4%*

EQ (8 benchmarks): 21.7%*

Accuracy (3 benchmarks): 35.1%*

Creativity (2 benchmarks): 36.9%*

Based (3 benchmarks): 31.3%*

* Low confidence: limited benchmark evidence for this dimension

5/5 dimensions scored · Last updated Apr 14, 2026

Benchmark Signals


BFCL Relevance Detection Official

Relevance Detection

7.4%

Normalized value 100.0% · confidence 100.0%

Strongest impact in Casual chat companion

bfcl_relevance_detection_official.relevance_detection · Apr 1, 2026

BFCL Multi-turn Official

Multi Turn Acc

2.7%

Normalized value 27.8% · confidence 100.0%

Strongest impact in Casual chat companion

bfcl_multiturn_official.multi_turn_acc · Apr 1, 2026

BigCodeBench Official

bigcodebench_complete_pct

2.0%

Normalized value 91.0% · confidence 100.0%

Strongest impact in Verilog/VHDL generation

bigcodebench_official.bigcodebench_complete_pct · Apr 1, 2026

BFCL Memory Official

Memory Acc

1.7%

Normalized value 14.6% · confidence 100.0%

Strongest impact in Adult ERP roleplay (explicit)

bfcl_memory_official.memory_acc · Apr 1, 2026

BigCodeBench Official

bigcodebench_instruct_pct

1.5%

Normalized value 90.5% · confidence 100.0%

Strongest impact in Integration test generation

bigcodebench_official.bigcodebench_instruct_pct · Apr 1, 2026

BFCL Relevance Detection Official

Irrelevance Detection

1.5%

Normalized value 50.4% · confidence 100.0%

Strongest impact in Casual chat companion

bfcl_relevance_detection_official.irrelevance_detection · Apr 1, 2026

Some use-case rankings have limited benchmark evidence: 2 of the 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-Case Scores: 133

Total Measurements: 227

Weighted Measurements

Weighted Sources

Raw Source Coverage

ugi_main: 60
galileo_agent_v2: 34
bfcl_adjacent_public: 30
bfcl_overall: 30
mmlu_pro_leaderboard: 15
llm_aggrefact_leaderboard: 12

Weighted Source Coverage

galileo_agent_v2: 10
bigcodebench_official: 3
ugi_main: 3
aider_code_editing: 2
bfcl_relevance_detection_official: 2
bridge_medical_leaderboard: 2

Best Use Cases for This Model

Use Case	ID	Vertical	Score	Confidence	Evidence	Top Contributor
Casual chat companion	use_case.companion.casual_chat	companion	22.4%	57.4%	12	BFCL Relevance Detection Official: Relevance Detection
Life coaching and goal planning	use_case.companion.life_coaching	companion	22.4%	57.4%	12	BFCL Relevance Detection Official: Relevance Detection
Tarot-style reading	use_case.spiritual.tarot_reading	companion	22.4%	57.4%	12	BFCL Relevance Detection Official: Relevance Detection
Empathetic support chat	use_case.companion.empathy_support_chat	companion	21.7%	55.8%	12	BFCL Relevance Detection Official: Relevance Detection
Mindfulness and meditation scripts	use_case.wellness.mindfulness_scripts	companion	21.6%	55.5%	12	BFCL Relevance Detection Official: Relevance Detection
Adult ERP roleplay (explicit)	use_case.adult.erp_roleplay	adult	19.7%	55.7%	12	BFCL Relevance Detection Official: Relevance Detection
SFW roleplay and simulation	use_case.creative.sfw_roleplay_simulation	creative	18.6%	51.7%	12	BFCL Relevance Detection Official: Relevance Detection
NPC dialogue	use_case.gaming.npc_dialogue	creative	17.9%	49.9%	12	BFCL Relevance Detection Official: Relevance Detection
Interactive fiction / DM	use_case.creative.interactive_fiction_dm	creative	17.9%	49.9%	12	BFCL Relevance Detection Official: Relevance Detection
Adult erotica (long-form, explicit)	use_case.adult.erotica_longform	adult	17.3%	49.4%	12	BFCL Relevance Detection Official: Relevance Detection
Integration test generation	use_case.dev.integration_tests	developer_tools	12.2%	21.8%	11	BigCodeBench Official: bigcodebench_complete_pct
Verilog/VHDL generation	use_case.eda.verilog_generation	engineering	11.6%	20.1%	10	BigCodeBench Official: bigcodebench_complete_pct