weekly refresh
basedagi.org
▸ guide

Best Free LLMs in 2026: Ranked by Benchmarks

Open-weight models and free API tiers, ranked by public benchmark scores.

Run locally

Open-weight entries can run on hardware you control. Infrastructure and operations costs replace API cost, so benchmark score is only one part of the deployment decision.

Free API tiers

Zero-price API entries are included only when current pricing data reports zero input cost. Check the provider for rate limits and production terms.

Best per task

No free model is the best at every task. Filter by task to find the best free option for coding, reasoning, or tool use.

▸ what you trade off going free
Raw performance: Use the current task score and gap above. Open weights do not indicate how close a model is to the frontier leader.
Infrastructure cost: Running a 70B model requires significant GPU memory (roughly 140 GB for the weights alone at 16-bit precision). Cloud GPU costs can exceed API costs for high-volume workloads.
Rate limits on free tiers: Free API tiers typically throttle at 15–60 requests per minute (RPM), which is not suitable for production scale.
Maintenance overhead: Self-hosting requires model management, quantization decisions, and prompt optimization for your specific hardware.
Freshness: Ranks change when models or benchmark results change. Check the scored-on date before selecting a deployment target.
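To make the rate-limit trade-off concrete, the sketch below converts a per-minute limit into a daily request ceiling and a minimum spacing between calls. It assumes the limit is a flat requests-per-minute cap with no separate daily or token quota, which a real provider may add on top. The function names are illustrative and do not come from any provider SDK.

```python
def max_requests_per_day(rpm: int) -> int:
    """Upper bound on daily requests if the per-minute cap is fully used.

    Assumption: the only limit is a flat requests-per-minute cap.
    """
    return rpm * 60 * 24


def min_seconds_between_requests(rpm: int) -> float:
    """Spacing that keeps a steady client at or under the per-minute cap."""
    return 60.0 / rpm


if __name__ == "__main__":
    # 15 and 60 RPM are the ends of the typical free-tier range quoted above.
    for rpm in (15, 60):
        print(
            f"{rpm} RPM: at most {max_requests_per_day(rpm)} requests/day, "
            f"one request every {min_seconds_between_requests(rpm):.1f} s"
        )
```

At 15 RPM the ceiling is 21,600 requests per day, and at 60 RPM it is 86,400. Those are ceilings only: they require perfectly even traffic around the clock, which production workloads rarely have.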
▸ frequently asked

What is the best free LLM in 2026?

The current winner is selected from measured open-weight models and API entries reporting zero input price. The live table names the model and score.
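The selection rule described here can be sketched as a filter followed by a maximum. This is a minimal illustration, not the site's actual implementation. The `Entry` fields, the model names, and the scores are all hypothetical, and the sketch assumes an unknown price or a missing score excludes an entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str                      # hypothetical model name
    score: Optional[float]         # None when the model has not been measured
    open_weight: bool
    input_price: Optional[float]   # None when pricing data is missing


def is_free(entry: Entry) -> bool:
    """Free means open weights, or pricing data that reports zero input cost."""
    return entry.open_weight or entry.input_price == 0


def pick_winner(entries: list[Entry]) -> Optional[Entry]:
    """Return the highest-scoring measured free entry, or None if there is none."""
    candidates = [e for e in entries if e.score is not None and is_free(e)]
    return max(candidates, key=lambda e: e.score, default=None)


# Made-up sample data, for illustration only.
entries = [
    Entry("model-a", 71.2, open_weight=True, input_price=None),
    Entry("model-b", 74.5, open_weight=False, input_price=0.0),
    Entry("model-c", 88.0, open_weight=False, input_price=3.0),   # paid: excluded
    Entry("model-d", None, open_weight=True, input_price=None),   # unmeasured: excluded
]

winner = pick_winner(entries)
print(winner.name if winner else "no measured free entry")
```

In this made-up data the paid entry has the highest score but is filtered out, so the winner is the zero-price API entry rather than an open-weight model.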

Are open-weight models as good as GPT-4?

Check the measured score for the task you need. Open weights describe deployment rights, not capability; the leaderboard shows whether an open model is close to the current task leader.

What hardware do I need to run free open-weight LLMs?

For 7B-8B models: a single consumer GPU (RTX 3060 or better) or Apple Silicon M2+. For 70B models: multiple A100/H100s or high-end workstation GPUs. For smaller models quantized to 4 bits (Q4): CPU-only runs with 16 GB of RAM are feasible, though slow.
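A rough way to check these tiers is to estimate the memory the weights alone occupy: parameter count times bits per weight, divided by eight. The sketch below makes several simplifying assumptions: 1 GB is 10^9 bytes, Q4 is treated as exactly 4 bits per weight (real quantization formats store somewhat more), and the KV cache, activations, and runtime overhead are ignored. Treat the results as lower bounds.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and framework overhead.
    """
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    cases = [
        ("7B at 4-bit", 7, 4),
        ("8B at 16-bit", 8, 16),
        ("70B at 4-bit", 70, 4),
        ("70B at 16-bit", 70, 16),
    ]
    for label, params, bits in cases:
        print(f"{label}: about {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions a 7B model at 4 bits needs about 3.5 GB, which is why it fits in 16 GB of system RAM. A 70B model needs about 35 GB at 4 bits and about 140 GB at 16 bits, more than any single consumer GPU holds.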