Comparison of Open Source Models
Comparison and analysis of open source AI models across key metrics, including quality, inference speed, context window, parameter count, and licensing details. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and customization of the model, for example through fine-tuning. For more details on our methodology, see our FAQs.
Highlights
Openness
Artificial Analysis Openness Index: Score
Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon
Open Source Progress
Progress in Open Weights vs. Proprietary Intelligence
Artificial Analysis Intelligence Index v4.1 incorporates 9 evaluations: GDPval-AA v2, 𝜏³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, AA-LCR
Open Source Language Models Intelligence By Lab Over Time
Open Source Models Intelligence By Size Over Time
Intelligence
Artificial Analysis Intelligence Index
Estimate (independent evaluation forthcoming)
Intelligence Evaluations
Intelligence evaluations measured independently by Artificial Analysis · Higher is better
GDPval-AA v2 (Updated)
Agentic real-world work tasks, (Elo-500)/2000
Agentic coding & terminal use
𝜏³-Banking (New)
Agentic tool use
Long context reasoning
Knowledge
1 - hallucination rate
Reasoning & knowledge
Scientific reasoning
Coding
Instruction following
Physics reasoning
Long-horizon agentic tasks
Kubernetes incident root-cause analysis
Visual reasoning
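The GDPval-AA description above gives its score as (Elo-500)/2000. A minimal sketch of that rescaling follows; the function name is hypothetical, and whether the published score is clamped or expressed as a percentage is not stated here, so the sketch applies the formula only.

```python
def normalize_elo(elo: float) -> float:
    """Apply the (Elo - 500) / 2000 rescaling quoted for GDPval-AA.

    Hypothetical helper: no clamping or percentage conversion is applied,
    because the page does not say whether either is used.
    """
    return (elo - 500.0) / 2000.0


if __name__ == "__main__":
    # An Elo of 500 maps to 0.0 and an Elo of 2500 maps to 1.0.
    for rating in (500.0, 1500.0, 2500.0):
        print(rating, normalize_elo(rating))
```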
Size
Intelligence Index By Model Size
Large Models (>150B)
Medium Models (40B-150B)
Small Models (4B-40B)
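The three size ranges above can be expressed as a small lookup. This is a sketch under assumptions: it buckets by total parameter count, it assigns the shared boundaries (40B and 150B) to the Medium range, and it returns `None` for models below 4B, none of which the page states explicitly. The function name is hypothetical.

```python
from typing import Optional


def size_bucket(total_params_b: float) -> Optional[str]:
    """Classify a model by total parameters (in billions).

    Assumed boundary handling: exactly 150B and exactly 40B count as Medium.
    Models under 4B fall outside the listed ranges and return None.
    """
    if total_params_b > 150:
        return "Large"
    if total_params_b >= 40:
        return "Medium"
    if total_params_b >= 4:
        return "Small"
    return None


if __name__ == "__main__":
    for params in (753, 125, 27.8, 1):
        print(params, size_bucket(params))
```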
Model Size: Total and Active Parameters
Comparison between total model parameters and parameters active during inference
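To make the total versus active comparison concrete, the share of parameters used per token can be computed directly. The sketch below uses a hypothetical helper name and takes both counts in billions; the example figures (753B total, 40B active) are those listed for GLM-5.2 (max) in the table further down.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the fraction of parameters active at inference time.

    Both arguments are in billions. Hypothetical helper for illustration.
    """
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active parameters must lie between 0 and the total")
    return active_params_b / total_params_b


if __name__ == "__main__":
    # GLM-5.2 (max): 753B total, 40B active at inference time.
    print(f"{active_fraction(753, 40):.1%}")
```

A dense model, where every parameter is active, gives a fraction of 1.0.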
Intelligence vs. Active Parameters
Active parameters at inference time · Artificial Analysis Intelligence Index
Most attractive quadrant
Intelligence vs. Total Parameters
Artificial Analysis Intelligence Index · Size in parameters (billions)
Most attractive quadrant
Alibaba
DeepSeek
Google
Kimi
MBZUAI Institute of Foundation Models
MiniMax
Mistral
NVIDIA
OpenAI
Xiaomi
Z AI
Context Window
Context window: token limit · Higher is better
Further details
Model | Intelligence Index | Parameters | Context window | Price (USD) | Speed | Further details: Weights, Provider Benchmarks | |||
|---|---|---|---|---|---|---|---|---|---|
GLM-5.2 (max) | 51 | 753B 40B active at inference time | 1.00M | $0.9 | 105 | +6 | |||
MiniMax-M3 | 44 | 428B 23B active at inference time | 1.00M | $0.2 | 63 | +4 | |||
DeepSeek V4 Pro (Reasoning, Max Effort) | 44 | 1.6T 49B active at inference time | 1.00M | $0.2 | 69 | +8 | |||
Kimi K2.6 | 43 | 1.0T 32B active at inference time | 256k | $0.7 | 44 | +12 | |||
MiMo-V2.5-Pro | 42 | 1.0T 42B active at inference time | 1.00M | $0.2 | 38 | ||||
Kimi K2.7 Code | 42 | 1.0T 32B active at inference time | 256k | $0.7 | 52 | +5 | |||
DeepSeek V4 Pro (Reasoning, High Effort) | 41 | 1.6T 49B active at inference time | 1.00M | $0.2 | 60 | +8 | |||
DeepSeek V4 Flash (Reasoning, Max Effort) | 40 | 284B 13B active at inference time | 1.00M | $0.1 | 92 | +4 | |||
GLM-5.1 (Reasoning) | 40 | 744B 40B active at inference time | 200k | $0.9 | 68 | +9 | |||
MiMo-V2.5 | 40 | 310B 15B active at inference time | 1.00M | $0.1 | 77 | +2 | |||
GLM-5 (Reasoning) | 40 | 744B 40B active at inference time | 200k | $0.7 | 75 | +9 | |||
MiniMax-M2.7 | 38 | 230B 10B active at inference time | 205k | $0.2 | 44 | +3 | |||
Kimi K2.5 (Reasoning) | 38 | 1.0T 32B active at inference time | 256k | $0.6 | 52 | +12 | |||
Nemotron 3 Ultra 550B A55B (Reasoning) | 38 | 550B 55B active at inference time | 262k | $0.6 | 170 | Not available | +5 | ||
DeepSeek V4 Flash (Reasoning, High Effort) | 37 | 284B 13B active at inference time | 1.00M | $0.1 | - | +5 | |||
Qwen3.6 27B (Reasoning) | 37 | 27.8B | 262k | $0.9 | 55 | +2 | |||
GLM-5.1 (Non-reasoning) | 35 | 744B 40B active at inference time | 200k | $0.9 | 54 | +5 | |||
Kimi K2.6 (Non-reasoning) | 35 | 1.0T 32B active at inference time | 256k | $0.7 | 44 | +9 | |||
GLM-4.7 (Reasoning) | 34 | 357B 32B active at inference time | 200k | $0.7 | 110 | +7 | |||
Qwen3.5 27B (Reasoning) | 34 | 27.8B | 262k | $0.5 | 79 | +3 | |||
Qwen3.5 397B A17B (Reasoning) | 34 | 397B 17B active at inference time | 262k | $0.9 | 51 | +9 | |||
MiniMax-M2.5 | 34 | 230B 10B active at inference time | 205k | $0.3 | 183 | +13 | |||
Hy3-preview (Reasoning) | 34 | 295B 21B active at inference time | 256k | $0.1 | 124 | ||||
DeepSeek V3.2 (Reasoning) | 33 | 685B 37B active at inference time | 128k | $0.2 | - | +12 | |||
MiMo-V2-Flash (Feb 2026) | 33 | 309B 15B active at inference time | 256k | $0.1 | 156 | ||||
Kimi K2 Thinking | 33 | 1.0T 32B active at inference time | 256k | $0.8 | 120 | +3 | |||
GLM-5 (Non-reasoning) | 32 | 744B 40B active at inference time | 200k | $0.7 | 63 | +3 | |||
Qwen3.5 122B A10B (Reasoning) | 32 | 125B 10B active at inference time | 262k | $0.7 | 137 | +2 | |||
Qwen3.5 397B A17B (Non-reasoning) | 32 | 397B 17B active at inference time | 262k | $0.9 | 52 | +6 | |||
Qwen3.6 35B A3B (Reasoning) | 32 | 36B 3B active at inference time | 262k | $0.4 | 170 | +6 | |||
MiniMax-M2.1 | 31 | 230B 10B active at inference time | 205k | $0.4 | 201 | ||||
DeepSeek V4 Pro (Non-reasoning) | 31 | 1.6T 49B active at inference time | 1.00M | $0.2 | 74 | +2 | |||
MiMo-V2-Flash (Reasoning) | 31 | 309B 15B active at inference time | 256k | $0.1 | 155 | ||||
Ring-2.6-1T | 31 | 1.0T 63B active at inference time | 262k | $0.5 | 131 | ||||
Mistral Medium 3.5 | 30 | 128B | 256k | $1.2 | 77 | ||||
Step 3.7 Flash | 30 | 198B 11B active at inference time | 256k | $0.2 | 360 | ||||
Kimi K2.5 (Non-reasoning) | 29 | 1.0T 32B active at inference time | 256k | $0.8 | 53 | +6 | |||
Gemma 4 31B (Reasoning) | 29 | 30.7B | 256k | - | 34 | +8 | |||
Qwen3.5 27B (Non-reasoning) | 29 | 27.8B | 262k | $0.5 | 89 | ||||
Command A+ | 29 | 218B 25B active at inference time | 192k | - | 194 | ||||
Qwen3.6 27B (Non-reasoning) | 29 | 27.8B | 262k | $0.9 | 57 | ||||
Qwen3.5 35B A3B (Reasoning) | 29 | 36B 3B active at inference time | 262k | $0.4 | 155 | +2 | |||
DeepSeek V4 Flash (Non-reasoning) | 29 | 284B 13B active at inference time | 1.00M | $0.1 | 99 | ||||
MiniMax-M2 | 28 | 230B 10B active at inference time | 205k | $0.4 | 106 | ||||
Qwen3.5 122B A10B (Non-reasoning) | 28 | 125B 10B active at inference time | 262k | $0.7 | 163 | ||||
MiMo-V2.5-Pro (Non-reasoning) | 28 | 1.0T 41.7B active at inference time | 1.00M | $0.6 | 44 | ||||
GLM-4.7 (Non-reasoning) | 27 | 357B 32B active at inference time | 200k | $0.7 | 110 | +6 | |||
DeepSeek V3.1 Terminus (Reasoning) | 26 | 685B 37B active at inference time | 128k | $1.7 | - | ||||
Hy3-preview (Non-reasoning) | 26 | 295B 21B active at inference time | 256k | $0.1 | 132 | ||||
Ling-2.6-1T | 26 | 1.0T 63B active at inference time | 262k | $0.5 | - | ||||
Gemma 4 26B A4B (Reasoning) | 26 | 25.2B 3.8B active at inference time | 256k | $0.1 | - | +4 | |||
Step 3.5 Flash | 26 | 196B 11B active at inference time | 256k | $0.1 | 211 | ||||
DeepSeek V3.2 Exp (Reasoning) | 25 | 685B 37B active at inference time | 128k | $0.2 | - | ||||
NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 25 | 120.6B 12.7B active at inference time | 1.00M | $0.3 | 149 | +2 | |||
GLM-4.6 (Reasoning) | 25 | 357B 32B active at inference time | 200k | $0.7 | 43 | ||||
Qwen3.5 9B (Reasoning) | 25 | 9.65B | 262k | $0.1 | 61 | ||||
Gemma 4 31B (Non-reasoning) | 25 | 30.7B | 256k | $0.2 | 35 | +4 | |||
K-EXAONE (Reasoning) | 25 | 236B 23B active at inference time | 256k | - | - | - | |||
DeepSeek V3.2 (Non-reasoning) | 25 | 685B 37B active at inference time | 128k | $0.5 | - | +12 | |||
Trinity Large Thinking | 24 | 399B 13B active at inference time | 512k | $0.2 | 182 | ||||
Qwen3.6 35B A3B (Non-reasoning) | 24 | 36B 3B active at inference time | 262k | $0.6 | 183 | +5 | |||
gpt-oss-120b (high) | 24 | 117B 5.1B active at inference time | 131k | $0.2 | 338 | +23 | |||
Kimi K2 0905 | 24 | 1.0T 32B active at inference time | 256k | $0.8 | 26 | ||||
Qwen3.5 35B A3B (Non-reasoning) | 23 | 36B 3B active at inference time | 262k | $0.4 | 179 | ||||
MiMo-V2-Flash (Non-reasoning) | 23 | 309B 15B active at inference time | 256k | $0.1 | 150 | ||||
GLM-4.6 (Non-reasoning) | 23 | 357B 32B active at inference time | 200k | $0.8 | 43 | ||||
EXAONE 4.5 33B | 23 | 34.4B | 262k | - | - | - | |||
GLM-4.7-Flash (Reasoning) | 23 | 31.2B 3B active at inference time | 200k | $0.1 | 86 | ||||
Qwen3 235B A22B 2507 (Reasoning) | 22 | 235B 22B active at inference time | 256k | $0.6 | 47 | +3 | |||
DeepSeek V3.2 Speciale | 22 | 685B 37B active at inference time | 128k | - | - | - | |||
HyperNova 60B 2605 | 22 | 58.7B 4.8B active at inference time | 131k | $0.1 | 342 | ||||
Gemma 4 12B (Reasoning) | 22 | 12B | 256k | $0.1 | 121 | ||||
DeepSeek V3.1 Terminus (Non-reasoning) | 21 | 685B 37B active at inference time | 128k | $0.3 | - | ||||
DeepSeek V3.2 Exp (Non-reasoning) | 21 | 685B 37B active at inference time | 128k | $0.2 | - | ||||
Nemotron Cascade 2 30B A3B | 21 | 31.6B 3B active at inference time | 1.00M | - | - | - | |||
Apriel-v1.5-15B-Thinker | 21 | 15B | 128k | - | - | ||||
Qwen3 Coder Next | 21 | 79.7B 3B active at inference time | 256k | $0.4 | 73 | ||||
DeepSeek V3.1 (Non-reasoning) | 21 | 685B 37B active at inference time | 128k | $0.7 | - | +7 | |||
Mistral Small 4 (Reasoning) | 21 | 119B 6.5B active at inference time | 256k | $0.2 | 166 | ||||
DeepSeek V3.1 (Reasoning) | 21 | 685B 37B active at inference time | 128k | $0.7 | - | ||||
Qwen3 VL 235B A22B (Reasoning) | 21 | 235B 22B active at inference time | 262k | $1.4 | 51 | ||||
North Mini Code | 21 | 30B 3B active at inference time | 256k | - | 174 | Not available | |||
Apriel-v1.6-15B-Thinker | 21 | 15B | 128k | - | - | ||||
Qwen3.5 9B (Non-reasoning) | 20 | 9.65B | 262k | - | - | - | |||
Gemma 4 26B A4B (Non-reasoning) | 20 | 25.2B 3.8B active at inference time | 256k | $0.2 | 42 | +4 | |||
Qwen3.5 4B (Reasoning) | 20 | 4.66B | 262k | $0.0 | 27 | ||||
DeepSeek R1 0528 (May '25) | 20 | 685B 37B active at inference time | 128k | $1.6 | - | +3 | |||
Qwen3 Next 80B A3B (Reasoning) | 20 | 80B 3B active at inference time | 262k | $1.1 | 170 | +5 | |||
GLM-4.5 (Reasoning) | 19 | 355B 32B active at inference time | 128k | $0.8 | 58 | ||||
Kimi K2 | 19 | 1.0T 32B active at inference time | 128k | $0.6 | 25 | ||||
Ling 2.6 Flash | 19 | 107B 7.4B active at inference time | 262k | $0.1 | - | ||||
Seed-OSS-36B-Instruct | 18 | 36.2B | 512k | $0.2 | 35 | ||||
Qwen3 235B A22B 2507 Instruct | 18 | 235B 22B active at inference time | 256k | $0.3 | 57 | +9 | |||
Qwen3 Coder 480B A35B Instruct | 18 | 480B 35B active at inference time | 262k | $0.5 | 55 | +6 | |||
Qwen3 VL 32B (Reasoning) | 18 | 33.4B | 256k | $1.5 | 90 | ||||
gpt-oss-120b (low) | 18 | 117B 5.1B active at inference time | 131k | $0.2 | 352 | +19 | |||
MiniMax M1 80k | 18 | 456B 45.9B active at inference time | 1.00M | $0.7 | - | ||||
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 18 | 31.6B 3.6B active at inference time | 1.00M | $0.1 | 50 | ||||
K2 Think V2 | 17 | 70B | 262k | - | - | - | |||
LongCat Flash Lite | 17 | 68.5B 3B active at inference time | 256k | - | - | ||||
HyperCLOVA X SEED Think (32B) | 17 | 32B | 128k | - | - | - | |||
GLM-4.6V (Reasoning) | 17 | 108B 12B active at inference time | 128k | $0.4 | 88 | ||||
K-EXAONE (Non-reasoning) | 17 | 236B 23B active at inference time | 256k | - | - | - | |||
GLM-4.5-Air | 17 | 106B 12B active at inference time | 128k | $0.3 | 80 | ||||
Mistral Large 3 | 16 | 675B 41B active at inference time | 256k | $0.6 | 50 | ||||
Ring-1T | 16 | 1.0T 50B active at inference time | 128k | - | - | - | |||
Qwen3.5 4B (Non-reasoning) | 16 | 4.66B | 262k | $0.0 | 23 | ||||
Qwen3 30B A3B 2507 (Reasoning) | 16 | 30.5B 3.3B active at inference time | 262k | $0.4 | 129 | ||||
DeepSeek V3 0324 | 16 | 671B 37B active at inference time | 128k | $1.2 | - | +3 | |||
INTELLECT-3 | 16 | 107B 12B active at inference time | 131k | - | - | - | |||
GLM-4.7-Flash (Non-reasoning) | 16 | 31.2B 3B active at inference time | 200k | $0.1 | 144 | ||||
Devstral 2 | 15 | 125B | 256k | - | 47 | ||||
Solar Open 100B (Reasoning) | 15 | 102B 12B active at inference time | 128k | - | - | - | |||
Nemotron 3 Nano Omni 30B A3B Reasoning | 15 | 30B 3B active at inference time | 256k | $0.1 | 289 | ||||
gpt-oss-20B (high) | 15 | 21B 3.6B active at inference time | 131k | $0.1 | 208 | +10 | |||
MiniMax M1 40k | 14 | 456B 45.9B active at inference time | 1.00M | - | - | - | |||
gpt-oss-20B (low) | 14 | 21B 3.6B active at inference time | 131k | $0.1 | 219 | +9 | |||
Qwen3 VL 235B A22B Instruct | 14 | 235B 22B active at inference time | 262k | $0.5 | 50 | +2 | |||
Llama 4 Maverick | 14 | 402B 17B active at inference time | 1.00M | $0.3 | 93 | +6 | |||
K2-V2 (high) | 14 | 70B | 512k | - | - | - | |||
Qwen3 Next 80B A3B Instruct | 14 | 80B 3B active at inference time | 262k | $0.7 | 173 | +4 | |||
Tri-21B-think Preview | 14 | 21B | 32.0k | - | - | - | |||
Qwen3 Coder 30B A3B Instruct | 14 | 30.5B 3.3B active at inference time | 262k | $0.3 | 102 | ||||
Qwen3 235B A22B (Reasoning) | 13 | 235B 22B active at inference time | 32.8k | $1.5 | 56 | ||||
QwQ 32B | 13 | 32.8B | 131k | $0.7 | 30 | ||||
Qwen3 VL 30B A3B (Reasoning) | 13 | 30B 3B active at inference time | 256k | $0.3 | 112 | ||||
Gemma 4 12B (Non-reasoning) | 13 | 12B | 262k | - | - | - | |||
Devstral Small 2 | 13 | 24B | 256k | - | 45 | ||||
Ling-1T | 13 | 1.0T 50B active at inference time | 128k | - | - | - | |||
DeepSeek R1 (Jan '25) | 13 | 685B 37B active at inference time | 128k | $2.0 | - | +3 | |||
Gemma 4 E4B (Reasoning) | 12 | 8B 4.5B active at inference time | 128k | - | - | - | |||
K2-V2 (medium) | 12 | 70B | 512k | - | - | - | |||
Llama Nemotron Super 49B v1.5 (Reasoning) | 12 | 49B | 128k | $0.1 | 48 | ||||
Mistral Small 4 (Non-reasoning) | 12 | 119B 6.5B active at inference time | 256k | $0.2 | 151 | ||||
Tri-21B-Think | 12 | 21B | 32.0k | - | - | - | |||
Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 12 | 49B | 128k | - | - | - | |||
Qwen3 4B 2507 (Reasoning) | 12 | 4.02B | 262k | - | - | - | |||
MiniCPM5-1B (Reasoning) | 12 | 1B | 128k | - | - | - | |||
Magistral Small 1.2 | 12 | 24B | 128k | $0.6 | 107 | ||||
Sarvam 105B (high) | 12 | 106B 10.3B active at inference time | 128k | $0.0 | 108 | ||||
Devstral Small (May '25) | 12 | 23.6B | 256k | - | - | - | |||
MiniCPM5-1B (Non-reasoning) | 12 | 1B | 128k | - | - | - | |||
Qwen3 VL 32B Instruct | 11 | 33.4B | 256k | $0.9 | 67 | ||||
DeepSeek R1 Distill Qwen 32B | 11 | 32B | 128k | - | - | - | |||
GLM-4.6V (Non-reasoning) | 11 | 108B 12B active at inference time | 128k | $0.4 | 83 | ||||
Qwen3 235B A22B (Non-reasoning) | 11 | 235B 22B active at inference time | 32.8k | $0.6 | 57 | ||||
Magistral Small 1 | 11 | 23.6B | 40.0k | - | - | - | |||
EXAONE 4.0 32B (Reasoning) | 11 | 32B | 131k | - | - | - | |||
Qwen3 VL 8B (Reasoning) | 11 | 8.77B | 256k | $0.4 | 110 | ||||
Qwen3 32B (Reasoning) | 10 | 32.8B | 32.8k | $0.2 | 76 | +3 | |||
DeepSeek V3 (Dec '24) | 10 | 671B 37B active at inference time | 128k | $0.4 | - | +2 | |||
DeepSeek R1 0528 Qwen3 8B | 10 | 8.19B | 32.8k | - | - | - | |||
Qwen3.5 2B (Reasoning) | 10 | 2.27B | 262k | $0.0 | 24 | ||||
Qwen3 14B (Reasoning) | 10 | 14.8B | 32.8k | $0.4 | 63 | ||||
Nanbeige4.1-3B | 10 | 3.93B | 256k | - | - | - | |||
Llama 4 Scout | 10 | 109B 17B active at inference time | 10.0M | $0.2 | 106 | +6 | |||
Qwen3 VL 30B A3B Instruct | 10 | 30B 3B active at inference time | 256k | $0.2 | 113 | ||||
Hermes 4 - Llama-3.1 70B (Reasoning) | 10 | 70.6B | 128k | $0.2 | 70 | ||||
Ministral 3 14B | 10 | 14B | 256k | $0.2 | 90 | ||||
DeepSeek R1 Distill Llama 70B | 10 | 70B | 128k | $0.7 | 47 | ||||
DeepSeek R1 Distill Qwen 14B | 10 | 14B | 128k | - | - | - | |||
Falcon-H1R-7B | 10 | 7B | 256k | - | - | - | |||
Ling-flash-2.0 | 10 | 103B 6.1B active at inference time | 128k | $0.2 | 51 | ||||
Qwen3 Omni 30B A3B (Reasoning) | 10 | 35.3B 3B active at inference time | 65.5k | $0.3 | 88 | ||||
Qwen2.5 Instruct 72B | 10 | 72B | 131k | $0.2 | - | ||||
Step3 VL 10B | 9 | 10.2B | 65.5k | - | - | - | |||
Qwen3 30B A3B (Reasoning) | 9 | 30.5B 3.3B active at inference time | 32.8k | $0.1 | 108 | +2 | |||
Devstral Small (Jul '25) | 9 | 24B | 256k | $0.1 | 31 | ||||
Gemma 4 E2B (Reasoning) | 9 | 5.1B 2.3B active at inference time | 128k | - | - | - | |||
QwQ 32B-Preview | 9 | 32.8B | 32.8k | - | - | - | |||
GLM-4.5V (Reasoning) | 9 | 108B 12B active at inference time | 64.0k | $0.7 | 25 | ||||
Mistral Large 2 (Nov '24) | 9 | 123B | 128k | $2.4 | 54 | ||||
Mistral Small 3.2 | 9 | 24B | 128k | $0.1 | 128 | ||||
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 9 | 253B | 128k | $0.7 | 51 | ||||
Qwen3 30B A3B 2507 Instruct | 9 | 30.5B 3.3B active at inference time | 262k | $0.2 | 148 | ||||
ERNIE 4.5 300B A47B | 9 | 300B 47B active at inference time | 131k | $0.4 | - | ||||
Hermes 4 - Llama-3.1 405B (Reasoning) | 9 | 406B | 128k | $1.2 | 37 | ||||
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 9 | 13.2B | 128k | $0.2 | 283 | ||||
Ministral 3 8B | 9 | 8B | 256k | $0.1 | 87 | ||||
Gemma 4 E4B (Non-reasoning) | 9 | 8B 4.5B active at inference time | 128k | - | - | - | |||
Granite 4.1 30B | 9 | 30B | 131k | - | - | - | |||
NVIDIA Nemotron Nano 9B V2 (Reasoning) | 9 | 9B | 131k | $0.1 | 61 | ||||
Hermes 4 - Llama-3.1 405B (Non-reasoning) | 9 | 406B | 128k | $1.2 | 39 | ||||
NVIDIA Nemotron 3 Nano 4B | 9 | 3.97B | 262k | - | - | - | |||
Qwen3.5 2B (Non-reasoning) | 9 | 2.27B | 262k | $0.0 | 26 | ||||
Llama Nemotron Super 49B v1.5 (Non-reasoning) | 9 | 49B | 128k | $0.1 | 48 | ||||
Qwen3 32B (Non-reasoning) | 9 | 32.8B | 32.8k | $0.2 | 67 | +4 | |||
Llama 3.3 Instruct 70B | 9 | 70B | 128k | $0.6 | 91 | +18 | |||
Mistral Small 3.1 | 9 | 24B | 128k | $0.1 | 153 | ||||
K2-V2 (low) | 9 | 70B | 512k | - | - | - | |||
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 9 | 4.51B | 128k | - | - | - | |||
Kimi Linear 48B A3B Instruct | 9 | 49.1B 3B active at inference time | 1.00M | - | - | - | |||
Llama 3.1 Instruct 405B | 9 | 405B | 128k | $3.1 | 48 | ||||
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 8 | 49B | 128k | - | - | - | |||
Qwen3 VL 8B Instruct | 8 | 8.77B | 256k | $0.2 | 120 | ||||
Qwen3 4B (Reasoning) | 8 | 4.02B | 32.0k | $0.2 | - | ||||
Llama 3.1 Tulu3 405B | 8 | 405B | 128k | - | - | - | |||
Ring-flash-2.0 | 8 | 103B 6.1B active at inference time | 128k | $0.2 | - | ||||
Pixtral Large | 8 | 124B | 128k | $2.4 | 50 | ||||
Olmo 3.1 32B Think | 8 | 32.2B | 65.5k | - | - | ||||
Grok 2 (Dec '24) | 8 | 270B | 131k | - | - | - | |||
Qwen3 VL 4B (Reasoning) | 8 | 4.44B | 256k | - | - | - | |||
Command A | 8 | 111B | 256k | $3.3 | 71 | ||||
Llama 3.1 Nemotron Instruct 70B | 8 | 70B | 128k | $1.2 | 295 | ||||
Qwen2.5 Instruct 32B | 7 | 32B | 128k | - | - | - | |||
Qwen3 8B (Reasoning) | 7 | 8.19B | 131k | $0.2 | 38 | ||||
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | 7 | 31.6B 3.6B active at inference time | 1.00M | $0.1 | 61 | ||||
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 7 | 9B | 131k | $0.1 | 129 | ||||
Mistral Large 2 (Jul '24) | 7 | 123B | 128k | $2.4 | - | ||||
Qwen3 4B 2507 Instruct | 7 | 4.02B | 262k | - | - | - | |||
Qwen2.5 Coder Instruct 32B | 7 | 32B | 131k | - | - | - | |||
Qwen3 14B (Non-reasoning) | 7 | 14.8B | 32.8k | $0.3 | 63 | ||||
GLM-4.5V (Non-reasoning) | 7 | 108B 12B active at inference time | 64.0k | $0.7 | 19 | ||||
Mistral Small 3 | 7 | 24B | 32.0k | $0.1 | 157 | ||||
MiniCPM-V 4.6 1.3B | 7 | 1.3B | 262k | - | - | - | |||
Hermes 4 - Llama-3.1 70B (Non-reasoning) | 7 | 70.6B | 128k | $0.2 | 72 | ||||
Qwen3 30B A3B (Non-reasoning) | 7 | 30.5B 3.3B active at inference time | 32.8k | $0.1 | 107 | ||||
DeepSeek-V2.5 (Dec '24) | 7 | 236B 21B active at inference time | 128k | - | - | - | |||
Qwen3 4B (Non-reasoning) | 7 | 4.02B | 32.0k | $0.1 | - | ||||
Llama 3.1 Instruct 70B | 7 | 70B | 128k | $0.6 | 30 | ||||
Granite 4.1 8B | 7 | 8B | 131k | $0.1 | 120 | ||||
Sarvam 30B (high) | 7 | 32.2B 2.4B active at inference time | 65.5k | $0.0 | 166 | ||||
DeepSeek-V2.5 | 7 | 236B 21B active at inference time | 128k | - | - | - | |||
Olmo 3.1 32B Instruct | 6 | 32.2B | 65.5k | - | - | - | |||
DeepSeek R1 Distill Llama 8B | 6 | 8B | 128k | - | - | - | |||
Gemma 4 E2B (Non-reasoning) | 6 | 5.1B 2.3B active at inference time | 128k | - | - | - | |||
Olmo 3 32B Think | 6 | 32.2B | 65.5k | - | - | - | |||
R1 1776 | 6 | 671B 37B active at inference time | 128k | - | - | - | |||
Llama 3.2 Instruct 90B (Vision) | 6 | 90B | 128k | $1.4 | 57 | ||||
Solar Mini | 6 | 10.7B | 4.10k | $0.1 | - | ||||
Llama 3.1 Instruct 8B | 6 | 8B | 128k | $0.1 | 154 | +12 | |||
Grok-1 | 6 | 314B 78B active at inference time | 8.19k | - | - | - | |||
Qwen2 Instruct 72B | 6 | 72B | 131k | - | - | - | |||
EXAONE 4.0 32B (Non-reasoning) | 6 | 32B | 131k | - | - | - | |||
Ministral 3 3B | 6 | 3B | 256k | $0.1 | 184 | ||||
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | 5 | 24B | 32.0k | - | - | - | |||
Jamba 1.7 Large | 5 | 398B 94B active at inference time | 256k | $2.6 | 60 | ||||
Granite 4.0 H Small | 5 | 32B 9B active at inference time | 128k | $0.1 | 393 | ||||
Jamba 1.5 Large | 5 | 398B 94B active at inference time | 256k | $2.6 | - | ||||
Qwen3 Omni 30B A3B Instruct | 5 | 35.3B 3B active at inference time | 65.5k | $0.3 | 95 | ||||
Hermes 3 - Llama-3.1 70B | 5 | 70.6B | 128k | $0.3 | 28 | ||||
Qwen3 8B (Non-reasoning) | 5 | 8.19B | 32.8k | $0.2 | 39 | ||||
DeepSeek-Coder-V2 | 5 | 236B 21B active at inference time | 128k | - | - | - | |||
OLMo 2 32B | 5 | 32.2B | 4.10k | - | - | - | |||
Jamba 1.6 Large | 5 | 398B 94B active at inference time | 256k | $2.6 | 60 | ||||
Qwen3.5 0.8B (Reasoning) | 5 | 0.873B | 262k | $0.0 | 30 | ||||
LFM2 24B A2B | 5 | 23.8B 2.3B active at inference time | 32.8k | $0.0 | 116 | ||||
Phi-4 | 5 | 14B | 16.0k | $0.2 | 36 | ||||
Gemma 3 27B Instruct | 5 | 27.4B | 128k | $0.1 | - | +3 | |||
Mistral Small (Sep '24) | 5 | 22B | 32.8k | $0.2 | 159 | ||||
Phi-3 Mini Instruct 3.8B | 5 | 3.8B | 4.10k | - | - | - | |||
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 5 | 13.2B | 128k | $0.2 | 212 | ||||
Gemma 3n E4B Instruct Preview (May '25) | 5 | 8.39B 4B active at inference time | 32.0k | - | - | - | |||
Phi-4 Multimodal Instruct | 5 | 5.6B | 128k | - | 15 | ||||
Qwen2.5 Coder Instruct 7B | 4 | 7.62B | 131k | - | - | - | |||
Qwen3.5 0.8B (Non-reasoning) | 4 | 0.873B | 262k | $0.0 | 22 | ||||
Mixtral 8x22B Instruct | 4 | 141B 39B active at inference time | 65.4k | - | - | - | |||
Llama 2 Chat 7B | 4 | 7B | 4.10k | $0.1 | - | ||||
Llama 3.2 Instruct 3B | 4 | 3B | 128k | $0.1 | 52 | ||||
Jamba Reasoning 3B | 4 | 3B | 262k | - | - | - | |||
Qwen3 VL 4B Instruct | 4 | 4.44B | 256k | - | - | - | |||
Qwen1.5 Chat 110B | 4 | 110B | 32.0k | - | - | - | |||
Reka Flash 3 | 4 | 21B | 128k | $0.3 | - | ||||
Olmo 3 7B Think | 4 | 7B | 65.5k | - | - | - | |||
OLMo 2 7B | 4 | 7.3B | 4.10k | - | - | - | |||
Molmo 7B-D | 4 | 8.02B | 4.10k | - | - | - | |||
Ling-mini-2.0 | 4 | 16.3B 1.4B active at inference time | 131k | - | - | - | |||
DeepSeek R1 Distill Qwen 1.5B | 4 | 1.5B | 128k | - | - | - | |||
DeepSeek-V2-Chat | 4 | 236B 21B active at inference time | 128k | - | - | - | |||
Llama 3 Instruct 70B | 3 | 70B | 8.19k | $0.9 | - | ||||
Arctic Instruct | 3 | 480B 17B active at inference time | 4.00k | - | - | - | |||
Qwen Chat 72B | 3 | 72B | 33.8k | - | - | - | |||
Gemma 3 12B Instruct | 3 | 12.2B | 128k | $0.1 | - | +2 | |||
Llama 3.2 Instruct 11B (Vision) | 3 | 11B | 128k | $0.2 | 50 | ||||
Granite 4.1 3B | 3 | 3B | 131k | - | - | - | |||
DeepSeek Coder V2 Lite Instruct | 3 | 16B 2.4B active at inference time | 128k | - | - | - | |||
Sarvam M (Reasoning) | 3 | 23.6B | 32.8k | - | - | ||||
Phi-4 Mini Instruct | 3 | 3.84B | 128k | - | 43 | ||||
Llama 2 Chat 70B | 3 | 70B | 4.10k | - | - | - | |||
DeepSeek LLM 67B Chat (V1) | 3 | 67B | 4.10k | - | - | - | |||
Llama 2 Chat 13B | 3 | 13B | 4.10k | - | - | - | |||
Command-R+ (Apr '24) | 3 | 104B | 128k | $4.2 | - | ||||
OpenChat 3.5 (1210) | 3 | 7B | 8.19k | - | - | - | |||
DBRX Instruct | 3 | 132B 36B active at inference time | 32.8k | - | - | - | |||
Exaone 4.0 1.2B (Reasoning) | 3 | 1.28B | 64.0k | - | - | - | |||
Olmo 3 7B Instruct | 3 | 7B | 65.5k | $0.1 | - | ||||
Exaone 4.0 1.2B (Non-reasoning) | 3 | 1.28B | 64.0k | - | - | - | |||
LFM2.5-1.2B-Thinking | 3 | 1.17B | 32.0k | - | - | - | |||
Jamba 1.7 Mini | 3 | 52B 12B active at inference time | 258k | - | - | - | |||
LFM2 2.6B | 3 | 2.57B | 32.8k | - | 339 | ||||
LFM2.5-1.2B-Instruct | 3 | 1.17B | 32.0k | - | 492 | ||||
Jamba 1.5 Mini | 3 | 52B 12B active at inference time | 256k | $0.2 | - | ||||
Granite 4.0 H 1B | 3 | 1.5B | 128k | - | - | - | |||
Qwen3 1.7B (Reasoning) | 3 | 2.03B | 32.0k | $0.2 | - | ||||
Jamba 1.6 Mini | 3 | 52B 12B active at inference time | 256k | $0.2 | 181 | ||||
Mixtral 8x7B Instruct | 2 | 46.7B 12.9B active at inference time | 32.8k | $0.5 | - | ||||
Gemma 3 270M | 2 | 0.268B | 32.0k | - | - | - | |||
Apertus 70B Instruct | 2 | 70B | 65.5k | $1.0 | - | ||||
Granite 4.0 Micro | 2 | 3B | 128k | - | - | - | |||
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | 2 | 8B | 128k | - | - | - | |||
Llama 65B | 2 | 65B | 2.05k | - | - | - | |||
Qwen Chat 14B | 2 | 14B | 8.19k | - | - | - | |||
Mistral 7B Instruct | 2 | 7B | 8.19k | $0.2 | 104 | ||||
Command-R (Mar '24) | 2 | 35B | 128k | $0.6 | - | ||||
Granite 4.0 1B | 2 | 1.6B | 128k | - | - | - | |||
Molmo2-8B | 2 | 8.66B | 36.9k | - | - | - | |||
LFM2 8B A1B | 2 | 8.34B 1.5B active at inference time | 32.8k | - | - | ||||
Granite 3.3 8B (Non-reasoning) | 2 | 8.17B | 128k | $0.1 | 328 | ||||
Qwen3 1.7B (Non-reasoning) | 2 | 2.03B | 32.0k | $0.1 | - | ||||
Qwen3 0.6B (Reasoning) | 1 | 0.752B | 32.0k | $0.2 | - | ||||
Llama 3 Instruct 8B | 1 | 8B | 8.19k | $0.1 | - | ||||
Gemma 3n E4B Instruct | 1 | 8.39B 4B active at inference time | 32.0k | $0.0 | 50 | ||||
LFM2 1.2B | 1 | 1.17B | 32.8k | - | 476 | ||||
Gemma 3 4B Instruct | 1 | 4.3B | 128k | $0.0 | - | ||||
Llama 3.2 Instruct 1B | 1 | 1B | 128k | $0.1 | 84 | ||||
LFM2.5-VL-1.6B | 1 | 1.6B | 32.0k | - | 493 | ||||
Granite 4.0 350M | 1 | 0.35B | 32.8k | - | - | - | |||
Granite 4.0 H 350M | 1 | 0.34B | 32.8k | - | - | - | |||
Apertus 8B Instruct | 1 | 8B | 65.5k | $0.1 | - | ||||
Tiny Aya Global | 1 | 3.35B | 8.19k | - | - | ||||
Gemma 3n E2B Instruct | 1 | 5.98B 2B active at inference time | 32.0k | - | - | ||||
Gemma 3 1B Instruct | 1 | 1B | 32.0k | - | - | ||||
Qwen3 0.6B (Non-reasoning) | 1 | 0.752B | 32.0k | $0.1 | - | ||||
EXAONE 4.5 33B (Non-reasoning) | - | 34.4B | 262k | - | - | - | |||
Cogito v2.1 (Reasoning) | - | 671B 37B active at inference time | 128k | $1.3 | 91 |
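
The context window column in the table above abbreviates token counts with `k` and `M` suffixes (for example `256k`, `1.00M`, `32.8k`). For sorting or filtering the table programmatically, a minimal sketch of converting those cells to integers follows. It assumes decimal suffixes (k = 1,000 and M = 1,000,000); the function name is hypothetical, and because the displayed values appear to be rounded, the results are approximate token counts rather than exact limits.

```python
import re

# Assumed decimal multipliers for the suffixes used in the table.
_SUFFIX = {"k": 1_000, "M": 1_000_000}


def parse_context_window(cell: str) -> int:
    """Convert a cell such as '256k' or '1.00M' into a token count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kM])\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized context window value: {cell!r}")
    number, suffix = match.groups()
    # round() avoids floating-point artifacts such as 32799.999...
    return round(float(number) * _SUFFIX[suffix])


if __name__ == "__main__":
    for cell in ("1.00M", "256k", "32.8k", "10.0M"):
        print(cell, parse_context_window(cell))
```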