AI LLM Leaderboard
Compare the latest Large Language Models across multiple benchmarks and performance metrics
- Total models: 35+, including the latest releases
- Benchmarks: 8 comprehensive tests
- Top score: 97.3%, DeepSeek R1 on MATH
- Fastest model: o3-mini at 189 tokens per second (TPS)
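The TPS figure above and the Cost/1K column in the rankings table below translate directly into a per-request estimate: generation time is tokens ÷ TPS, and cost is tokens ÷ 1,000 × Cost/1K. The following is a minimal sketch. It assumes that Cost/1K is a single blended USD price and that TPS is sustained output throughput with no time-to-first-token overhead, although the page states neither. The three models' values are copied from the table, and the 2,000-token response length is a hypothetical workload.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumptions (not stated by the leaderboard): Cost/1K is one blended USD price
# per 1,000 tokens, and TPS is sustained output throughput with no startup delay.


@dataclass
class ModelSpec:
    name: str
    cost_per_1k_usd: float    # "Cost/1K" column
    tokens_per_second: float  # "TPS" column


def estimate(spec: ModelSpec, tokens: int) -> Tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for `tokens` tokens."""
    cost = tokens / 1000 * spec.cost_per_1k_usd
    seconds = tokens / spec.tokens_per_second
    return cost, seconds


# Values copied from the rankings table.
MODELS = [
    ModelSpec("o3-mini", 0.02, 189),
    ModelSpec("o1", 0.06, 15),
    ModelSpec("DeepSeek R1", 0.008, 65),
]

if __name__ == "__main__":
    workload = 2_000  # hypothetical response length in tokens
    for spec in MODELS:
        cost, seconds = estimate(spec, workload)
        print(f"{spec.name:12} ${cost:.3f}  {seconds:6.1f} s")
```

At these rates, o3-mini finishes a 2,000-token response in about 10.6 s for $0.04. By contrast, o1 at 15 TPS needs roughly 133 s and $0.12.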
Model Rankings - 40 Models
# | Model | Date | Badges | Organization | Category | Overall | MMLU | GPQA | MMMU | HellaSwag | HumanEval | BBH | GSM8K | MATH | Cost/1K | TPS | Context
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | o1 | 2024-09 | Top MMLU, Premium | OpenAI | reasoning | 88.4% | 92.3% | 78% | N/A | N/A | N/A | N/A | N/A | 94.8% | $0.06 | 15 | 128K
2 | DeepSeek R1 | 2025-01 | Best MATH, Best Reasoning | DeepSeek | reasoning | 86.5% | 90.8% | 71.5% | N/A | N/A | N/A | N/A | N/A | 97.3% | $0.008 | 65 | 64K
3 | o3-mini | 2024-12 | Fastest TPS, Best HumanEval | OpenAI | coding | 86% | 86% | 75% | N/A | N/A | 97% | N/A | N/A | N/A | $0.02 | 189 | 128K
4 | Claude 3.7 Sonnet | 2024-11 | Best MATH | Anthropic | reasoning | 85.5% | 86.1% | 84.8% | 75% | N/A | N/A | N/A | N/A | 96.2% | $0.02 | 65 | 200K
5 | o4-mini | 2024-12 | Best MATH | OpenAI | reasoning | 85.2% | N/A | 81.4% | 81.6% | N/A | N/A | N/A | N/A | 92.7% | $0.02 | 85 | 128K
6 | Gemini 2.5 Pro | 2024-12 | Latest, High GPQA | Google | multimodal | 85.2% | 89.8% | 84% | 81.7% | N/A | N/A | N/A | N/A | N/A | $0.02 | 55 | 1M
7 | o3 | 2024-12 | Top MATH | OpenAI | reasoning | 85% | N/A | 83.3% | 82.9% | N/A | N/A | N/A | N/A | 88.9% | $0.06 | 18 | 128K
8 | o1-preview | 2024-09 | Preview | OpenAI | reasoning | 84.9% | 90.8% | 78.3% | N/A | N/A | N/A | N/A | N/A | 85.5% | $0.045 | 20 | 128K
9 | Claude 3.5 Sonnet | 2024-06 | User's Choice, Best GSM8K | Anthropic | reasoning | 82.3% | 88.7% | 59.4% | 68.3% | 89% | 92% | 93.1% | 96.4% | 71.1% | $0.015 | 170 | 200K
10 | GPT-4o | 2024-05 | Least Latency, Multimodal | OpenAI | multimodal | 82.2% | 88.7% | 53.6% | 69.1% | 94.2% | 90.2% | 91.3% | 89.8% | 76.6% | $0.015 | 85 | 128K
11 | o1-mini | 2024-09 | Good Coding | OpenAI | coding | 81.9% | 85.2% | 60% | N/A | N/A | 92.4% | N/A | N/A | 90% | $0.025 | 45 | 128K
12 | Gemini 2.0 Flash | 2024-12 | Fast, Good Performance | Google | multimodal | 81.8% | 87% | 59% | N/A | N/A | 91% | N/A | N/A | 90% | $0.01 | 110 | 1M
13 | Claude Opus 4 | 2024-12 | Latest, Premium | Anthropic | reasoning | 81% | 88.8% | 83.3% | 76.5% | N/A | N/A | N/A | N/A | 75.5% | $0.045 | 45 | 200K
14 | DeepSeek V3 | 2024-12 | Open Source, Good MATH | DeepSeek | coding | 80.1% | 88.5% | 59.1% | N/A | N/A | 82.6% | N/A | N/A | 90.2% | $0.004 | 95 | 64K
15 | Llama 3.1 405B | 2024-07 | Open Source, Largest Open | Meta | reasoning | 78.9% | 88.6% | 51.1% | 64.5% | 87% | 89% | 81.3% | 96.8% | 73.8% | $0.015 | 35 | 128K
16 | Claude Sonnet 4 | 2024-12 | Latest, High GPQA | Anthropic | reasoning | 78.8% | 86.5% | 83.8% | 74.4% | N/A | N/A | N/A | N/A | 70.5% | $0.025 | 60 | 200K
17 | GPT-4 Turbo | 2024-04 | Highly Preferred, Balanced | OpenAI | reasoning | 77.6% | 86.5% | 48% | 63.1% | 94.2% | 90.2% | 87.6% | 91% | 72.2% | $0.03 | 45 | 128K
18 | Claude 3 Opus | 2024-03 | Premium, Complete Benchmarks | Anthropic | reasoning | 77.2% | 86.8% | 50.4% | 59.4% | 95.4% | 84.9% | 86.8% | 95% | 60.1% | $0.045 | 45 | 200K
19 | GPT-4.1 | 2024-11 | Latest GPT | OpenAI | reasoning | 77.1% | 90.2% | 66.3% | 74.8% | N/A | N/A | N/A | N/A | N/A | $0.04 | 50 | 128K
20 | Gemini 2.0 Pro Experimental | 2024-12 | Experimental, Good MATH | Google | multimodal | 77.1% | 79.1% | 64.7% | 72.7% | N/A | N/A | N/A | N/A | 91.8% | $0.015 | 60 | 1M
21 | Claude 3.7 Sonnet (Normal) | 2024-11 | Balanced | Anthropic | reasoning | 76.3% | 83.2% | 68% | 71.8% | N/A | N/A | N/A | N/A | 82.2% | $0.015 | 85 | 200K
22 | Llama 4 Maverick | 2024-12 | Open Source | Meta | reasoning | 75.9% | 84.6% | 69.8% | 73.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 85 | 128K
23 | Llama 3.3 70B | 2024-10 | Open Source, Good Coding | Meta | coding | 75.5% | 86% | 50.5% | N/A | N/A | 88.4% | N/A | N/A | 77% | $0.006 | 90 | 128K
24 | GPT-4.1 mini | 2024-11 | Cost-Effective | OpenAI | reasoning | 75.1% | 87.5% | 65% | 72.7% | N/A | N/A | N/A | N/A | N/A | $0.015 | 95 | 128K
25 | Grok-2 | 2024-08 | Good Coding | xAI | coding | 74.8% | 87.5% | 56% | 66.1% | N/A | 88.4% | N/A | N/A | 76.1% | $0.01 | 75 | 128K
26 | Grok 3 | 2024-12 | Latest | xAI | reasoning | 74.3% | N/A | 75.4% | 73.2% | N/A | N/A | N/A | N/A | N/A | $0.012 | 70 | 128K
27 | Gemini 1.5 Pro | 2024-02 | Largest Context, Complete Benchmarks | Google | multimodal | 73.6% | 81.9% | 46.2% | 62.2% | 92.5% | 71.9% | 84% | 91.7% | 58.5% | $0.0125 | 38 | 2M
28 | Gemini 2.5 Flash Lite | 2024-12 | Latest | Google | multimodal | 71.8% | 84.5% | 66.7% | 72.9% | N/A | N/A | N/A | N/A | 63.1% | $0.01 | 75 | 1M
29 | GPT-4 | 2023-03 | Most Expensive, Classic | OpenAI | reasoning | 71.4% | 86.4% | 35.7% | 56.8% | 95.3% | 67% | 83.1% | 92% | 52.9% | $0.18 | 25 | 8K
30 | Llama 3.2 90B | 2024-09 | Open Source | Meta | reasoning | 69.6% | 86% | 46.7% | 60.3% | N/A | N/A | N/A | 86.9% | 68% | $0.008 | 80 | 128K
31 | Claude 3 Sonnet | 2024-03 | Balanced, Complete Benchmarks | Anthropic | reasoning | 69.1% | 79% | 40.4% | 53.1% | 89% | 73% | 82.9% | 92.3% | 43.1% | $0.012 | 90 | 200K
32 | Gemini 1.5 Flash | 2024-05 | Fast, Complete Benchmarks | Google | multimodal | 68.6% | 78.9% | 39.5% | 56.1% | 81.3% | 67.5% | 89.2% | 68.8% | 67.7% | $0.008 | 95 | 1M
33 | GPT-4o mini | 2024-07 | Cost-Effective | OpenAI | conversation | 67.8% | 82% | 40.2% | 59.4% | N/A | 87.2% | N/A | N/A | 70.2% | $0.007 | 120 | 128K
34 | Llama 4 Scout | 2024-12 | Least Expensive, Open Source | Meta | conversation | 67% | 74.3% | 57.2% | 69.4% | N/A | N/A | N/A | N/A | N/A | $0.0003 | 120 | 128K
35 | Claude 3.5 Haiku | 2024-11 | Fast, Cost-Effective | Anthropic | conversation | 66% | 65% | 41.6% | N/A | N/A | 88.1% | N/A | N/A | 69.2% | $0.005 | 140 | 200K
36 | Claude 3 Haiku | 2024-03 | Fast, Complete Benchmarks | Anthropic | conversation | 65.3% | 75.2% | 33.3% | 50.2% | 85.9% | 75.9% | 73.7% | 88.9% | 38.9% | $0.004 | 160 | 200K
37 | GPT-4.1 nano | 2024-11 | Ultra Fast | OpenAI | conversation | 61.9% | 80.1% | 50.3% | 55.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 150 | 32K
38 | o3-pro | 2025-01 | Upcoming | OpenAI | reasoning | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.08 | 20 | 128K
39 | GPT-4o Realtime | 2024-10 | Realtime, Voice | OpenAI | conversation | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.02 | 200 | 128K
40 | GPT-4o mini Realtime | 2024-10 | Realtime, Voice, Cost-Effective | OpenAI | conversation | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.008 | 250 | 128K

Overall is the aggregate score that the rankings are sorted by. Benchmark columns are percentages, and N/A means that no score is reported. BBH is BIG-Bench Hard. Cost/1K is the price in USD per 1,000 tokens, and TPS is throughput in tokens per second. Context is the maximum context window in tokens (K = thousand, M = million).
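Many rows report only a subset of the eight benchmarks. Any cross-model average therefore has to skip N/A cells rather than count them as zero. For example, o1's three reported scores average about 88.4, but treating its five N/A cells as 0 would drag that down to about 33.1. The following is a minimal sketch of an unweighted mean over reported scores, with cell values copied from the table. The function names are hypothetical. The page does not say how its Overall column is computed, so this is not presented as that method.

```python
from typing import List, Optional, Tuple

# Order of the eight benchmark columns in the rankings table.
BENCHMARKS = ["MMLU", "GPQA", "MMMU", "HellaSwag", "HumanEval", "BBH", "GSM8K", "MATH"]


def parse_score(cell: str) -> Optional[float]:
    """Convert a cell such as '88.7%' to 88.7; return None for 'N/A'."""
    cell = cell.strip()
    if cell == "N/A":
        return None
    return float(cell.rstrip("%"))


def mean_reported(cells: List[str]) -> Tuple[float, int]:
    """Unweighted mean of the reported scores, plus how many were reported."""
    scores = [s for s in (parse_score(c) for c in cells) if s is not None]
    if not scores:
        raise ValueError("no benchmark scores reported")
    return sum(scores) / len(scores), len(scores)


# Benchmark cells (MMLU through MATH) copied from two rows of the table.
ROWS = {
    "o1": ["92.3%", "78%", "N/A", "N/A", "N/A", "N/A", "N/A", "94.8%"],
    "Claude 3.5 Sonnet": ["88.7%", "59.4%", "68.3%", "89%", "92%", "93.1%", "96.4%", "71.1%"],
}

if __name__ == "__main__":
    for name, cells in ROWS.items():
        mean, count = mean_reported(cells)
        reported = [b for b, c in zip(BENCHMARKS, cells) if parse_score(c) is not None]
        print(f"{name}: {mean:.2f} over {count}/{len(BENCHMARKS)} ({', '.join(reported)})")
```

The returned count matters as much as the mean. A 3-benchmark average (o1) and an 8-benchmark average (Claude 3.5 Sonnet, 82.25) are not measured on the same tests, so comparing them directly can mislead.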