AI LLM Leaderboard

Compare the latest Large Language Models across multiple benchmarks and performance metrics

Total Models: 40 (latest models included)
Benchmarks: 8 (comprehensive tests)
Top Score: 97.3% (DeepSeek R1 on MATH)
Fastest Model: 189 TPS (o3-mini)
Model Rankings (40 Models)

All benchmark figures are percentages; Cost/1K is USD per 1,000 tokens; TPS is tokens per second; Context is the maximum context window. Rows are ranked by the overall Score; N/A marks benchmarks not reported for that model.

| # | Model | Release | Organization | Category | Badges | Score | MMLU | GPQA | MMMU | HellaSwag | HumanEval | BBH | GSM8K | MATH | Cost/1K | TPS | Context |
|---|-------|---------|--------------|----------|--------|-------|------|------|------|-----------|-----------|-----|-------|------|---------|-----|---------|
| 1 | o1 | 2024-12 | OpenAI | reasoning | Top MMLU, Premium | 88.4% | 92.3% | 78% | N/A | N/A | N/A | N/A | N/A | 94.8% | $0.06 | 15 | 128K |
| 2 | DeepSeek R1 | 2025-01 | DeepSeek | reasoning | Best MATH, Best Reasoning | 86.5% | 90.8% | 71.5% | N/A | N/A | N/A | N/A | N/A | 97.3% | $0.008 | 65 | 64K |
| 3 | o3-mini | 2025-01 | OpenAI | coding | Fastest TPS, Best HumanEval | 86% | 86% | 75% | N/A | N/A | 97% | N/A | N/A | N/A | $0.02 | 189 | 128K |
| 4 | Claude 3.7 Sonnet | 2025-02 | Anthropic | reasoning | Best MATH | 85.5% | 86.1% | 84.8% | 75% | N/A | N/A | N/A | N/A | 96.2% | $0.02 | 65 | 200K |
| 5 | o4-mini | 2025-04 | OpenAI | reasoning | Best MATH | 85.2% | N/A | 81.4% | 81.6% | N/A | N/A | N/A | N/A | 92.7% | $0.02 | 85 | 128K |
| 6 | Gemini 2.5 Pro | 2025-03 | Google | multimodal | Latest, High GPQA | 85.2% | 89.8% | 84% | 81.7% | N/A | N/A | N/A | N/A | N/A | $0.02 | 55 | 1M |
| 7 | o3 | 2025-04 | OpenAI | reasoning | Top MATH | 85% | N/A | 83.3% | 82.9% | N/A | N/A | N/A | N/A | 88.9% | $0.06 | 18 | 128K |
| 8 | o1-preview | 2024-09 | OpenAI | reasoning | Preview | 84.9% | 90.8% | 78.3% | N/A | N/A | N/A | N/A | N/A | 85.5% | $0.045 | 20 | 128K |
| 9 | Claude 3.5 Sonnet | 2024-06 | Anthropic | reasoning | User's Choice, Best GSM8K | 82.3% | 88.7% | 59.4% | 68.3% | 89% | 92% | 93.1% | 96.4% | 71.1% | $0.015 | 170 | 200K |
| 10 | GPT-4o | 2024-05 | OpenAI | multimodal | Least Latency, Multimodal | 82.2% | 88.7% | 53.6% | 69.1% | 94.2% | 90.2% | 91.3% | 89.8% | 76.6% | $0.015 | 85 | 128K |
| 11 | o1-mini | 2024-09 | OpenAI | coding | Good Coding | 81.9% | 85.2% | 60% | N/A | N/A | 92.4% | N/A | N/A | 90% | $0.025 | 45 | 128K |
| 12 | Gemini 2.0 Flash | 2024-12 | Google | multimodal | Fast, Good Performance | 81.8% | 87% | 59% | N/A | N/A | 91% | N/A | N/A | 90% | $0.01 | 110 | 1M |
| 13 | Claude Opus 4 | 2025-05 | Anthropic | reasoning | Latest, Premium | 81% | 88.8% | 83.3% | 76.5% | N/A | N/A | N/A | N/A | 75.5% | $0.045 | 45 | 200K |
| 14 | DeepSeek V3 | 2024-12 | DeepSeek | coding | Open Source, Good MATH | 80.1% | 88.5% | 59.1% | N/A | N/A | 82.6% | N/A | N/A | 90.2% | $0.004 | 95 | 64K |
| 15 | Llama 3.1 405B | 2024-07 | Meta | reasoning | Open Source, Largest Open | 78.9% | 88.6% | 51.1% | 64.5% | 87% | 89% | 81.3% | 96.8% | 73.8% | $0.015 | 35 | 128K |
| 16 | Claude Sonnet 4 | 2025-05 | Anthropic | reasoning | Latest, High GPQA | 78.8% | 86.5% | 83.8% | 74.4% | N/A | N/A | N/A | N/A | 70.5% | $0.025 | 60 | 200K |
| 17 | GPT-4 Turbo | 2024-04 | OpenAI | reasoning | Highly Preferred, Balanced | 77.6% | 86.5% | 48% | 63.1% | 94.2% | 90.2% | 87.6% | 91% | 72.2% | $0.03 | 45 | 128K |
| 18 | Claude 3 Opus | 2024-03 | Anthropic | reasoning | Premium, Complete Benchmarks | 77.2% | 86.8% | 50.4% | 59.4% | 95.4% | 84.9% | 86.8% | 95% | 60.1% | $0.045 | 45 | 200K |
| 19 | GPT-4.1 | 2025-04 | OpenAI | reasoning | Latest GPT | 77.1% | 90.2% | 66.3% | 74.8% | N/A | N/A | N/A | N/A | N/A | $0.04 | 50 | 128K |
| 20 | Gemini 2.0 Pro Experimental | 2025-02 | Google | multimodal | Experimental, Good MATH | 77.1% | 79.1% | 64.7% | 72.7% | N/A | N/A | N/A | N/A | 91.8% | $0.015 | 60 | 1M |
| 21 | Claude 3.7 Sonnet (Normal) | 2025-02 | Anthropic | reasoning | Balanced | 76.3% | 83.2% | 68% | 71.8% | N/A | N/A | N/A | N/A | 82.2% | $0.015 | 85 | 200K |
| 22 | Llama 4 Maverick | 2025-04 | Meta | reasoning | Open Source | 75.9% | 84.6% | 69.8% | 73.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 85 | 128K |
| 23 | Llama 3.3 70B | 2024-12 | Meta | coding | Open Source, Good Coding | 75.5% | 86% | 50.5% | N/A | N/A | 88.4% | N/A | N/A | 77% | $0.006 | 90 | 128K |
| 24 | GPT-4.1 mini | 2025-04 | OpenAI | reasoning | Cost-Effective | 75.1% | 87.5% | 65% | 72.7% | N/A | N/A | N/A | N/A | N/A | $0.015 | 95 | 128K |
| 25 | Grok-2 | 2024-08 | xAI | coding | Good Coding | 74.8% | 87.5% | 56% | 66.1% | N/A | 88.4% | N/A | N/A | 76.1% | $0.01 | 75 | 128K |
| 26 | Grok 3 | 2025-02 | xAI | reasoning | Latest | 74.3% | N/A | 75.4% | 73.2% | N/A | N/A | N/A | N/A | N/A | $0.012 | 70 | 128K |
| 27 | Gemini 1.5 Pro | 2024-02 | Google | multimodal | Largest Context, Complete Benchmarks | 73.6% | 81.9% | 46.2% | 62.2% | 92.5% | 71.9% | 84% | 91.7% | 58.5% | $0.0125 | 38 | 2M |
| 28 | Gemini 2.5 Flash Lite | 2025-06 | Google | multimodal | Latest | 71.8% | 84.5% | 66.7% | 72.9% | N/A | N/A | N/A | N/A | 63.1% | $0.01 | 75 | 1M |
| 29 | GPT-4 | 2023-03 | OpenAI | reasoning | Most Expensive, Classic | 71.4% | 86.4% | 35.7% | 56.8% | 95.3% | 67% | 83.1% | 92% | 52.9% | $0.18 | 25 | 8K |
| 30 | Llama 3.2 90B | 2024-09 | Meta | reasoning | Open Source | 69.6% | 86% | 46.7% | 60.3% | N/A | N/A | N/A | 86.9% | 68% | $0.008 | 80 | 128K |
| 31 | Claude 3 Sonnet | 2024-03 | Anthropic | reasoning | Balanced, Complete Benchmarks | 69.1% | 79% | 40.4% | 53.1% | 89% | 73% | 82.9% | 92.3% | 43.1% | $0.012 | 90 | 200K |
| 32 | Gemini 1.5 Flash | 2024-05 | Google | multimodal | Fast, Complete Benchmarks | 68.6% | 78.9% | 39.5% | 56.1% | 81.3% | 67.5% | 89.2% | 68.8% | 67.7% | $0.008 | 95 | 1M |
| 33 | GPT-4o mini | 2024-07 | OpenAI | conversation | Cost-Effective | 67.8% | 82% | 40.2% | 59.4% | N/A | 87.2% | N/A | N/A | 70.2% | $0.007 | 120 | 128K |
| 34 | Llama 4 Scout | 2025-04 | Meta | conversation | Least Expensive, Open Source | 67% | 74.3% | 57.2% | 69.4% | N/A | N/A | N/A | N/A | N/A | $0.0003 | 120 | 128K |
| 35 | Claude 3.5 Haiku | 2024-11 | Anthropic | conversation | Fast, Cost-Effective | 66% | 65% | 41.6% | N/A | N/A | 88.1% | N/A | N/A | 69.2% | $0.005 | 140 | 200K |
| 36 | Claude 3 Haiku | 2024-03 | Anthropic | conversation | Fast, Complete Benchmarks | 65.3% | 75.2% | 33.3% | 50.2% | 85.9% | 75.9% | 73.7% | 88.9% | 38.9% | $0.004 | 160 | 200K |
| 37 | GPT-4.1 nano | 2025-04 | OpenAI | conversation | Ultra Fast | 61.9% | 80.1% | 50.3% | 55.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 150 | 32K |
| 38 | o3-pro | 2025-01 | OpenAI | reasoning | Upcoming | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.08 | 20 | 128K |
| 39 | GPT-4o Realtime | 2024-10 | OpenAI | conversation | Realtime, Voice | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.02 | 200 | 128K |
| 40 | GPT-4o mini Realtime | 2024-10 | OpenAI | conversation | Realtime, Voice, Cost-Effective | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.008 | 250 | 128K |
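For readers who want to work with rows like these programmatically, the sketch below encodes a few entries from the table and ranks them by MATH score, skipping models that report N/A. The values come from the table above; the record layout and function names are illustrative assumptions, not an API this site provides.

```python
# Illustrative sketch: a minimal record type for leaderboard rows,
# plus ranking by one benchmark. N/A scores are modeled as None.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Model:
    name: str
    org: str
    math: Optional[float]  # MATH benchmark score in percent; None = N/A
    cost_per_1k: float     # USD per 1,000 tokens (from the Cost/1K column)
    tps: int               # tokens per second (from the TPS column)


# A few rows copied from the table above (hypothetical subset for the demo).
MODELS = [
    Model("DeepSeek R1", "DeepSeek", 97.3, 0.008, 65),
    Model("Claude 3.7 Sonnet", "Anthropic", 96.2, 0.02, 65),
    Model("o1", "OpenAI", 94.8, 0.06, 15),
    Model("o3-mini", "OpenAI", None, 0.02, 189),  # MATH is N/A in the table
]


def rank_by_math(models: List[Model]) -> List[Model]:
    """Return models with a MATH score, best first; N/A rows are dropped."""
    scored = [m for m in models if m.math is not None]
    return sorted(scored, key=lambda m: m.math, reverse=True)


if __name__ == "__main__":
    best = rank_by_math(MODELS)[0]
    print(best.name)  # prints "DeepSeek R1", matching the Top Score card
```

The same pattern extends to any column: swap the sort key for `cost_per_1k` (ascending) to reproduce the "Least Expensive" badge, or `tps` (descending) for "Fastest TPS".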

© 2025 WHISPERCHAT AI