AI LLM Leaderboard

Compare the latest Large Language Models across multiple benchmarks and performance metrics

Total Models: 35+ (latest models included)
Benchmarks: 8 (comprehensive tests)
Top Score: 97.3% (DeepSeek R1, MATH)
Fastest Model: 189 TPS (o3-mini)

Model Rankings - 40 Models
| # | Model | Date | Tags | Organization | Category | Score | MMLU | GPQA | MMMU | HellaSwag | HumanEval | BBHard | GSM8K | MATH | Cost/1K | TPS | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | o1 | 2024-09 | Top MMLU, Premium | OpenAI | reasoning | 88.4% | 92.3% | 78% | N/A | N/A | N/A | N/A | N/A | 94.8% | $0.06 | 15 | 128K |
| 2 | DeepSeek R1 | 2025-01 | Best MATH, Best Reasoning, New | DeepSeek | reasoning | 86.5% | 90.8% | 71.5% | N/A | N/A | N/A | N/A | N/A | 97.3% | $0.008 | 65 | 64K |
| 3 | o3-mini | 2024-12 | Fastest TPS, Best HumanEval, New | OpenAI | coding | 86% | 86% | 75% | N/A | N/A | 97% | N/A | N/A | N/A | $0.02 | 189 | 128K |
| 4 | Claude 3.7 Sonnet | 2024-11 | Best MATH, New | Anthropic | reasoning | 85.5% | 86.1% | 84.8% | 75% | N/A | N/A | N/A | N/A | 96.2% | $0.02 | 65 | 200K |
| 5 | o4-mini | 2024-12 | Best MATH, New | OpenAI | reasoning | 85.2% | N/A | 81.4% | 81.6% | N/A | N/A | N/A | N/A | 92.7% | $0.02 | 85 | 128K |
| 6 | Gemini 2.5 Pro | 2024-12 | Latest, High GPQA, New | Google | multimodal | 85.2% | 89.8% | 84% | 81.7% | N/A | N/A | N/A | N/A | N/A | $0.02 | 55 | 1M |
| 7 | o3 | 2024-12 | Top MATH, New | OpenAI | reasoning | 85% | N/A | 83.3% | 82.9% | N/A | N/A | N/A | N/A | 88.9% | $0.06 | 18 | 128K |
| 8 | o1-preview | 2024-09 | Preview | OpenAI | reasoning | 84.9% | 90.8% | 78.3% | N/A | N/A | N/A | N/A | N/A | 85.5% | $0.045 | 20 | 128K |
| 9 | Claude 3.5 Sonnet | 2024-06 | User's Choice, Best GSM8K | Anthropic | reasoning | 82.3% | 88.7% | 59.4% | 68.3% | 89% | 92% | 93.1% | 96.4% | 71.1% | $0.015 | 170 | 200K |
| 10 | GPT-4o | 2024-05 | Least Latency, Multimodal | OpenAI | multimodal | 82.2% | 88.7% | 53.6% | 69.1% | 94.2% | 90.2% | 91.3% | 89.8% | 76.6% | $0.015 | 85 | 128K |
| 11 | o1-mini | 2024-09 | Good Coding | OpenAI | coding | 81.9% | 85.2% | 60% | N/A | N/A | 92.4% | N/A | N/A | 90% | $0.025 | 45 | 128K |
| 12 | Gemini 2.0 Flash | 2024-12 | Fast, Good Performance, New | Google | multimodal | 81.8% | 87% | 59% | N/A | N/A | 91% | N/A | N/A | 90% | $0.01 | 110 | 1M |
| 13 | Claude Opus 4 | 2024-12 | Latest, Premium, New | Anthropic | reasoning | 81% | 88.8% | 83.3% | 76.5% | N/A | N/A | N/A | N/A | 75.5% | $0.045 | 45 | 200K |
| 14 | DeepSeek V3 | 2024-12 | Open Source, Good MATH, New | DeepSeek | coding | 80.1% | 88.5% | 59.1% | N/A | N/A | 82.6% | N/A | N/A | 90.2% | $0.004 | 95 | 64K |
| 15 | Llama 3.1 405B | 2024-07 | Open Source, Largest Open | Meta | reasoning | 78.9% | 88.6% | 51.1% | 64.5% | 87% | 89% | 81.3% | 96.8% | 73.8% | $0.015 | 35 | 128K |
| 16 | Claude Sonnet 4 | 2024-12 | Latest, High GPQA, New | Anthropic | reasoning | 78.8% | 86.5% | 83.8% | 74.4% | N/A | N/A | N/A | N/A | 70.5% | $0.025 | 60 | 200K |
| 17 | GPT-4 Turbo | 2024-04 | Highly Preferred, Balanced | OpenAI | reasoning | 77.6% | 86.5% | 48% | 63.1% | 94.2% | 90.2% | 87.6% | 91% | 72.2% | $0.03 | 45 | 128K |
| 18 | Claude 3 Opus | 2024-03 | Premium, Complete Benchmarks | Anthropic | reasoning | 77.2% | 86.8% | 50.4% | 59.4% | 95.4% | 84.9% | 86.8% | 95% | 60.1% | $0.045 | 45 | 200K |
| 19 | GPT-4.1 | 2024-11 | Latest GPT, New | OpenAI | reasoning | 77.1% | 90.2% | 66.3% | 74.8% | N/A | N/A | N/A | N/A | N/A | $0.04 | 50 | 128K |
| 20 | Gemini 2.0 Pro Experimental | 2024-12 | Experimental, Good MATH, New | Google | multimodal | 77.1% | 79.1% | 64.7% | 72.7% | N/A | N/A | N/A | N/A | 91.8% | $0.015 | 60 | 1M |
| 21 | Claude 3.7 Sonnet (Normal) | 2024-11 | Balanced, New | Anthropic | reasoning | 76.3% | 83.2% | 68% | 71.8% | N/A | N/A | N/A | N/A | 82.2% | $0.015 | 85 | 200K |
| 22 | Llama 4 Maverick | 2024-12 | Open Source, New | Meta | reasoning | 75.9% | 84.6% | 69.8% | 73.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 85 | 128K |
| 23 | Llama 3.3 70B | 2024-10 | Open Source, Good Coding | Meta | coding | 75.5% | 86% | 50.5% | N/A | N/A | 88.4% | N/A | N/A | 77% | $0.006 | 90 | 128K |
| 24 | GPT-4.1 mini | 2024-11 | Cost-Effective, New | OpenAI | reasoning | 75.1% | 87.5% | 65% | 72.7% | N/A | N/A | N/A | N/A | N/A | $0.015 | 95 | 128K |
| 25 | Grok-2 | 2024-08 | Good Coding | xAI | coding | 74.8% | 87.5% | 56% | 66.1% | N/A | 88.4% | N/A | N/A | 76.1% | $0.01 | 75 | 128K |
| 26 | Grok 3 | 2024-12 | Latest, New | xAI | reasoning | 74.3% | N/A | 75.4% | 73.2% | N/A | N/A | N/A | N/A | N/A | $0.012 | 70 | 128K |
| 27 | Gemini 1.5 Pro | 2024-02 | Largest Context, Complete Benchmarks | Google | multimodal | 73.6% | 81.9% | 46.2% | 62.2% | 92.5% | 71.9% | 84% | 91.7% | 58.5% | $0.0125 | 38 | 2M |
| 28 | Gemini 2.5 Flash Lite | 2024-12 | Latest, New | Google | multimodal | 71.8% | 84.5% | 66.7% | 72.9% | N/A | N/A | N/A | N/A | 63.1% | $0.01 | 75 | 1M |
| 29 | GPT-4 | 2023-03 | Most Expensive, Classic | OpenAI | reasoning | 71.4% | 86.4% | 35.7% | 56.8% | 95.3% | 67% | 83.1% | 92% | 52.9% | $0.18 | 25 | 8K |
| 30 | Llama 3.2 90B | 2024-09 | Open Source | Meta | reasoning | 69.6% | 86% | 46.7% | 60.3% | N/A | N/A | N/A | 86.9% | 68% | $0.008 | 80 | 128K |
| 31 | Claude 3 Sonnet | 2024-03 | Balanced, Complete Benchmarks | Anthropic | reasoning | 69.1% | 79% | 40.4% | 53.1% | 89% | 73% | 82.9% | 92.3% | 43.1% | $0.012 | 90 | 200K |
| 32 | Gemini 1.5 Flash | 2024-05 | Fast, Complete Benchmarks | Google | multimodal | 68.6% | 78.9% | 39.5% | 56.1% | 81.3% | 67.5% | 89.2% | 68.8% | 67.7% | $0.008 | 95 | 1M |
| 33 | GPT-4o mini | 2024-07 | Cost-Effective | OpenAI | conversation | 67.8% | 82% | 40.2% | 59.4% | N/A | 87.2% | N/A | N/A | 70.2% | $0.007 | 120 | 128K |
| 34 | Llama 4 Scout | 2024-12 | Least Expensive, Open Source, New | Meta | conversation | 67% | 74.3% | 57.2% | 69.4% | N/A | N/A | N/A | N/A | N/A | $0.0003 | 120 | 128K |
| 35 | Claude 3.5 Haiku | 2024-11 | Fast, Cost-Effective, New | Anthropic | conversation | 66% | 65% | 41.6% | N/A | N/A | 88.1% | N/A | N/A | 69.2% | $0.005 | 140 | 200K |
| 36 | Claude 3 Haiku | 2024-03 | Fast, Complete Benchmarks | Anthropic | conversation | 65.3% | 75.2% | 33.3% | 50.2% | 85.9% | 75.9% | 73.7% | 88.9% | 38.9% | $0.004 | 160 | 200K |
| 37 | GPT-4.1 nano | 2024-11 | Ultra Fast, New | OpenAI | conversation | 61.9% | 80.1% | 50.3% | 55.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 150 | 32K |
| 38 | o3-pro | 2025-01 | Upcoming, New | OpenAI | reasoning | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.08 | 20 | 128K |
| 39 | GPT-4o Realtime | 2024-10 | Realtime, Voice | OpenAI | conversation | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.02 | 200 | 128K |
| 40 | GPT-4o mini Realtime | 2024-10 | Realtime, Voice, Cost-Effective | OpenAI | conversation | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.008 | 250 | 128K |
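
The page does not say how the ranking score (the first percentage listed for each model) is computed. For models that report only a few benchmarks, it is consistent with a plain mean of the reported values, skipping N/A entries. The sketch below checks that assumption against three rows from the rankings; the function name `mean_of_reported` and the `ROWS` layout are hypothetical, chosen only for illustration. Rows that report all eight benchmarks do not always match (GPT-4o is listed at 82.2%, while the plain mean of its eight scores is about 81.7%), so the site may use inputs that are not shown.

```python
from statistics import mean

# Benchmark scores copied from the rankings; None marks an N/A entry.
# Column order: MMLU, GPQA, MMMU, HellaSwag, HumanEval, BBHard, GSM8K, MATH.
ROWS = {
    "o1": [92.3, 78.0, None, None, None, None, None, 94.8],
    "DeepSeek R1": [90.8, 71.5, None, None, None, None, None, 97.3],
    "o3-mini": [86.0, 75.0, None, None, 97.0, None, None, None],
}


def mean_of_reported(scores):
    """Average only the benchmarks a model reports; return None if there are none.

    Assumption: this is a guess at the site's formula, not a documented one.
    """
    reported = [s for s in scores if s is not None]
    if not reported:
        return None
    return round(mean(reported), 1)


if __name__ == "__main__":
    for model, scores in ROWS.items():
        print(model, mean_of_reported(scores))
```

For these three rows the results agree with the listed scores of 88.4%, 86.5%, and 86%. A model with no reported benchmarks, such as o3-pro, yields `None`, which corresponds to the N/A shown in the rankings.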

© 2025 WHISPERCHAT AI