Compare the latest Large Language Models across multiple benchmarks and performance metrics
Score is the composite ranking score; benchmark columns are percentages; Cost/1K is USD per 1,000 tokens; TPS is tokens per second; Context is the maximum context window.

| # | Model | Date | Organization | Category | Highlights | Score | MMLU | GPQA | MMMU | HellaSwag | HumanEval | BBH | GSM8K | MATH | Cost/1K | TPS | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro | 2026-02 | Google | multimodal | Top GPQA, Best ARC-AGI-2, Best Value Frontier | 92.2% | 93.1% | 94.3% | 85.2% | N/A | N/A | N/A | N/A | 96.1% | $0.002 | 60 | 2M |
| 2 | GPT-5.4 | 2026-03 | OpenAI | reasoning | Best Computer Use, Top OSWorld | 91.6% | 93% | 92.8% | 83.5% | N/A | N/A | N/A | N/A | 97% | $0.0025 | 70 | 1M |
| 3 | Claude Opus 4.6 | 2026-02 | Anthropic | coding | Best SWE-Bench, Top Coding, 128K Output | 91.3% | 92.4% | 91.3% | 85.1% | N/A | N/A | N/A | N/A | 95.2% | $0.005 | 45 | 1M |
| 4 | Grok 4 | 2026-01 | xAI | reasoning | Best HLE, Multi-Agent, Real-time X Data | 90.2% | 92.7% | 84.6% | N/A | N/A | N/A | N/A | N/A | 93.3% | $0.002 | 75 | 128K |
| 5 | Kimi K2.5 | 2026-01 | Moonshot AI | coding | Open Source, Best HumanEval Open, Top SWE-Bench Open | 89.5% | 92% | 87.6% | N/A | N/A | 99% | N/A | N/A | 98% | $0.0015 | 85 | 262K |
| 6 | Claude Sonnet 4.6 | 2026-02 | Anthropic | reasoning | Best GDPval-AA, Near-Opus Performance | 88.8% | 91% | 88.5% | 82% | N/A | N/A | N/A | N/A | 93.5% | $0.003 | 80 | 1M |
| 7 | o1 | 2024-09 | OpenAI | reasoning | Top MMLU, Premium | 88.4% | 92.3% | 78% | N/A | N/A | N/A | N/A | N/A | 94.8% | $0.06 | 15 | 128K |
| 8 | GLM-5 | 2026-01 | Zhipu AI | coding | Open Source, MIT License, Best Chatbot Arena Open | 88% | 92% | 86% | N/A | N/A | 96.5% | N/A | N/A | 94.5% | $0.001 | 80 | 200K |
| 9 | Qwen 3.5 397B | 2026-02 | Alibaba | reasoning | Open Source, Apache 2.0, Top Open GPQA | 87.7% | 91.8% | 88.4% | N/A | N/A | 94.2% | N/A | N/A | 96.5% | $0.001 | 90 | 256K |
| 10 | MiniMax M2.5 | 2026-01 | MiniMax | coding | Open Source, Best SWE-Bench Open | 87.5% | 90.8% | 85.2% | N/A | N/A | 94.5% | N/A | N/A | 96% | $0.0012 | 75 | 128K |
| 11 | DeepSeek R1 | 2025-01 | DeepSeek | reasoning | Best MATH, Best Reasoning | 86.5% | 90.8% | 71.5% | N/A | N/A | N/A | N/A | N/A | 97.3% | $0.008 | 65 | 64K |
| 12 | o3-mini | 2024-12 | OpenAI | coding | Fastest TPS, Best HumanEval | 86% | 86% | 75% | N/A | N/A | 97% | N/A | N/A | N/A | $0.02 | 189 | 128K |
| 13 | Claude 3.7 Sonnet | 2024-11 | Anthropic | reasoning | Best MATH | 85.5% | 86.1% | 84.8% | 75% | N/A | N/A | N/A | N/A | 96.2% | $0.02 | 65 | 200K |
| 14 | o4-mini | 2024-12 | OpenAI | reasoning | Best MATH | 85.2% | N/A | 81.4% | 81.6% | N/A | N/A | N/A | N/A | 92.7% | $0.02 | 85 | 128K |
| 15 | Gemini 2.5 Pro | 2024-12 | Google | multimodal | Latest, High GPQA | 85.2% | 89.8% | 84% | 81.7% | N/A | N/A | N/A | N/A | N/A | $0.02 | 55 | 1M |
| 16 | o3 | 2024-12 | OpenAI | reasoning | Top MATH | 85% | N/A | 83.3% | 82.9% | N/A | N/A | N/A | N/A | 88.9% | $0.06 | 18 | 128K |
| 17 | o1-preview | 2024-09 | OpenAI | reasoning | Preview | 84.9% | 90.8% | 78.3% | N/A | N/A | N/A | N/A | N/A | 85.5% | $0.045 | 20 | 128K |
| 18 | DeepSeek V3.2 | 2025-12 | DeepSeek | coding | Open Source, MIT License, Best Budget | 84.2% | 90.5% | 72.1% | N/A | N/A | 91.5% | N/A | N/A | 95% | $0.00014 | 95 | 64K |
| 19 | DeepSeek V3 (0324) | 2025-03 | DeepSeek | coding | Open Source, Updated, MIT License | 84.2% | 89% | 68.4% | N/A | N/A | 87.5% | N/A | N/A | 92% | $0.004 | 95 | 64K |
| 20 | DeepSeek R1 Zero | 2025-01 | DeepSeek | reasoning | Open Source, RL-Only Training, MIT License | 83.8% | 88.4% | 67% | N/A | N/A | N/A | N/A | N/A | 95.9% | $0.005 | 60 | 64K |
| 21 | Llama 4 Behemoth | 2025-07 | Meta | reasoning | Open Source, Largest Llama, Top MMLU Open | 83.4% | 91.5% | 74.2% | 78.3% | N/A | N/A | N/A | N/A | 89.5% | $0.01 | 25 | 256K |
| 22 | Claude 3.5 Sonnet | 2024-06 | Anthropic | reasoning | User's Choice, Best GSM8K | 82.3% | 88.7% | 59.4% | 68.3% | 89% | 92% | 93.1% | 96.4% | 71.1% | $0.015 | 170 | 200K |
| 23 | GPT-4o | 2024-05 | OpenAI | multimodal | Least Latency, Multimodal | 82.2% | 88.7% | 53.6% | 69.1% | 94.2% | 90.2% | 91.3% | 89.8% | 76.6% | $0.015 | 85 | 128K |
| 24 | o1-mini | 2024-09 | OpenAI | coding | Good Coding | 81.9% | 85.2% | 60% | N/A | N/A | 92.4% | N/A | N/A | 90% | $0.025 | 45 | 128K |
| 25 | Gemini 2.0 Flash | 2024-12 | Google | multimodal | Fast, Good Performance | 81.8% | 87% | 59% | N/A | N/A | 91% | N/A | N/A | 90% | $0.01 | 110 | 1M |
| 26 | Claude Opus 4 | 2024-12 | Anthropic | reasoning | Latest, Premium | 81% | 88.8% | 83.3% | 76.5% | N/A | N/A | N/A | N/A | 75.5% | $0.045 | 45 | 200K |
| 27 | Gemini 2.0 Flash Thinking | 2025-02 | Google | reasoning | Extended Thinking, Best Budget Reasoning | 80.7% | 85% | 70.3% | 73.8% | N/A | N/A | N/A | N/A | 93.5% | $0.0035 | 50 | 1M |
| 28 | Grok 3 Mini | 2025-02 | xAI | reasoning | Extended Thinking, Cost-Effective | 80.7% | 83% | 69.7% | N/A | N/A | N/A | N/A | N/A | 89.5% | $0.003 | 100 | 128K |
| 29 | DeepSeek V3 | 2024-12 | DeepSeek | coding | Open Source, Good MATH | 80.1% | 88.5% | 59.1% | N/A | N/A | 82.6% | N/A | N/A | 90.2% | $0.004 | 95 | 64K |
| 30 | Claude 3.5 Sonnet v2 | 2025-02 | Anthropic | reasoning | Computer Use, Top SWE-Bench, Upgraded | 79.3% | 88.7% | 65% | 70.7% | N/A | 93.7% | N/A | N/A | 78.3% | $0.015 | 75 | 200K |
| 31 | Llama 3.1 405B | 2024-07 | Meta | reasoning | Open Source, Largest Open | 78.9% | 88.6% | 51.1% | 64.5% | 87% | 89% | 81.3% | 96.8% | 73.8% | $0.015 | 35 | 128K |
| 32 | Claude Sonnet 4 | 2024-12 | Anthropic | reasoning | Latest, High GPQA | 78.8% | 86.5% | 83.8% | 74.4% | N/A | N/A | N/A | N/A | 70.5% | $0.025 | 60 | 200K |
| 33 | Qwen 2.5-Max | 2025-02 | Alibaba | reasoning | MoE Architecture, Top Chinese Open | 78.1% | 87% | 52.5% | N/A | N/A | 88% | N/A | N/A | 85% | $0.0016 | 95 | 128K |
| 34 | GPT-4 Turbo | 2024-04 | OpenAI | reasoning | Highly Preferred, Balanced | 77.6% | 86.5% | 48% | 63.1% | 94.2% | 90.2% | 87.6% | 91% | 72.2% | $0.03 | 45 | 128K |
| 35 | Llama 3.1 Nemotron 70B | 2025-01 | NVIDIA | reasoning | Open Source, RLHF Tuned, Top Arena | 77.5% | 85% | 55.8% | N/A | N/A | 90% | N/A | N/A | 79% | $0.0035 | 70 | 128K |
| 36 | GPT-4o (2025) | 2025-05 | OpenAI | multimodal | Updated, Best Voice, Image Gen | 77.2% | 89.5% | 55% | 70.2% | N/A | 91.5% | N/A | N/A | 80% | $0.0125 | 90 | 128K |
| 37 | Claude 3 Opus | 2024-03 | Anthropic | reasoning | Premium, Complete Benchmarks | 77.2% | 86.8% | 50.4% | 59.4% | 95.4% | 84.9% | 86.8% | 95% | 60.1% | $0.045 | 45 | 200K |
| 38 | GPT-4.1 | 2024-11 | OpenAI | reasoning | Latest GPT | 77.1% | 90.2% | 66.3% | 74.8% | N/A | N/A | N/A | N/A | N/A | $0.04 | 50 | 128K |
| 39 | Gemini 2.0 Pro Experimental | 2024-12 | Google | multimodal | Experimental, Good MATH | 77.1% | 79.1% | 64.7% | 72.7% | N/A | N/A | N/A | N/A | 91.8% | $0.015 | 60 | 1M |
| 40 | Qwen 2.5 72B | 2025-01 | Alibaba | coding | Open Source, Apache 2.0, Best Open Coding | 76.4% | 86.1% | 49% | N/A | N/A | 87.2% | N/A | N/A | 83.1% | $0.0009 | 88 | 128K |
| 41 | Claude 3.7 Sonnet (Normal) | 2024-11 | Anthropic | reasoning | Balanced | 76.3% | 83.2% | 68% | 71.8% | N/A | N/A | N/A | N/A | 82.2% | $0.015 | 85 | 200K |
| 42 | Phi-4 | 2025-01 | Microsoft | reasoning | Open Source, Best-in-Class 14B, STEM Strong | 75.9% | 84.8% | 56.1% | N/A | N/A | 82.6% | N/A | N/A | 80.4% | $0.0007 | 120 | 16K |
| 43 | Llama 4 Maverick | 2024-12 | Meta | reasoning | Open Source | 75.9% | 84.6% | 69.8% | 73.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 85 | 128K |
| 44 | Llama 3.3 70B | 2024-10 | Meta | coding | Open Source, Good Coding | 75.5% | 86% | 50.5% | N/A | N/A | 88.4% | N/A | N/A | 77% | $0.006 | 90 | 128K |
| 45 | Mistral Large 2 | 2025-01 | Mistral AI | reasoning | Open Weights, Multilingual, Function Calling | 75.4% | 84% | 49.6% | N/A | N/A | 92% | N/A | N/A | 76% | $0.006 | 80 | 128K |
| 46 | Mistral Medium 3 | 2025-05 | Mistral AI | reasoning | Enterprise, Multilingual, New | 75.2% | 83.5% | 51% | N/A | N/A | 90% | N/A | N/A | 76.5% | $0.004 | 95 | 128K |
| 47 | GPT-4.1 mini | 2024-11 | OpenAI | reasoning | Cost-Effective | 75.1% | 87.5% | 65% | 72.7% | N/A | N/A | N/A | N/A | N/A | $0.015 | 95 | 128K |
| 48 | Grok-2 | 2024-08 | xAI | coding | Good Coding | 74.8% | 87.5% | 56% | 66.1% | N/A | 88.4% | N/A | N/A | 76.1% | $0.01 | 75 | 128K |
| 49 | Grok 3 | 2024-12 | xAI | reasoning | Latest | 74.3% | N/A | 75.4% | 73.2% | N/A | N/A | N/A | N/A | N/A | $0.012 | 70 | 128K |
| 50 | Gemini 1.5 Pro | 2024-02 | Google | multimodal | Largest Context, Complete Benchmarks | 73.6% | 81.9% | 46.2% | 62.2% | 92.5% | 71.9% | 84% | 91.7% | 58.5% | $0.0125 | 38 | 2M |
| 51 | Gemini 2.5 Flash Lite | 2024-12 | Google | multimodal | Latest | 71.8% | 84.5% | 66.7% | 72.9% | N/A | N/A | N/A | N/A | 63.1% | $0.01 | 75 | 1M |
| 52 | GPT-4 | 2023-03 | OpenAI | reasoning | Most Expensive, Classic | 71.4% | 86.4% | 35.7% | 56.8% | 95.3% | 67% | 83.1% | 92% | 52.9% | $0.18 | 25 | 8K |
| 53 | Claude 3.5 Haiku (2025) | 2025-04 | Anthropic | conversation | Updated, Fastest Claude, Computer Use | 70.3% | 73.5% | 43.2% | N/A | N/A | 90.5% | N/A | N/A | 74% | $0.004 | 150 | 200K |
| 54 | Llama 3.2 90B | 2024-09 | Meta | reasoning | Open Source | 69.6% | 86% | 46.7% | 60.3% | N/A | N/A | N/A | 86.9% | 68% | $0.008 | 80 | 128K |
| 55 | Command R+ (2025) | 2025-03 | Cohere | reasoning | RAG Optimized, Tool Use, Enterprise | 69.4% | 82.3% | 46% | N/A | N/A | 80.5% | N/A | N/A | 68.9% | $0.0025 | 85 | 128K |
| 56 | Claude 3 Sonnet | 2024-03 | Anthropic | reasoning | Balanced, Complete Benchmarks | 69.1% | 79% | 40.4% | 53.1% | 89% | 73% | 82.9% | 92.3% | 43.1% | $0.012 | 90 | 200K |
| 57 | Gemini 1.5 Flash | 2024-05 | Google | multimodal | Fast, Complete Benchmarks | 68.6% | 78.9% | 39.5% | 56.1% | 81.3% | 67.5% | 89.2% | 68.8% | 67.7% | $0.008 | 95 | 1M |
| 58 | Mistral Small 3 | 2025-03 | Mistral AI | conversation | Open Weights, Ultra Efficient, Apache 2.0 | 68.1% | 81.5% | 42% | N/A | N/A | 83% | N/A | N/A | 66% | $0.001 | 130 | 32K |
| 59 | Gemma 3 27B | 2025-03 | Google | multimodal | Open Source, Multimodal, Apache 2.0 | 67.9% | 78.4% | 42% | 68.5% | N/A | 79% | N/A | N/A | 71.5% | $0.0003 | 110 | 128K |
| 60 | GPT-4o mini | 2024-07 | OpenAI | conversation | Cost-Effective | 67.8% | 82% | 40.2% | 59.4% | N/A | 87.2% | N/A | N/A | 70.2% | $0.007 | 120 | 128K |
| 61 | Llama 4 Scout | 2024-12 | Meta | conversation | Least Expensive, Open Source | 67% | 74.3% | 57.2% | 69.4% | N/A | N/A | N/A | N/A | N/A | $0.0003 | 120 | 128K |
| 62 | Amazon Nova Pro | 2025-01 | Amazon | multimodal | AWS Native, Multimodal, Cost-Effective | 66.9% | 80% | 44% | 63.5% | N/A | 79% | N/A | N/A | 68% | $0.0008 | 100 | 300K |
| 63 | Claude 3.5 Haiku | 2024-11 | Anthropic | conversation | Fast, Cost-Effective | 66% | 65% | 41.6% | N/A | N/A | 88.1% | N/A | N/A | 69.2% | $0.005 | 140 | 200K |
| 64 | Claude 3 Haiku | 2024-03 | Anthropic | conversation | Fast, Complete Benchmarks | 65.3% | 75.2% | 33.3% | 50.2% | 85.9% | 75.9% | 73.7% | 88.9% | 38.9% | $0.004 | 160 | 200K |
| 65 | Phi-4 Mini | 2025-04 | Microsoft | conversation | Open Source, Edge Deployable, 3.8B Params | 63.4% | 75.6% | 37.3% | N/A | N/A | 73% | N/A | N/A | 67.5% | $0.0001 | 200 | 16K |
| 66 | GPT-4.1 nano | 2024-11 | OpenAI | conversation | Ultra Fast | 61.9% | 80.1% | 50.3% | 55.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 150 | 32K |
| 67 | Gemma 3 12B | 2025-03 | Google | conversation | Open Source, Lightweight, On-Device | 60.2% | 74.2% | 37% | 59.8% | N/A | 72% | N/A | N/A | 58% | $0.0001 | 160 | 128K |
| 68 | Amazon Nova Lite | 2025-01 | Amazon | conversation | AWS Native, Ultra Fast, Cheapest Multimodal | 56.6% | 73% | 33% | 55% | N/A | 68% | N/A | N/A | 54% | $0.00006 | 180 | 300K |
| 69 | o3-pro | 2025-01 | OpenAI | reasoning | Upcoming | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.08 | 20 | 128K |
| 70 | GPT-4o Realtime | 2024-10 | OpenAI | conversation | Realtime, Voice | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.02 | 200 | 128K |
| 71 | GPT-4o mini Realtime | 2024-10 | OpenAI | conversation | Realtime, Voice, Cost-Effective | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.008 | 250 | 128K |
How to Use the LLM Leaderboard
Choosing the right AI model matters because cost, speed, and accuracy vary significantly across providers. This leaderboard compares 71 large language models across eight industry-standard benchmarks so you can make evidence-based decisions.
Step 1: Browse the main leaderboard to review overall rankings by composite score.
Step 2: Open the Performance Charts tab to visualize strengths across benchmarks like MMLU, GPQA, and HumanEval.
Step 3: Use Model Comparison to evaluate 2-3 models side by side on metrics relevant to your use case.
Step 4: Review Benchmark Details for scoring methodology and context.
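The side-by-side comparison in Step 3 can also be done programmatically. A minimal sketch in Python, using figures copied from the table above; the field names and the MMLU threshold are illustrative choices, not part of the leaderboard:

```python
# Minimal sketch: shortlist leaderboard rows programmatically.
# Figures are copied from the table above; field names are illustrative.
models = [
    {"name": "GPT-5.4", "mmlu": 93.0, "cost_per_1k": 0.0025, "tps": 70},
    {"name": "Claude Opus 4.6", "mmlu": 92.4, "cost_per_1k": 0.005, "tps": 45},
    {"name": "Kimi K2.5", "mmlu": 92.0, "cost_per_1k": 0.0015, "tps": 85},
    {"name": "DeepSeek V3.2", "mmlu": 90.5, "cost_per_1k": 0.00014, "tps": 95},
]

# Keep models that clear an accuracy bar, then rank the survivors by price.
shortlist = sorted(
    (m for m in models if m["mmlu"] >= 92.0),
    key=lambda m: m["cost_per_1k"],
)
for m in shortlist:
    print(f'{m["name"]}: MMLU {m["mmlu"]}%, ${m["cost_per_1k"]}/1K tokens, {m["tps"]} TPS')
```

Swap in whichever benchmark column matters for your workload; the filter-then-sort pattern stays the same.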
Whether you are building customer support automation, coding assistants, or content generation workflows, selecting the right model can save substantial API spend. Use cost and throughput columns to identify your performance-budget sweet spot. A model that scores 5% lower but costs 80% less may be the practical winner.
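That trade-off is easy to put in concrete terms. A rough sketch using the composite scores and Cost/1K prices of two models from the table; the 50M-tokens-per-month workload is an assumed figure for illustration:

```python
# Back-of-the-envelope cost comparison between two table entries.
# The 50M tokens/month workload is an assumption; the scores and prices
# are the composite Score and Cost/1K figures from the leaderboard.
MONTHLY_TOKENS = 50_000_000

def monthly_cost(cost_per_1k_usd: float) -> float:
    """USD per month at the assumed token volume."""
    return MONTHLY_TOKENS / 1_000 * cost_per_1k_usd

premium = {"name": "Claude Opus 4.6", "score": 91.3, "cost_per_1k": 0.005}
budget = {"name": "DeepSeek V3.2", "score": 84.2, "cost_per_1k": 0.00014}

score_gap = (premium["score"] - budget["score"]) / premium["score"] * 100
savings = monthly_cost(premium["cost_per_1k"]) - monthly_cost(budget["cost_per_1k"])
print(f"{score_gap:.1f}% lower score for ${savings:,.0f}/month saved")
```

The point of the exercise: quality deltas and cost deltas live on very different scales, so always compute both before deciding.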
The leaderboard is updated regularly as new models release and benchmarks evolve. Recheck it before major infrastructure decisions so your stack reflects current capabilities and pricing realities.
For practical evaluation, shortlist models from the leaderboard and run your own task-specific test set before rollout. Benchmarks provide directional guidance, but domain prompts, latency expectations, and compliance constraints can change final selection. Combining public rankings with internal testing gives the most reliable model choice.
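A task-specific test set can start very small. A minimal sketch of such a harness, where `call_model` is a hypothetical stand-in for your provider's SDK call and the two test cases are placeholders for real domain prompts:

```python
# Minimal task-specific eval harness. `call_model` is a hypothetical
# stand-in for a real provider SDK call; replace its body with an API
# request. The test cases are placeholders for real domain prompts.
def call_model(model: str, prompt: str) -> str:
    # Stub so the sketch runs end to end: return canned answers.
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return canned.get(prompt, "")

test_set = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
]

def accuracy(model: str) -> float:
    """Fraction of cases whose expected answer appears in the reply."""
    hits = sum(
        case["expected"].lower() in call_model(model, case["prompt"]).lower()
        for case in test_set
    )
    return hits / len(test_set)

for model_id in ["candidate-a", "candidate-b"]:  # placeholder model IDs
    print(f"{model_id}: {accuracy(model_id):.0%} on {len(test_set)} cases")
```

Substring matching is a crude scoring rule; for free-form answers you would swap in an exact-match, regex, or rubric-based grader, but the loop structure is the same.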