What is the LiveBench benchmark?

LiveBench is a reasoning/coding benchmark used to evaluate and rank large language models. This page lists 71 models by their LiveBench score.

Which model is best on LiveBench?

GPT-5.5 (OpenAI) currently leads the LiveBench leaderboard with a score of 81.3.

What is the best value model on LiveBench?

Qwen: Qwen3 235B A22B Thinking 2507 offers the best score-per-dollar on LiveBench — a score of 52.9 at $0.10 per 1M output tokens.

LiveBench Leaderboard

reasoning/coding · 71 models · Updated June 2026

On the LiveBench benchmark, GPT-5.5 ranks #1 with a score of 81.3, while Qwen: Qwen3 235B A22B Thinking 2507 offers the best score-per-dollar at $0.10/1M output tokens. The full ranking, with cost per million tokens, is below.

💰 Best value

Qwen: Qwen3 235B A22B Thinking 2507 — score 52.9 at $0.10/1M output tokens
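
The best-value pick can be reproduced with a little arithmetic. The sketch below assumes that "score-per-dollar" means the LiveBench score divided by the output price per 1M tokens; the page does not state the exact formula. It also assumes that models with no listed price, or with a listed price of $0.00, are left out of the comparison. The function names are hypothetical, and the rows are a small sample copied from the ranking on this page.

```python
from typing import List, Optional, Tuple

# (model, LiveBench score, output price in USD per 1M tokens).
# None marks a model whose price is not listed on this page.
Row = Tuple[str, float, Optional[float]]

ROWS: List[Row] = [
    ("GPT-5.5", 81.3, None),
    ("Z.ai: GLM 5.2", 76.2, 3.00),
    ("Qwen3.6 Plus", 70.8, 0.00),
    ("DeepSeek: DeepSeek V3.2", 63.1, 0.34),
    ("Gemma 4 31B IT", 62.4, 0.35),
    ("Qwen: Qwen3 235B A22B Thinking 2507", 52.9, 0.10),
    ("Qwen: Qwen3 32B", 42.7, 0.28),
]


def score_per_dollar(score: float, price: Optional[float]) -> Optional[float]:
    """Return score / price, or None when the price is missing or not positive.

    Skipping $0.00 entries is an assumption made here to avoid dividing by zero.
    """
    if price is None or price <= 0:
        return None
    return score / price


def best_value(rows: List[Row]) -> Tuple[str, float]:
    """Return (model, score-per-dollar) for the highest ratio among priced rows."""
    ratios = [(name, score_per_dollar(score, price)) for name, score, price in rows]
    priced = [(name, ratio) for name, ratio in ratios if ratio is not None]
    return max(priced, key=lambda item: item[1])


if __name__ == "__main__":
    name, ratio = best_value(ROWS)
    print(f"{name}: {ratio:.0f} score points per dollar")
```

Under these assumptions the sample gives roughly 529 points per dollar for Qwen3 235B A22B Thinking 2507 (52.9 / 0.10), well ahead of DeepSeek V3.2 at about 186 (63.1 / 0.34), which matches the best-value callout above.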

#	Model	Provider	Score	Output price / 1M tokens	Context
1	GPT-5.5	OpenAI	81.3	—	—
2	GPT-5.4	OpenAI	80.9	—	—
3	Gemini 3.1 Pro	Google DeepMind	80.7	—	—
4	Claude Opus 4.8 Thinking	NanoGPT	79.5	$25.01	1M
5	Claude Fable 5	Anthropic	78.6	—	—
6	Claude Opus 4.7	Anthropic	77.1	—	—
7	Anthropic: Claude Opus 4.6 (Fast)	Anthropic	76.8	$150.00	1M
8	Z.ai: GLM 5.2	Z.ai	76.2	$3.00	1M
9	Claude Opus 4.5	Anthropic	76.0	—	—
10	Gemini 3.5 Flash	NanoGPT	75.8	$9.00	1M
11	Claude Sonnet 4.6 Thinking	NanoGPT	75.7	$14.99	1M
12	GPT-5.2	OpenAI	75.4	—	—
13	Qwen3.7 Max Thinking	NanoGPT	75.2	$7.50	1M
14	DeepSeek-V4-Pro	DeepSeek	74.4	—	—
15	GPT-5.1-Codex-Max	OpenAI	74.4	—	—
16	GPT-5.2 Codex	OpenAI	74.3	—	—
17	Gemini 3 Pro	Google DeepMind	73.5	—	—
18	GPT-5.3 Codex	OpenAI	73.2	—	—
19	Gemini 3 Flash Thinking	NanoGPT	73.0	$3.00	1M
20	GPT-5.1	Poe	72.6	$9.00	400K
21	Kimi K2.6	Moonshot	72.4	—	—
22	MoonshotAI: Kimi K2.7 Code	MoonshotAI	71.9	$4.00	262K
23	GPT 5.4 Nano	NanoGPT	71.3	$1.25	400K
24	GPT-5 Pro	OpenAI	71.3	—	—
25	Qwen3.6 Plus	Alibaba Token Plan	70.8	$0.00	1M
26	GLM-5.1	Z.ai (Zhipu AI)	70.6	—	—
27	MiniMax M3 Thinking	NanoGPT	70.0	$1.20	512K
28	Grok Build 0.1	NanoGPT	69.6	$2.00	256K
29	GPT-5.1-Codex	OpenAI	69.3	—	—
30	Kimi K2.5	Moonshot	69.2	—	—
31	Grok 4.20 (Reasoning)	xAI	69.0	$2.50	1M
32	GLM-5	Z.ai (Zhipu AI)	68.7	—	—
33	Claude Sonnet 4.5	Anthropic	67.9	—	—
34	GPT 5.4 Mini	NanoGPT	67.7	$4.50	400K
35	DeepSeek-V4-Flash	DeepSeek	67.7	—	—
36	Grok 4.3	NanoGPT	67.4	$2.50	1M
37	GPT-5 mini	OpenAI	66.6	—	—
38	Qwen: Qwen3.6 27B	Qwen	65.6	$2.39	262K
39	MiniMax M2.7	NanoGPT	65.0	$1.20	205K
40	DeepSeek: DeepSeek V3.2	DeepSeek	63.1	$0.34	131K
41	Gemma 4 31B IT	Lilac	62.4	$0.35	262K
42	Kimi K2 Thinking	Moonshot	62.3	—	—
43	Gemini 3.1 Flash Lite	NanoGPT	62.1	$1.50	1M
44	grok-4-0709	Jiekou.AI	61.8	$13.50	256K
45	Claude 4.1 Opus Thinking (32K)	NanoGPT	61.4	$75.00	200K
46	Claude Haiku 4.5	Anthropic	61.0	—	—
47	GPT 5.1 Codex Mini	NanoGPT	60.8	$2.00	400K
48	Claude 4 Sonnet	NanoGPT	60.6	$14.99	200K
49	Qwen: Qwen3.6 Flash	Qwen	60.5	$1.13	1M
50	MiniMax M2.5	NanoGPT	60.3	$1.20	205K
51	Grok 4.1 Fast	xAI	60.1	—	—
52	GPT-5.3-Instant	Poe	59.8	$13.00	128K
53	MiMo-V2-Pro	Xiaomi Corp	58.4	—	—
54	Google: Gemini 2.5 Pro Preview 06-05	Google	57.5	$10.00	1M
55	GLM-4.7	Z.ai (Zhipu AI)	57.3	—	—
56	GLM-4.6	Z.ai (Zhipu AI), Tsinghua University	54.7	—	—
57	Qwen: Qwen3 235B A22B Thinking 2507	Qwen	52.9	$0.10	262K
58	Gemini 2.5 Flash Preview	NanoGPT	52.3	$0.60	1M
59	Qwen: Qwen3 Next 80B A3B Thinking	Qwen	51.0	$0.78	262K
60	NVIDIA: Nemotron 3 Ultra	NVIDIA	50.7	$2.20	1M
61	GPT-5 nano	OpenAI	48.8	—	—
62	GLM 5V Turbo Thinking	NanoGPT	48.8	$4.00	203K
63	gpt-oss-120b	OpenAI	46.4	—	—
64	Qwen: Qwen3 32B	Qwen	42.7	$0.28	131K
65	Gemini 2.5 Flash Lite	NanoGPT	41.5	$0.40	1M
66	Z.ai: GLM 4.6V	Z.ai	38.9	$0.90	131K
67	Qwen: Qwen3 30B A3B	Qwen	38.8	$0.50	131K
68	Mistral: Devstral 2 2512	Mistral	38.8	$2.00	262K
69	Grok 4.20 (Non-Reasoning)	xAI	37.9	$2.50	1M
70	Nemotron 3 Super 120B A12B	Synthetic	32.0	$1.00	262K
71	X-Ai/Grok 4.1 Fast Non Reasoning	Qiniu	31.6	—	2M

Pricing is indicative — confirm with the provider before production use. Updated June 2026.