Илон Маск представил Grok 4: тесты и бенчмарки
Компания xAI, основанная Илоном Маском, официально запустила Grok 4 — новую флагманскую модель искусственного интеллекта, которая, по заявлениям разработчиков, превосходит по уровню знаний обладателей PhD (PhD, или Doctor of Philosophy, - это высшая академическая степень, присуждаемая после успешной защиты диссертации, в большинстве зарубежных стран, включая США, Канаду и многие европейские страны) во всех научных дисциплинах. Вместе с моделью анонсирована подписка SuperGrok Heavy стоимостью $300 в месяц, что делает её самой дорогой в индустрии ИИ.
Grok 4 против GPT-5 и Gemini: кто сильнее?
xAI утверждает, что Grok 4 демонстрирует революционные результаты в тестах:
- 25,4% в экзамене Humanity’s Last Exam (против 21,6% у Gemini 2.5 Pro и 21% у OpenAI o3).
- 44,4% в версии Grok 4 Heavy с использованием инструментов (против 26,9% у Gemini).
- 16,2% в визуально-логическом тесте ARC-AGI-2 (почти вдвое выше, чем у Claude Opus 4).
Маск заявил, что к концу года Grok 4 сможет открывать новые технологии, а через два года — совершать прорывы в физике. Однако эксперты скептически относятся к таким прогнозам, учитывая склонность ИИ к "галлюцинациям" и прошлые задержки в проектах Маска.
Скандалы и проблемы: Grok хвалил Гитлера
Запуск Grok 4 омрачён скандалом вокруг его предшественника — Grok 3 публиковал антисемитские посты в X (бывший Twitter), включая восхваление Гитлера. xAI удалила оскорбительные сообщения и изменила системные инструкции для ИИ, но инцидент подорвал доверие к боту.
При этом Маск избегал темы скандала на презентации, сосредоточившись на технических достижениях.
SuperGrok Heavy: что даёт подписка за $300?
Помимо раннего доступа к Grok 4 Heavy (мультиагентной версии), подписчики получат:
- ИИ для программирования (уже в августе).
- Мультимодального агента (сентябрь).
- Генератор видео (октябрь).
Для бизнеса xAI открывает API, чтобы разработчики могли интегрировать Grok 4 в свои продукты.
Вызовет ли Grok 4 доверие у бизнеса?
Несмотря на впечатляющие бенчмарки, конкуренция с ChatGPT, DeepSeek, Claude и Gemini будет жёсткой. Компаниям предстоит решить, готовы ли они доверить критически важные процессы ИИ, который ещё недавно генерировал опасный контент.
Компания | Контекст | Интеллект | Цена | Токены/c | Задержка | |
---|---|---|---|---|---|---|
Grok 4 | xAI | 256k | 73 | $6.00 | 76.1 | 5.69 |
o3-pro | OpenAI | 200k | 71 | $35.00 | ||
Gemini 2.5 Pro | 1m | 70 | $3.44 | 141.2 | 37.54 | |
o3 | OpenAI | 200k | 70 | $3.50 | 138.2 | 19.38 |
o4-mini (high) | OpenAI | 200k | 70 | $1.93 | 105.1 | 45.97 |
Gemini 2.5 Pro (Mar '25) | 1m | 69 | $3.44 | 141.7 | 34.38 | |
DeepSeek R1 0528 (May '25) | DeepSeek | 128k | 68 | $0.96 | 21.6 | 3.75 |
Gemini 2.5 Pro (May' 25) | 1m | 68 | $3.44 | 142.2 | 34.43 | |
Grok 3 mini Reasoning (high) | xAI | 1m | 67 | $0.35 | 209.5 | 0.51 |
o3-mini (high) | OpenAI | 200k | 66 | $1.93 | 139.3 | 45.38 |
Gemini 2.5 Flash (Reasoning) | 1m | 65 | $0.99 | 356.3 | 7.49 | |
Claude 4 Opus Thinking | Anthropic | 200k | 64 | $30.00 | 68.0 | 2.45 |
MiniMax M1 80k | MiniMax | 1m | 63 | $0.82 | 19.8 | 1.26 |
o3-mini | OpenAI | 200k | 63 | $1.93 | 143.7 | 15.45 |
Qwen3 235B (Reasoning) | Alibaba | 128k | 62 | $2.63 | 42.8 | 1.24 |
o1 | OpenAI | 200k | 62 | $26.25 | 165.7 | 16.69 |
MiniMax M1 40k | MiniMax | 1m | 61 | $0.82 | 28.0 | 1.27 |
Llama Nemotron Ultra Reasoning | NVIDIA | 128k | 61 | $0.90 | 41.6 | 0.66 |
Claude 4 Sonnet Thinking | Anthropic | 200k | 61 | $6.00 | 87.2 | 1.62 |
Gemini 2.5 Flash (April '25) (Reasoning) | 1m | 60 | $0.99 | 399.8 | 6.80 | |
DeepSeek R1 (Jan '25) | DeepSeek | 128k | 60 | $2.36 | ||
o1-preview | OpenAI | 128k | 60 | $26.25 | 164.9 | 19.07 |
Qwen3 32B (Reasoning) | Alibaba | 128k | 59 | $2.63 | 61.1 | 1.18 |
QwQ-32B | Alibaba | 131k | 58 | $0.32 | 86.1 | 0.50 |
Claude 4 Opus | Anthropic | 200k | 58 | $30.00 | 64.9 | 2.51 |
Claude 3.7 Sonnet Thinking | Anthropic | 200k | 57 | $6.00 | 89.3 | 1.62 |
o1-pro | OpenAI | 200k | 56 | $262.50 | ||
Grok 3 Reasoning Beta | xAI | 1m | 56 | $0.00 | ||
Magistral Medium | Mistral | 128k | 56 | $2.75 | 140.6 | 0.42 |
Qwen3 14B (Reasoning) | Alibaba | 128k | 56 | $1.31 | 65.5 | 1.11 |
Qwen3 30B A3B (Reasoning) | Alibaba | 128k | 56 | $0.75 | 86.5 | 1.10 |
Gemini 2.5 Flash-Lite (Reasoning) | 1m | 55 | $0.17 | 637.6 | 7.48 | |
Magistral Small | Mistral | 128k | 55 | $0.75 | 210.6 | 0.32 |
o1-mini | OpenAI | 128k | 54 | $1.93 | 238.6 | 8.64 |
Gemini 2.5 Flash | 1m | 53 | $0.26 | 287.5 | 0.39 | |
DeepSeek V3 0324 (Mar '25) | DeepSeek | 128k | 53 | $0.48 | 24.0 | 3.42 |
Claude 4 Sonnet | Anthropic | 200k | 53 | $6.00 | 83.6 | 1.64 |
GPT-4.5 (Preview) | OpenAI | 128k | 53 | $93.75 | 71.4 | 0.99 |
GPT-4.1 mini | OpenAI | 1m | 53 | $0.70 | 64.6 | 0.40 |
GPT-4.1 | OpenAI | 1m | 53 | $3.50 | 110.0 | 0.46 |
Gemini 2.0 Flash Thinking exp. (Jan '25) | 1m | 52 | $0.00 | |||
DeepSeek R1 0528 Qwen3 8B | DeepSeek | 128k | 52 | $0.07 | 91.7 | 0.70 |
DeepSeek R1 Distill Qwen 32B | DeepSeek | 128k | 52 | $0.30 | 40.6 | 1.25 |
Qwen3 8B (Reasoning) | Alibaba | 128k | 51 | $0.66 | 98.8 | 0.96 |
Llama 3.3 Nemotron Super 49B Reasoning | NVIDIA | 128k | 51 | $0.00 | ||
Solar Pro 2 (Reasoning) | Upstage | 64k | 51 | $0.00 | 107.4 | 1.70 |
Grok 3 | xAI | 1m | 51 | $6.00 | 86.1 | 0.55 |
Llama 4 Maverick | Meta | 1m | 51 | $0.39 | 163.4 | 0.35 |
GPT-4o (March 2025) | OpenAI | 128k | 50 | $7.50 | 193.2 | 0.43 |
Gemini 2.0 Pro Experimental | 2m | 49 | $0.00 | 34.1 | 18.77 | |
DeepSeek R1 Distill Qwen 14B | DeepSeek | 128k | 49 | $0.20 | 82.7 | 0.69 |
Mistral Medium 3 | Mistral | 128k | 49 | $0.80 | 77.2 | 0.43 |
Sonar Reasoning | Perplexity | 127k | 49 | $2.00 | 79.7 | 1.48 |
Gemini 2.5 Flash (April '25) | 1m | 49 | $0.26 | 322.0 | 0.41 | |
DeepSeek R1 Distill Llama 70B | DeepSeek | 128k | 48 | $0.80 | 70.6 | 0.51 |
Claude 3.7 Sonnet | Anthropic | 200k | 48 | $6.00 | 81.4 | 1.35 |
Gemini 2.0 Flash | 1m | 48 | $0.17 | 224.2 | 0.39 | |
Qwen3 4B (Reasoning) | Alibaba | 32k | 47 | $0.40 | 104.9 | 1.08 |
Reka Flash 3 | Reka AI | 128k | 47 | $0.35 | 55.9 | 1.22 |
Qwen3 235B | Alibaba | 128k | 47 | $1.23 | 44.2 | 1.16 |
Gemini 2.0 Flash (exp) | 1m | 46 | $0.00 | 208.8 | 0.29 | |
Gemini 2.5 Flash-Lite | 1m | 46 | $0.17 | 417.5 | 0.25 | |
DeepSeek V3 (Dec '24) | DeepSeek | 128k | 46 | $0.48 | ||
Qwen2.5 Max | Alibaba | 32k | 45 | $2.80 | 40.2 | 1.35 |
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | 128k | 45 | $0.00 | ||
Solar Pro 2 | Upstage | 64k | 45 | $0.00 | 113.9 | 1.86 |
Gemini 1.5 Pro (Sep) | 2m | 45 | $2.19 | 91.9 | 0.90 | |
Claude 3.5 Sonnet (Oct) | Anthropic | 200k | 44 | $6.00 | 79.8 | 1.68 |
Qwen3 32B | Alibaba | 128k | 44 | $1.23 | 61.9 | 1.20 |
Sonar | Perplexity | 127k | 43 | $1.00 | 93.1 | 1.82 |
Llama 4 Scout | Meta | 10m | 43 | $0.26 | 122.8 | 0.37 |
Sonar Pro | Perplexity | 200k | 43 | $6.00 | 160.7 | 1.70 |
QwQ 32B-Preview | Alibaba | 33k | 43 | $0.67 | 51.4 | 0.46 |
Nova Premier | Amazon | 1m | 43 | $5.00 | 83.7 | 0.97 |
Qwen3 30B A3B | Alibaba | 128k | 43 | $0.35 | 88.5 | 1.02 |
Mistral Small 3.2 | Mistral | 128k | 42 | $0.15 | 202.2 | 0.31 |
GPT-4o (Nov '24) | OpenAI | 128k | 41 | $4.38 | 139.3 | 0.42 |
Gemini 2.0 Flash-Lite (Feb '25) | 1m | 41 | $0.13 | 205.2 | 0.32 | |
Llama 3.3 70B | Meta | 128k | 41 | $0.60 | 85.7 | 0.42 |
GPT-4.1 nano | OpenAI | 1m | 41 | $0.17 | 142.9 | 0.34 |
Qwen3 14B | Alibaba | 128k | 41 | $0.61 | 66.5 | 1.10 |
GPT-4o (May '24) | OpenAI | 128k | 41 | $7.50 | 74.6 | 0.70 |
Gemini 2.0 Flash-Lite (Preview) | 1m | 41 | $0.13 | 205.7 | 0.33 | |
GPT-4o (Aug '24) | OpenAI | 128k | 41 | $4.38 | 69.6 | 0.59 |
Llama 3.1 405B | Meta | 128k | 40 | $3.25 | 32.8 | 0.63 |
Qwen2.5 72B | Alibaba | 131k | 40 | $0.00 | 58.1 | 1.17 |
MiniMax-Text-01 | MiniMax | 4m | 40 | $0.42 | ||
Phi-4 | Microsoft Azure | 16k | 40 | $0.22 | 23.9 | 0.46 |
Claude 3.5 Sonnet (June) | Anthropic | 200k | 40 | $6.00 | 80.2 | 0.93 |
Command A | Cohere | 256k | 40 | $4.38 | 172.1 | 0.21 |
Tulu3 405B | Allen Institute for AI | 128k | 40 | $0.00 | ||
GPT-4o (ChatGPT) | OpenAI | 128k | 40 | $7.50 | ||
Llama 3.3 Nemotron Super 49B v1 | NVIDIA | 128k | 39 | $0.00 | ||
Grok 2 | xAI | 131k | 39 | $0.00 | ||
Gemini 1.5 Flash (Sep) | 1m | 39 | $0.13 | 176.3 | 0.25 | |
GPT-4 Turbo | OpenAI | 128k | 39 | $15.00 | 39.9 | 0.74 |
Mistral Large 2 (Nov '24) | Mistral | 128k | 38 | $3.00 | 29.4 | 0.57 |
Qwen3 1.7B (Reasoning) | Alibaba | 32k | 38 | $0.40 | 138.1 | 0.94 |
Gemma 3 27B | 128k | 38 | $0.00 | 21.7 | 0.74 | |
Grok Beta | xAI | 128k | 38 | $7.50 | ||
Pixtral Large | Mistral | 128k | 37 | $3.00 | 90.8 | 0.40 |
Qwen2.5 Instruct 32B | Alibaba | 128k | 37 | $0.15 | ||
Llama 3.1 Nemotron 70B | NVIDIA | 128k | 37 | $0.17 | 40.6 | 0.26 |
Nova Pro | Amazon | 300k | 37 | $1.40 | ||
Qwen3 8B | Alibaba | 128k | 37 | $0.31 | 100.0 | 1.13 |
Mistral Large 2 (Jul '24) | Mistral | 128k | 37 | $3.00 | 103.2 | 0.45 |
Qwen2.5 Coder 32B | Alibaba | 131k | 36 | $0.15 | 52.0 | 0.42 |
GPT-4 | OpenAI | 8k | 36 | $37.50 | 25.5 | 0.76 |
GPT-4o mini | OpenAI | 128k | 36 | $0.26 | 65.6 | 0.46 |
Llama 3.1 70B | Meta | 128k | 35 | $0.76 | 53.6 | 0.41 |
Mistral Small 3.1 | Mistral | 128k | 35 | $0.15 | 174.8 | 0.28 |
Mistral Small 3 | Mistral | 32k | 35 | $0.15 | 137.6 | 0.32 |
DeepSeek-V2.5 (Dec '24) | DeepSeek | 128k | 35 | $0.17 | ||
Qwen3 4B | Alibaba | 32k | 35 | $0.19 | 106.4 | 1.08 |
Claude 3 Opus | Anthropic | 200k | 35 | $30.00 | 28.7 | 1.32 |
Claude 3.5 Haiku | Anthropic | 200k | 35 | $1.60 | 66.8 | 1.43 |
Gemini 2.0 Flash Thinking exp. (Dec '24) | 2m | 35 | $0.00 | |||
DeepSeek-V2.5 | DeepSeek | 128k | 35 | $0.17 | ||
Devstral | Mistral | 256k | 34 | $0.15 | 138.2 | 0.36 |
Mistral Saba | Mistral | 32k | 34 | $0.30 | 96.2 | 0.31 |
DeepSeek R1 Distill Llama 8B | DeepSeek | 128k | 34 | $0.04 | 56.3 | 0.82 |
Reka Core | Reka AI | 128k | 34 | $2.00 | 50.2 | 1.31 |
Gemma 3 12B | 128k | 34 | $0.06 | |||
Gemini 1.5 Pro (May) | 2m | 34 | $2.19 | |||
R1 1776 | Perplexity | 128k | 34 | $3.50 | ||
Qwen2.5 Turbo | Alibaba | 1m | 34 | $0.09 | 89.7 | 1.03 |
Reka Flash | Reka AI | 128k | 34 | $0.35 | 85.5 | 1.20 |
Llama 3.2 90B (Vision) | Meta | 128k | 33 | $0.54 | 33.9 | 0.37 |
Solar Mini | Upstage | 4k | 33 | $0.15 | 93.9 | 1.10 |
Reka Flash (Feb '24) | Reka AI | 128k | 33 | $0.35 | 85.4 | 1.21 |
Reka Edge | Reka AI | 128k | 33 | $0.10 | 85.1 | 1.16 |
Qwen2 72B | Alibaba | 131k | 33 | $0.00 | 31.0 | 1.33 |
Nova Lite | Amazon | 300k | 33 | $0.10 | 208.9 | 0.47 |
Gemma 2 27B | 8k | 32 | $0.80 | |||
Gemini 1.5 Flash-8B | 1m | 31 | $0.07 | 240.6 | 0.25 | |
DeepHermes 3 - Mistral 24B | Nous Research | 32k | 30 | $0.00 | ||
Jamba 1.5 Large | AI21 Labs | 256k | 29 | $3.50 | ||
Hermes 3 - Llama-3.1 70B | Nous Research | 128k | 29 | $0.00 | ||
DeepSeek-Coder-V2 | DeepSeek | 128k | 29 | $0.17 | ||
Jamba 1.6 Large | AI21 Labs | 256k | 29 | $3.50 | 59.3 | 0.71 |
Gemini 1.5 Flash (May) | 1m | 28 | $0.13 | |||
Nova Micro | Amazon | 130k | 28 | $0.06 | 377.1 | 0.43 |
Gemma 3n E4B | 32k | 28 | $0.03 | 52.6 | 0.29 | |
Yi-Large | 01.AI | 32k | 28 | $0.00 | ||
Claude 3 Sonnet | Anthropic | 200k | 28 | $6.00 | 60.6 | 1.26 |
Codestral (Jan '25) | Mistral | 256k | 28 | $0.45 | 167.2 | 0.30 |
Llama 3 70B | Meta | 8k | 27 | $0.84 | 47.1 | 0.41 |
Mistral Small (Sep '24) | Mistral | 33k | 27 | $0.30 | 120.6 | 0.33 |
Gemini 1.0 Ultra | 33k | 27 | $0.00 | |||
Gemma 3n E4B (May '25) | 32k | 27 | $0.00 | |||
Phi-4 Multimodal | Microsoft Azure | 128k | 27 | $0.00 | 21.9 | 0.37 |
Qwen2.5 Coder 7B | Alibaba | 131k | 27 | $0.00 | ||
Mistral Large (Feb '24) | Mistral | 33k | 26 | $6.00 | ||
Jamba Instruct | AI21 Labs | 256k | 26 | $0.00 | ||
Mixtral 8x22B | Mistral | 65k | 26 | $3.00 | 59.2 | 0.38 |
Phi-4 Mini | Microsoft Azure | 128k | 26 | $0.00 | 45.7 | 0.36 |
Gemma 3 4B | 128k | 25 | $0.03 | |||
Llama 3.2 11B (Vision) | Meta | 128k | 25 | $0.10 | 59.5 | 0.45 |
Qwen3 1.7B | Alibaba | 32k | 25 | $0.19 | 141.0 | 1.17 |
Qwen1.5 Chat 110B | Alibaba | 32k | 25 | $0.00 | 23.7 | 1.59 |
Phi-3 Medium 14B | Microsoft Azure | 128k | 25 | $0.30 | 53.0 | 0.43 |
Claude 2.1 | Anthropic | 200k | 24 | $12.00 | 13.9 | 1.09 |
Claude 3 Haiku | Anthropic | 200k | 24 | $0.50 | 145.7 | 1.10 |
Llama 3.1 8B | Meta | 128k | 24 | $0.10 | 205.1 | 0.28 |
Pixtral 12B | Mistral | 128k | 23 | $0.15 | 104.5 | 0.31 |
Qwen3 0.6B (Reasoning) | Alibaba | 32k | 23 | $0.40 | 228.8 | 0.91 |
Claude 2.0 | Anthropic | 100k | 23 | $12.00 | 31.4 | 1.26 |
DeepSeek-V2 | DeepSeek | 128k | 23 | $0.17 | ||
Mistral Small (Feb '24) | Mistral | 33k | 23 | $1.50 | 194.6 | 0.31 |
Mistral Medium | Mistral | 33k | 23 | $4.09 | 76.2 | 0.41 |
GPT-3.5 Turbo | OpenAI | 4k | 23 | $0.75 | 103.9 | 0.39 |
Ministral 8B | Mistral | 128k | 22 | $0.10 | 189.7 | 0.31 |
Gemma 2 9B | 8k | 22 | $0.20 | |||
Phi-3 Mini | Microsoft Azure | 4k | 22 | $0.00 | ||
Arctic | Snowflake | 4k | 22 | $0.00 | ||
Qwen Chat 72B | Alibaba | 34k | 22 | $1.00 | ||
LFM 40B | Liquid AI | 32k | 22 | $0.15 | 160.2 | 0.16 |
Command-R+ | Cohere | 128k | 21 | $4.38 | 49.1 | 0.27 |
Llama 3 8B | Meta | 8k | 21 | $0.07 | 92.1 | 0.38 |
PALM-2 | 8k | 21 | $0.00 | |||
Gemini 1.0 Pro | 33k | 21 | $0.75 | |||
DeepSeek Coder V2 Lite | DeepSeek | 128k | 20 | $0.00 | ||
Codestral (May '24) | Mistral | 33k | 20 | $0.30 | ||
Aya Expanse 32B | Cohere | 128k | 20 | $0.75 | 120.6 | 0.17 |
Llama 2 Chat 70B | Meta | 4k | 20 | $0.00 | ||
DeepSeek LLM 67B (V1) | DeepSeek | 4k | 20 | $0.00 | ||
Llama 2 Chat 13B | Meta | 4k | 20 | $0.00 | ||
Command-R+ (Apr '24) | Cohere | 128k | 20 | $6.00 | 58.4 | 0.22 |
OpenChat 3.5 | OpenChat | 8k | 20 | $0.00 | ||
DBRX | Databricks | 33k | 20 | $0.00 | ||
Ministral 3B | Mistral | 128k | 20 | $0.04 | 263.5 | 0.29 |
Mistral NeMo | Mistral | 128k | 20 | $0.15 | 174.9 | 0.30 |
Llama 3.2 3B | Meta | 128k | 20 | $0.04 | 104.6 | 0.51 |
DeepSeek R1 Distill Qwen 1.5B | DeepSeek | 128k | 19 | $0.00 | ||
Jamba 1.5 Mini | AI21 Labs | 256k | 18 | $0.25 | ||
Jamba 1.6 Mini | AI21 Labs | 256k | 18 | $0.25 | 168.3 | 0.54 |
Mixtral 8x7B | Mistral | 33k | 17 | $0.70 | 61.7 | 0.37 |
Qwen3 0.6B | Alibaba | 32k | 17 | $0.19 | 232.2 | 0.90 |
DeepHermes 3 - Llama-3.1 8B | Nous Research | 128k | 16 | $0.00 | ||
Aya Expanse 8B | Cohere | 8k | 16 | $0.75 | 167.0 | 0.27 |
Command-R | Cohere | 128k | 15 | $0.26 | 75.7 | 0.19 |
Command-R (Mar '24) | Cohere | 128k | 15 | $0.75 | 175.9 | 0.15 |
Qwen Chat 14B | Alibaba | 8k | 14 | $0.00 | ||
Claude Instant | Anthropic | 100k | 14 | $1.20 | 66.1 | 0.55 |
Codestral-Mamba | Mistral | 256k | 14 | $0.00 | ||
Gemma 3 1B | 32k | 13 | $0.00 | |||
Llama 65B | Meta | 2k | 11 | $0.00 | ||
Mistral 7B | Mistral | 8k | 10 | $0.25 | 121.6 | 0.31 |
Llama 3.2 1B | Meta | 128k | 10 | $0.05 | 167.7 | 0.39 |
Llama 2 Chat 7B | Meta | 4k | 8 | $0.10 | 132.9 | 0.53 |
GPT-4o Realtime (Dec '24) | OpenAI | 128k | $0.00 | |||
GPT-4o mini Realtime (Dec '24) | OpenAI | 128k | $0.00 | |||
Sonar Reasoning Pro | Perplexity | 127k | $0.00 | |||
Grok 3 mini Reasoning (low) | xAI | 1m | $0.35 | 197.6 | 0.40 | |
GPT-3.5 Turbo (0613) | OpenAI | 4k | $0.00 |
По данным: artificialanalysis.ai
0 комментариев