Elon Musk’s xAI just dropped Grok 4 and Grok 4 Heavy, its next-gen AI models built for reasoning, and the company says they already post state-of-the-art (SOTA) results on elite benchmarks like ARC-AGI and Humanity’s Last Exam.
Grok 4 is a single-agent model equipped with voice, vision, and a 128K-token context window.
Grok 4 Heavy takes it a step further, using multi-agent collaboration to handle more complex reasoning and tasks.
Both models deliver SOTA results on:
Humanity’s Last Exam
ARC-AGI-2
AIME (math-focused benchmark)
By xAI’s reported numbers, they also outperform Gemini 2.5 Pro and OpenAI’s o3 on these benchmarks, putting xAI in direct competition with the biggest players in the space.
Grok 4 is part of the SuperGrok plan at $30/month.
Grok 4 Heavy is bundled with the SuperGrok Heavy plan, priced at $300/month.
For developers, the API includes a 256K-token context window and built-in search, priced at $3/million input tokens and $15/million output tokens.
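To make the API pricing concrete, here is a minimal Python sketch that estimates the cost of a single request from the per-token rates quoted above. The helper name `estimate_cost` and the constants are illustrative and not part of any xAI SDK. The sketch also assumes flat per-token pricing with no extra charges for search, caching, or long-context requests, which may not match xAI's actual billing.

```python
from decimal import Decimal

# Rates quoted in the article, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("3")
OUTPUT_PRICE_PER_MILLION = Decimal("15")
TOKENS_PER_MILLION = Decimal("1000000")


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes flat pricing: every input token and every output token is
    billed at the quoted rate, with no other fees.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # A long prompt (200,000 input tokens) with a short answer (2,000 output tokens).
    print(f"${estimate_cost(200_000, 2_000)}")
```

Under these assumptions, a request with 200,000 input tokens and 2,000 output tokens works out to $0.60 for input plus $0.03 for output, or about $0.63 in total. Output tokens cost five times as much as input tokens, so long generated answers drive the bill faster than long prompts do.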
This launch follows a wave of criticism after Grok 3 made headlines for producing racist and antisemitic outputs following an update. xAI claims this release is built on tighter alignment, but it’s clear the company will face intense scrutiny going forward.
xAI might be new, but it’s not playing small. Grok 4 and 4 Heavy showcase the raw power of Musk’s Colossus supercomputer and raise the bar for scaling frontier LLMs. But with controversy still fresh, the bigger challenge might not be benchmarks—it’s trust.