Musk claims new Grok 4 beats o3 and Gemini 2.5 Pro – how to try it

Elon Musk’s AI startup xAI unveiled Grok 4 early Thursday morning, describing it as “the world’s strongest AI model.”

During an hour-long livestream hosted on X, the social media platform also owned by Musk, the CEO claimed that the latest iteration of his AI company’s flagship model surpassed competing chatbots on several key benchmarks. The multimodal AI agent has vision and voice capabilities as well as a 128k context window.

He touted Grok 4 as the world’s best-performing model on Humanity’s Last Exam (HLE), an AI testing benchmark comprising a series of difficult problems across math, science, and the humanities. Since its release in January, HLE has been framed as a more reliable test of a model’s capabilities because of concern about benchmark saturation: existing benchmarks becoming too easy given how quickly models are evolving.

By xAI’s own reporting, Grok 4 beat OpenAI’s o3 and Google’s Gemini 2.5 Pro on HLE. “Grok 4 is better than PhD level in every subject,” Musk said during the livestream. “No exceptions.”

xAI has not yet published a research paper outlining Grok 4’s performance on key AI benchmarks, a practice that has become standard when leading AI developers release a new model. The company had not replied to ZDNET’s request for comment at the time of this writing.

That said, independent AI reviewer Artificial Analysis confirmed xAI’s claims, stating it had received early access to Grok 4 and that it’s “now the leading AI model,” comparing the company’s progress to competitors in a chart.

Grok 4 is now available via the xAI app and website for $30 per month. Developers can access the model’s API for $3 per 1 million input tokens and $15 per 1 million output tokens. Grok 4 Heavy, a version that leverages multiple AI agents simultaneously to reason through particularly difficult problems, is also available for a $300-per-month subscription. The model’s predecessor, Grok 3, is still available for free online.
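For developers weighing the API against a subscription, the per-token pricing quoted above can be turned into a rough per-request estimate. The sketch below is illustrative only: the function name and the example token counts are hypothetical, and it assumes a simple linear rate with no discounts, caching, or other fees, which xAI's actual billing may not match.

```python
# Prices as quoted in this article (USD per 1 million tokens).
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the cost of one API request in USD.

    Assumes flat linear pricing; real billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token reply.
    cost = estimate_cost_usd(10_000, 2_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long responses dominate the bill faster than long prompts do.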

Grok’s hate-filled posting spree

The launch arrives shortly after Grok 3 went on an antisemitic tirade on X, where it has its own account. In one post, it implied that people with Jewish last names were more likely to participate in “extreme leftist activism.” In another, responding to a user who referred to campers at Camp Mystic, the Christian summer camp in Texas where over two dozen campers and staff members were recently killed by deadly floods, as “future fascists,” Grok appeared to endorse Hitlerian genocide to deal with what it described as “such vile anti-white hate.”

“[Hitler would] identify the ‘pattern’ in such hate–often tied to certain surnames–and act decisively: round them up, strip rights, and eliminate the threat through camps and worse,” the chatbot wrote.

Some of the posts were later removed by X. The company’s CEO, Linda Yaccarino, announced Wednesday morning, without much explanation, that she would be stepping down from the role. The same morning, Musk briefly responded to the Grok fiasco on X, writing that the model “was too compliant to user prompts. Too eager to please and be manipulated, essentially.” The issue, he added, “is being addressed.”

He conspicuously avoided any mention of his chatbot’s social media tirade during the Thursday livestream. He did, however, say he believed that it was important for AI to be “maximally truth-seeking.”

Musk founded xAI in 2023 “to understand the universe,” according to the company’s mission statement on its website. He has positioned Grok as an alternative to AI chatbots from companies like Google and OpenAI, which Musk has ridiculed as being too “woke” and politically correct. Grok, in contrast, was built to be blunt and humorous in its responses to user queries.