AI Model Release Tracker: Opus 4.8’s misalignment rates similar to Claude Mythos Preview

Observe ZDNET: Add us as a most well-liked supply on Google.

AI labs are transport new fashions nonstop. Moreover being higher and sooner than their predecessors, nevertheless, each new mannequin is not assured to be a serious step change, regardless of how the corporate’s PR might wax poetic about them. Mannequin strengths actually emerge in context: The place are competitor fashions missing or excelling? Which fashions have excellent specialties, and that are simply catching as much as business requirements?

Our Mannequin Launch Tracker helps you make sense of the place fashions stand relative to one another, and whether or not they’re value a deeper look. Whereas we do not check each mannequin or mannequin replace on this record, we’ll at all times embody the important thing parts it’s worthwhile to know, together with our hands-on skilled check, the place relevant. We additionally embody an Knowledgeable Rating for sure fashions. Interested by how we check AI? Take a look at this breakdown of our course of.

Listed below are among the greatest mannequin releases of 2026 thus far and what to find out about them. We’ll replace this record every time a notable new mannequin arrives.

Claude Opus 4.8

Anthropic | Might 28, 2026

What it does: Changing Opus 4.7 beginning as we speak (on the identical value), Opus 4.8 presents sooner pondering modes for one-third the price of the sooner model, based on Anthropic. Like most of Anthropic’s fashions, 4.8 prioritizes coding talents, scoring greater than 4.7 on two coding benchmarks however not totally besting OpenAI’s GPT 5.5. It additionally “reaches new highs on our measures of prosocial traits like supporting person autonomy and appearing within the person’s greatest curiosity,” the corporate famous within the launch, although definitions for what meaning stay murky.

Why it issues: Anthropic has at all times prioritized mannequin security and interpretability, however seems to be additional emphasizing that normal with this launch. The corporate stated Opus 4.7 had a 92% honesty fee, along with being much less sycophantic and hallucination-prone general. The truth that it claims 4.8 exhibits “considerably” decrease charges of misalignment than 4.7 signifies an more and more excessive normal for mannequin security, particularly as a result of Anthropic in contrast 4.8’s alignment to that of Mythos Preview.

GPT-5.5 Prompt

OpenAI | Might 5, 2026

What it does: OpenAI stated in its announcement that the lighter model of OpenAI’s just-released GPT-5.5 is much less verbose than its predecessor, GPT-5.3 Prompt. It additionally touted fewer hallucinations and improved factuality, saying “GPT‑5.5 Prompt produced 52.5% fewer hallucinated claims than GPT‑5.3 Prompt on high-stakes prompts masking areas like medication, regulation, and finance.”

Why it issues: GPT-5.5 Prompt replaces GPT-5.3 because the default mannequin in ChatGPT. Once more, whereas the expectation is that every new AI mannequin will get extra environment friendly, simpler to make use of, and makes up much less stuff , a major enchancment in hallucinations for a mannequin most individuals use for quick queries may imply much less misinformation spreading among the many lots. That is particularly crucial given how many individuals are utilizing ChatGPT for on a regular basis well being questions, for instance.

(Disclosure: Ziff Davis, ZDNET’s guardian firm, filed an April 2025 lawsuit in opposition to OpenAI, alleging it infringed Ziff Davis copyrights in coaching and working its AI methods.)

Nemotron 3 Nano Omni

Nvidia | April 28, 2026

What it does: The newest in Nvidia’s open Nemotron household, this mannequin supplies brokers with multimodal enter. Which means they will “understand and purpose throughout visible, audio, and textual inputs inside a single shared notion‑to‑motion loop,” based on Nvidia, thereby unifying a number of capabilities right into a single system.

Why it issues: Usually, methods of brokers want to make use of separate fashions for speech, imaginative and prescient, and textual content, which means they soar throughout paperwork, video, and audio to finish multi-step duties. That slows down workflows, undermines the context brokers collect, and racks up inference prices. Nvidia’s strategy, if it really works, would streamline this course of and scale back token use, saving you cash. Strive it on Hugging Face.

GPT-5.5

OpenAI | April 23, 2026

Knowledgeable Rating: 93/100

What it does: ZDNET tester-in-residence David Gewirtz technically gave GPT-5.5 an A- rating, however stated it “will be reductively described as higher and sooner than GPT-5.4,” which is hopefully the bare-minimum expectation for a brand new mannequin. Particularly, although, the mannequin obtained higher at agentic coding, clearly figuring out ideas, scientific analysis, and factual accuracy.

Why it issues: Whereas the mannequin itself is probably not leaps and bounds forward of its rapid predecessor, the short turnaround from 5.4 to five.4 — lower than two months — signifies how quickly agentic coding is accelerating OpenAI’s mannequin launch cycle. As David Gewirtz breaks down, the corporate, very similar to different frontier labs utilizing AI to construct AI, is transport updates at an exponentially rising fee.

ChatGPT Photographs 2

OpenAI | April 23, 2026

What it does: Quickly after sunsetting Sora, its generative video mannequin and social platform, OpenAI considerably confusingly introduced Photographs 2. ZDNET mannequin tester David Gewirtz obtained an early take a look at Photographs 2 earlier than its launch and was impressed. Whereas he did not give this mannequin a proper Knowledgeable Rating, he stated it is enjoyable, an enormous leap, and really helpful for work.

Why it issues: OpenAI gave the impression to be getting out of the extra consumer-minded AI product sport when it discontinued Sora, having been overwhelmed by Anthropic at securing profitable enterprise contracts. That OpenAI nonetheless got here out with Photographs 2 inside that redirection narrative signifies that it sees picture mills as related sufficient to enterprise AI — particularly on the heels of Anthropic’s Claude Design.

Claude Opus 4.7

Anthropic | April 16, 2026

What it does: Arriving comparatively shortly after Opus 4.6, this mannequin boasts new highs in honesty, diminished sycophancy and hallucinations. It additionally seems to have a knack for cybersecurity, because it backs the brand new Claude Safety, launched shortly after the mannequin itself — however no, it is not Mythos, as many suspected.

Why it issues: Hallucinations and honesty are among the many most tough, hard-to-solve points plaguing even the perfect fashions. For Anthropic to assert such vital positive aspects in these areas is not any small feat for an AI lab that takes security critically.

Claude Mythos (Preview)

Anthropic | April 7, 2026

What it does: It is a powerful one as a result of Mythos is not really obtainable to the general public. Anthropic created fairly a media storm when it positioned the brand new general-purpose mannequin as too highly effective to launch as ordinary. Whereas the mannequin is outwardly a step change from earlier Anthropic fashions, the corporate was particularly alarmed due to the safety risk it posed, stating that “it’s strikingly succesful at pc safety duties.”

In response to that, Anthropic spearheaded Venture Glasswing, a collaborative effort with a number of rival AI labs, together with Google, Nvidia, and Microsoft, in addition to safety authorities like Palo Alto Networks, “to assist safe the world’s most crucial software program, and to arrange the business for the practices all of us might want to undertake to maintain forward of cyberattackers.”

Why it issues: If we’re to consider Anthropic’s steering that Mythos poses a major risk to the world’s software program — a lot in order that solely a choose few companions can entry it — cybersecurity apparatuses as they stand is probably not ready to fulfill the quickly evolving frontier of mannequin capabilities. Mythos is probably not the one mannequin of its caliber, however merely the primary of many to return as soon as different labs obtain comparable breakthroughs.

For now, just some weeks into its launch, Mythos helps catch software program bugs in droves.

GPT-5.4

OpenAI | March 5, 2026

What it does: OpenAI framed this new mannequin, launched barely three months after GPT-5.2, as particularly designed for skilled work. In response to the corporate’s personal testing (which ought to at all times be taken with a grain of salt till verified by a 3rd get together), GPT-5.4 matches or outperforms human professionals 83% of the time.

Why it issues: As AI firms focus extra on gaining enterprise belief (and contracts) whereas lauding what agentic AI can do, they want fashions that may deal with advanced work-related duties with minimal danger, delay, or prohibitively excessive prices. Any mannequin development that exhibits prowess in skilled workflows has a greater likelihood of being taken critically by firms struggling to undertake AI, although nothing ensures seamless integration.

Claude Opus 4.6

Anthropic | Feb. 5, 2026

What it does: This mannequin shortly redefined the usual for autonomous agentic work, particularly for coding. That is no shock given Anthropic’s authority in constructing fashions particularly adept at programming duties. Opus 4.6 additionally demonstrated enchancment in advanced, longer-running duties general.

Why it issues: Opus 4.6’s capability to deal with duties higher by itself means you may reliably offload extra of your workflow to it — one thing agentic choices often battle with.

GPT-5.3-Codex

OpenAI | Feb. 5, 2026

What it does: This new coding mannequin — which OpenAI stated helped construct and debug itself — will be interrupted and redirected mid-task, which, if true, is a big boon for builders utilizing it on advanced or shifting initiatives with tons of trial-and-error. GPT-5.3-Codex additionally boasts run occasions of over a day and a greater grasp on person intent.

Why it issues: OpenAI is making an attempt to catch as much as Anthropic’s lead in agentic coding (and, coincidentally or not, launched 5.3 Codex on the identical day as Anthropic launched Opus 4.6). Whereas ZDNET consultants typically desire Claude Code to different instruments for vibe coding, OpenAI’s rumored shift towards enterprise purchasers and away from enjoyable shopper instruments may ultimately shut that hole.

AI Model Release Tracker: Opus 4.8’s misalignment rates similar to Claude Mythos Preview

Claude Opus 4.8

GPT-5.5 Prompt

Nemotron 3 Nano Omni

GPT-5.5

ChatGPT Photographs 2

Claude Opus 4.7

Claude Mythos (Preview)

GPT-5.4

Claude Opus 4.6

GPT-5.3-Codex

Related Posts:

Chinese cybercrime operation that used AI to scam ‘hundreds of thousands...

I used HyperDroid to make my Android devices look and feel...

SpaceX IPO: Everything you need to know

Are Facebook and Instagram down? What to know about the Meta...

Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’...

More Articles Like This

Topics

Stay connected

Legal Pages

Top Tags List

About Us