Fable 5 just set a new AI freelance work performance record – but it can’t replace humans yet


ZDNET’s key takeaways

Fable 5 raises AI's success rate on remote tasks to 16%.
AI capabilities remain all over the map.
Still, agent skills have "quadrupled in under eight months," said CAIS.

After a brief hiatus, Anthropic's lauded Fable 5 model is back, and it's resetting the bar for automating work.

The US government re-authorized the model on June 30. Anthropic said Fable 5 shares capabilities with Mythos 5, which is still available only to select organizations. But before Fable 5 was pulled, the Center for AI Safety (CAIS) tested it on the Remote Labor Index (RLI), which CAIS launched in October 2025. Fable 5 blew Anthropic's Opus 4.8 and OpenAI's GPT-5.5, both relatively new and considered impressive, out of the water.

RLI measures "how often AI agents can complete real, economically valuable freelance projects […] at a quality a paying client would actually accept," CAIS explained in the study. These projects can include computer-aided design, graphic design, data analysis, video work, and more. As in other tests that compare AI with human professionals, each deliverable the models produce is evaluated by humans against a professional-standard deliverable. The resulting automation rate reflects the share of projects where evaluators judged the AI's output to be as good as or better than professional human work.

CAIS asked Fable 5, GPT-5.5, and Opus 4.8 to design a 3D mockup of an engagement ring, create a video ad, and map a floor plan, among other tests. Researchers gave each model human-generated input files to get started, much as you'd prepare a human freelancer with the relevant documents and information for a job.

Fable 5 hit an automation rate of 16.1%, a record for the benchmark and nearly double that of Opus 4.8, which scored 8.3%. GPT-5.5 came in third at 6.3%, but CAIS noted that all three models scored higher than every model it has evaluated to date.

"For context, the previous published leader sat at 4.17% (Opus 4.6 with the Claude Cowork scaffold), and the field topped out at 2.5% when RLI was launched," CAIS said. "The frontier has more than quadrupled in under eight months, a concrete signal of how quickly economically capable AI agents are advancing."

CAIS noted that its testing was cut short when the government shut down Fable 5 in mid-June, but that even these partial results set the model apart.

"Even under the worst-case assumption that Fable 5 failed every missing project, its automation rate would still be 14.6%, higher than any other model," the researchers said.
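Why can missing projects only pull the score down, and by how much? The minimal Python sketch below walks through the arithmetic. It assumes the automation rate is a simple, unweighted share of projects; CAIS's exact scoring may differ. The function names and the counts in the first example are hypothetical, not CAIS data. Only the final line uses the two rounded percentages the researchers reported.

```python
def automation_rate(successes: int, evaluated: int) -> float:
    """Share of evaluated projects whose AI deliverable was judged acceptable."""
    return successes / evaluated


def worst_case_rate(successes: int, total_projects: int) -> float:
    """Lower bound: every project that was never evaluated counts as a failure."""
    return successes / total_projects


def implied_coverage(observed_pct: float, worst_case_pct: float) -> float:
    """Fraction of projects evaluated, implied by the two reported rates.

    Assumes both rates share one success count: successes = observed * evaluated
    = worst_case * total, so evaluated / total = worst_case / observed.
    """
    return worst_case_pct / observed_pct


# Hypothetical counts for illustration only (not CAIS data):
# 15 accepted deliverables out of 90 evaluated, with 100 projects in total.
print(round(automation_rate(15, 90), 3))   # observed rate on evaluated projects
print(round(worst_case_rate(15, 100), 3))  # rate if the 10 missing projects all failed

# Using the rounded percentages reported for Fable 5 (16.1% and 14.6%):
print(round(implied_coverage(16.1, 14.6), 3))
```

With the reported figures, the last line gives roughly 0.907. Under the simple-fraction assumption, that would mean CAIS had scored about nine in ten projects before testing stopped. Because the inputs are rounded, treat the figure as approximate.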

What this implies for freelancers

While the pace of AI model improvement over just a few months is significant, it doesn't automatically translate into freelance job replacement or loss across the board. Sixteen percent isn't anywhere near 100% yet. Beyond that, despite demonstrable gains, AI isn't the right solution for every organization. Security concerns and other adoption roadblocks often turn AI integration into a slow, multi-step process for most companies, at least at first. To fully replace human freelancers, organizations would likely need a network of agents to check factors like work quality, budget, and timeline, so the tradeoff isn't one-to-one.

CAIS tried to replace the human evaluator with an "LLM judge," ostensibly to see how far the experiment could reasonably move away from human-in-the-loop evaluation, but the model failed.

"Evaluating an RLI deliverable is itself a demanding, agentic task," CAIS explained. "Doing it properly means opening the project's files in the right professional applications, operating those applications competently, and forming a judgment the way a client would, the very computer-use skills that today's agents are still weakest at."

That said, improving abilities could shrink some freelance opportunities at specific companies that are already integrating AI successfully. And if computer-use skills are the current limitation, and they are poised to improve as the industry invests in increasingly agentic models, that roadblock could eventually disappear. Judging by how fast models have been improving on other benchmarks that measure agentic skill, that could happen sooner than we expect.

Speaking of time: CAIS also found that a task taking longer for a human doesn't necessarily make it harder for AI to complete. The time-horizon pattern, in which tasks that take humans longer are harder for AI, holds for coding, for example, but not for the broader range of remote tasks RLI measures. For now, it's hard to draw conclusions about the future from that.

"Some work that's quick for a skilled professional remains out of reach [for AI], such as transcribing music or playtesting a real-time game, while other work that would take a person hours, such as digital art or coding, is done by current models in minutes," CAIS wrote.