Last month, I wrote about Mercor’s new benchmark measuring AI agents’ capabilities on professional tasks like law and corporate analysis. At the time, the scores were pretty dismal, with every major lab scoring under 25%, so we concluded lawyers were safe from AI displacement, at least for now.
But AI capabilities can change a lot in a few weeks.
This week’s release of Anthropic’s Opus 4.6 shook up the leaderboards, with Anthropic’s new model scoring just shy of 30% in one-shot trials and averaging 45% when given a few more cracks at the problem. Notably, the release included a bunch of new agentic features, including “agent swarms,” which may have helped with this kind of multistep problem-solving.
Regardless, the score is a huge leap from the previous state of the art, and a sign that progress on foundation models isn’t slowing down. Mercor CEO Brendan Foody, who was particularly impressed, said, “jumping from 18.4% to 29.8% in a few months is insane.”
Thirty percent is still a long way from 100%, so it’s not like lawyers should be worried about getting replaced by machines next week. But they should be a lot less confident than they were last month!