AI search engine Perplexity says its newest release goes above and beyond for user satisfaction, especially compared with OpenAI's GPT-4o.
On Tuesday, Perplexity announced a new version of Sonar, its proprietary model. Based on Meta's open-source Llama 3.3 70B, the updated Sonar "is optimized for answer quality and user experience," the company says, having been trained to improve the readability and accuracy of its answers in search mode.
Perplexity claims Sonar scored higher than GPT-4o mini and Claude models on factuality and readability. The company defines factuality as a measure of "how well a model can answer questions using facts that are grounded in search results, and its ability to resolve conflicting or missing information." However, there is no external benchmark to measure this.
Instead, Perplexity displays several screenshot examples of side-by-side answers from Sonar and competitor models, including GPT-4o and Claude 3.5 Sonnet. They do, in my view, differ in directness, completeness, and scannability, often favoring Sonar's cleaner formatting (a subjective preference) and higher number of citations, though that doesn't speak directly to source quality, only quantity. The sources a chatbot cites are also influenced by the publisher and media partner agreements of its parent company, which Perplexity and OpenAI each have.
More importantly, the examples don't include the queries themselves, only the answers, and Perplexity doesn't clarify a methodology for how it prompted or measured the responses (variations between queries, the number of queries run, and so on), instead leaving the comparisons up to readers to "see the difference." ZDNET has reached out to Perplexity for comment.
Perplexity says that online A/B testing revealed that users were much more satisfied and engaged with Sonar than with GPT-4o mini, Claude 3.5 Haiku, and Claude 3.5 Sonnet, but it did not expand on the specifics of those results.
"Sonar significantly outperforms models in its class, like GPT-4o mini and Claude 3.5 Haiku, while closely matching or exceeding the performance of frontier models like GPT-4o and Claude 3.5 Sonnet for user satisfaction," Perplexity's announcement states.
According to Perplexity, Sonar's speed of 1,200 tokens per second enables it to answer queries almost instantly and to work 10 times faster than Gemini 2.0 Flash. Testing showed Sonar surpassing GPT-4o mini and Claude 3.5 Haiku "by a substantial margin," but the company doesn't clarify the details of that testing. The company also says Sonar outperforms more expensive frontier models like Claude 3.5 Sonnet "while closely approaching the performance of GPT-4o."
Sonar did beat its two competitors, among others, on the academic benchmark tests IFEval and MMLU, which evaluate how well a model follows user instructions and its grasp of "world knowledge" across disciplines.
Want to try it for yourself? The upgraded Sonar is available to all Pro users, who can make it their default model in their settings or access it through the Sonar API.
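For developers, a minimal sketch of what a Sonar API call might look like, assuming Perplexity's OpenAI-style chat-completions format; the endpoint URL, `sonar` model identifier, and payload shape here are illustrative assumptions, so check Perplexity's API documentation before relying on them:

```python
import json

# Assumed endpoint; verify against Perplexity's current API docs.
API_URL = "https://api.perplexity.ai/chat/completions"

def build_sonar_request(question: str, api_key: str) -> tuple[dict, dict]:
    """Assemble headers and a JSON body for a hypothetical Sonar query."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "sonar",  # assumed model identifier
        "messages": [{"role": "user", "content": question}],
    }
    return headers, payload

headers, payload = build_sonar_request("What is Llama 3.3 70B?", "YOUR_API_KEY")
print(json.dumps(payload, indent=2))
# Sending the request would then be a plain HTTP POST, e.g.:
#   import requests
#   resp = requests.post(API_URL, headers=headers, json=payload)
```

The request shape mirrors the widely used chat-completions convention, which is why switching a default model in settings, as Perplexity describes, maps to changing a single `model` field in code.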