Text-to-speech with feeling – this new AI model does everything but shed a tear

Not so long ago, generative AI could only communicate with human users via text. Now it's increasingly being given the power of speech, and this ability is improving by the day.

On Thursday, AI voice platform ElevenLabs launched v3, described on the company’s website as “the most expressive text-to-speech model ever.” The new model can exhibit a wide range of emotions and subtle communicative quirks, like sighs, laughter, and whispering, making its speech more humanlike than the company’s previous models.

In a demo shared on X, v3 was shown generating the voices of two characters, one male and the other female, who were having a lighthearted conversation about their newfound ability to speak in more humanlike voices.

Introducing Eleven v3 (alpha) – the most expressive Text to Speech model ever.
Supporting 70+ languages, multi-speaker dialogue, and audio tags such as [excited], [sighs], [laughing], and [whispers].
Now in public alpha and 80% off in June. pic.twitter.com/n56BersdUc

— ElevenLabs (@elevenlabsio) June 5, 2025

There’s definitely none of the Alexa-esque flatness of tone, but the v3-generated voices tend to be almost excessively animated, to the point that their laughter is more creepy than charming. Take a listen yourself.

The model can also speak more than 70 languages, compared with the 29-language limit of its predecessor, v2. It’s available now in public alpha, and its price tag has been slashed by 80% until the end of June.

The future of AI interaction

AI-generated voice has become a major focus of innovation as tech developers look toward the future of human-machine interaction.

Automated assistants like Siri and Alexa have long been able to speak, of course, but as anyone who routinely uses these systems can attest, their voices are very mechanical, with a rather narrow range of emotional cadence and tone. They’re useful for handling quick and easy tasks, like playing a song or setting an alarm, but they don’t make great conversation partners.

Some of the latest text-to-speech (TTS) AI tools, on the other hand, have been engineered to speak in voices that are maximally realistic and engaging.

Users can prompt v3, for example, to speak in voices that are easily customizable through the use of “audio tags.” Think of these as stylistic filters that modify the output and can be inserted directly into text prompts: “Excited,” “Loudly,” “Sings,” “Laughing,” “Angry,” and so on.
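
To make the idea concrete, here is a minimal Python sketch of how tagged prompt text might be assembled. It only builds strings and does not call ElevenLabs: the bracketed, lowercase tag form follows the examples in the ElevenLabs post quoted earlier, while the helper names and the "Speaker: line" layout are assumptions made for illustration, not the company's documented format.

```python
def with_audio_tags(text: str, *tags: str) -> str:
    """Prefix text with bracketed audio tags, e.g. "[excited] Hello".

    Hypothetical helper: the [tag] form mirrors the examples in the
    ElevenLabs post; lowercasing and prefix placement are assumptions.
    """
    prefix = " ".join(f"[{tag.strip().lower()}]" for tag in tags if tag.strip())
    return f"{prefix} {text}" if prefix else text


def build_dialogue(lines: list[tuple[str, str, list[str]]]) -> str:
    """Join (speaker, text, tags) triples into a simple two-voice script.

    The "Speaker: line" layout is illustrative only.
    """
    return "\n".join(
        f"{speaker}: {with_audio_tags(text, *tags)}"
        for speaker, text, tags in lines
    )


if __name__ == "__main__":
    script = build_dialogue([
        ("A", "We can finally sound human!", ["Excited"]),
        ("B", "Let's not overdo it.", ["Sighs", "Whispers"]),
    ])
    print(script)
```

The resulting text would then be sent to whichever TTS service is in use; how tags are interpreted is up to the model.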

ElevenLabs isn’t the only company racing to build more lifelike TTS models, which big tech companies are promoting as a more intuitive and accessible way to interact with AI.

In late May, ElevenLabs competitor Hume AI unveiled its Empathic Voice Interface (EVI) 3 model, which allows users to generate custom voices by describing them in natural language. Similarly nuanced conversational abilities are also now on offer through Google’s Gemini 2.5 Pro and Flash models.
