This new text-to-speech AI model understands what it’s saying – how to try it for free

bicycledays

Text-to-speech AI models are a great tool for cases where human voice actors are often used, such as audiobooks, dubbing, commercials, and more. However, because these models are not human and are unaware of what they are saying, they can often sound noticeably robotic. Hume’s new AI model seeks to tackle this issue.

Octave

On Wednesday, Hume launched Octave, a text-to-speech large language model (LLM) with contextual awareness. According to the company, the LLM can use this awareness to adjust the tune, rhythm, and timbre of its speech to the words it’s reading, based on their meaning. For example, an AI-enabled voice can convey a sense of disgust when reading a sentence.

Beyond understanding the context of the text, the model can also take directions. Users can instruct it to be “calm”, “whispering”, “disgustful”, “indignant”, and more. Hume says the advantage Octave has over a voice actor is that it can take on any voice and even invent a new one based on the user’s description.

For instance, Hume says a user could provide a prompt as simple as “smart wizard” or as complex as a combination of different accents, demographic groups, occupational roles, and more. Essentially, the model would invent a voice based on the script alone, but when prompted, it could be steered by both the script and the description.

Testing the model

The user interface is easy to navigate, with one text box for Voice, in which you can describe exactly what you want the voice to sound like, and another for Script, in which you enter what you want the model to say. For my first test, I used the detailed pre-made prompts to see how it sounded.

After I clicked “Generate”, Octave produced three voice results, and upon first listen I was impressed. Although I wasn’t convinced that the generations captured the “valley girl” sound, I was super-impressed with the intonations and inflections.

For my own prompt, I created a scenario where the main speaker is out of breath from running and in a rush. The script read: “YAY I’m almost at the finish line. I’m so tired but am going to keep pushing because I’m almost there. So long! Byeeee.”

I was equally pleased with these results. Octave mostly conveyed what I wanted, inserting the right amount of excitement and pauses where breaths would be taken if you were exhausted from running. However, as in the prior example, the voice wasn’t exactly what I described. In this case, the speaker didn’t speak super-fast.

Overall, it seems like the model’s strength is inserting the nuances of human speech in its output. What often gives AI voices away is their monotony, which makes the output sound quite boring to listen to. With Octave, you could hear the reader’s emotions, whether frustration, defeat, or tiredness. Words like “ugh” have the exact length and breathing a human would use, creating an engaging experience.

How to access

There are different tiers for accessing the model, including a free one with a 10,000-character limit (around 10 minutes) and unlimited character voices, if you want to try it out. Beyond the free tier, there are six additional tiers, ranging from $3 to $900 per month, depending on access needs.

For example, the Starter tier is $3 per month and includes 30,000 characters (around 30 minutes), while the Business tier is $900 monthly for 10,000,000 characters (around 10,000 minutes). There’s also an Enterprise option that can be customized to your needs. You can view all the options and get started on the Hume website.
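To compare the tiers, it can help to work out what the quoted figures imply per 1,000 characters and per minute of audio. The sketch below uses only the numbers given in this article; the dictionary layout and function names are illustrative assumptions, and actual pricing should be checked on the Hume website.

```python
# Tier figures as quoted in this article, not pulled from Hume's pricing page.
# The keys and field names are hypothetical labels chosen for this example.
TIERS = {
    "free": {"usd_per_month": 0, "characters": 10_000, "minutes": 10},
    "starter": {"usd_per_month": 3, "characters": 30_000, "minutes": 30},
    "usd_900_tier": {"usd_per_month": 900, "characters": 10_000_000, "minutes": 10_000},
}


def usd_per_1k_characters(tier: dict) -> float:
    """Monthly price divided by the character allowance, per 1,000 characters."""
    return tier["usd_per_month"] * 1000 / tier["characters"]


def characters_per_minute(tier: dict) -> float:
    """Characters of script per minute of audio implied by the quoted figures."""
    return tier["characters"] / tier["minutes"]


if __name__ == "__main__":
    for name, tier in TIERS.items():
        print(
            f"{name}: ${usd_per_1k_characters(tier):.2f} per 1,000 characters, "
            f"about {characters_per_minute(tier):.0f} characters per minute"
        )
```

On these figures, every tier assumes roughly 1,000 characters per minute of speech, so the Starter tier works out to about $0.10 per minute and the $900 tier to about $0.09 per minute.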
