
Voicebox is Meta's new speech-related AI model

It's not your regular text-to-speech tool, but rather an AI model that Meta claims can help content creators, visually impaired people, and people conversing in foreign languages.

Published By: Pranav Sawant | Published: Jun 17, 2023, 06:23 PM (IST)


Highlights

  • Meta announced its all-new AI model for doing speech-related tasks.
  • Voicebox can do tasks like editing, sampling and stylizing audio.
  • Voicebox will get more features as it gets tested by the company.

Generative AI has been the talk of the town lately, and every tech firm is trying to do something with it. Meta has now announced a new generative AI model called ‘Voicebox’ for several speech-related tasks.

It’s not your regular text-to-speech tool, but rather an AI model that Meta claims can help content creators, visually impaired people, and people conversing in foreign languages.

Mark Zuckerberg said on the Meta Channel that it is an AI model ‘that can do tasks it wasn’t specifically trained on’. The company believes this AI tool to be a breakthrough in generative AI for speech.

“Today, we’re announcing a breakthrough in generative AI for speech. We’ve developed Voicebox, a state of the art AI model that can perform speech generation tasks — like editing, sampling and stylizing — that it wasn’t specifically trained to do through in-context learning,” noted the blog post.

The AI model can handle an array of tasks, from editing audio to sampling and stylizing. The following are some of the things it can do.

  • Diverse text-to-speech
  • Style transfer
  • Content correction
  • In-context text-to-speech
  • Noise removal

Meta revealed that Voicebox can produce high-quality audio clips and edit pre-recorded audio. This includes removing background noise while preserving the style of the audio.

It is worth noting that the AI model is still in testing and is expected to do a lot more in the future.

“In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. They could allow visually impaired people to hear written messages from friends read by AI in their voices, give creators new tools to easily create and edit audio tracks for videos, and much more.”

In other news about Meta, the company announced an AI-powered music generator model earlier this week. Named MusicGen, it can generate music using text and melody.

“We present MusicGen: A simple and controllable music generation model. MusicGen can be prompted by both text and melody. We release code (MIT) and models (CC-BY NC) for open research, reproducibility, and for the music community,” tweeted Felix Kreuk, Research Engineer at Meta AI research.

MusicGen is said to have been trained on 20,000 hours of music, which includes 10,000 high-quality licensed music tracks and 390,000 instrument-only tracks from the Shutterstock and Pond5 stock media libraries.