Speech Synthesis
What Is Speech Synthesis?
Speech synthesis (also known as text-to-speech or voice synthesis) converts a piece of text into audio. Let's see how to perform speech synthesis with Microsoft Speech T5 on NLP Cloud.
Simply send a piece of text and the model generates the corresponding audio (English only).
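The request described above can be sketched with Python's standard library. This is a minimal sketch, not NLP Cloud's official client: the endpoint URL, the model identifier `speech-t5`, the `Token` authorization scheme and the `text` field name are assumptions based on NLP Cloud's general REST conventions, so check them against the official API reference. The request is built separately from the network call so it can be inspected before sending.

```python
import json
import urllib.request

# ASSUMPTION: base URL, model name and endpoint path follow NLP Cloud's
# usual REST pattern; verify against the official API reference.
API_BASE = "https://api.nlpcloud.io/v1"
MODEL = "speech-t5"


def build_synthesis_request(text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the model to voice `text`."""
    if not text.strip():
        raise ValueError("text must not be empty")
    body = json.dumps({"text": text}).encode("utf-8")  # ASSUMPTION: field is "text"
    return urllib.request.Request(
        url=f"{API_BASE}/{MODEL}/speech-synthesis",
        data=body,
        headers={
            "Authorization": f"Token {token}",  # ASSUMPTION: token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, token: str) -> dict:
    """Send the request and return the decoded JSON response (performs a network call)."""
    with urllib.request.urlopen(build_synthesis_request(text, token)) as resp:
        return json.load(resp)
```

The shape of the JSON response is not described on this page, so inspect what `synthesize` returns before relying on any particular field.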
Why Use Speech Synthesis?
Text-to-speech is increasingly used as the final stage of AI pipelines. Here are two example applications:
Virtual Assistant
When combined with speech-to-text (see for example the OpenAI Whisper model) and generative models, speech synthesis makes it possible to build full-fledged virtual assistants that understand the human voice and respond out loud.
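The assistant pipeline above can be sketched as three chained stages, with speech synthesis as the last one. This is a hypothetical structure: the stage type names and `assistant_turn` are illustrative, and each stage would in practice wrap a real model or API call (for example Whisper for transcription and a generative model for the reply).

```python
from typing import Callable

# Hypothetical stage signatures; each would wrap a real model or API call.
SpeechToText = Callable[[bytes], str]   # audio in -> transcript
Generate = Callable[[str], str]         # transcript -> text reply
TextToSpeech = Callable[[str], bytes]   # text reply -> audio out


def assistant_turn(
    audio_in: bytes,
    stt: SpeechToText,
    generate: Generate,
    tts: TextToSpeech,
) -> bytes:
    """Run one conversational turn: transcribe, generate a reply, voice the reply."""
    transcript = stt(audio_in)    # 1. understand the user's voice
    reply = generate(transcript)  # 2. produce a text answer
    return tts(reply)             # 3. speak the answer (speech synthesis is the last step)
```

Passing the stages in as functions keeps the pipeline independent of any particular provider, so each stage can be swapped or stubbed out in tests.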
Accessibility
Reading text out loud is very useful for people who have difficulty reading.
Use GPU
Control whether you want to run the model on a GPU. Machine learning models typically run much faster on GPUs.
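When calling the API directly, the GPU option typically changes which endpoint is called. The sketch below assumes NLP Cloud's URL pattern inserts a `gpu/` segment before the model name; that pattern, the `speech-t5` model identifier and the `synthesis_url` helper are assumptions for illustration, so confirm them in the official API reference.

```python
# ASSUMPTION: GPU endpoints add a "gpu/" path segment before the model name.
API_BASE = "https://api.nlpcloud.io/v1"


def synthesis_url(model: str = "speech-t5", use_gpu: bool = False) -> str:
    """Return the (assumed) speech synthesis endpoint, on GPU or CPU."""
    prefix = f"{API_BASE}/gpu" if use_gpu else API_BASE
    return f"{prefix}/{model}/speech-synthesis"
```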
Language
AI models don't always work well with non-English languages.
We do our best to add non-English models whenever possible. See for example Fine-tuned LLaMA 3.1 405B, LLaMA 3 70B, Dolphin, ChatDolphin, XLM Roberta Large XNLI, Paraphrase Multilingual Mpnet Base V2, or spaCy. Unfortunately, not all models are good at handling non-English languages.
To address this, we developed a multilingual module that automatically translates your input into English, performs the actual NLP operation, and then translates the result back into your original language. This makes requests slightly slower, but the results are often very good.
Even models that natively understand non-English languages sometimes work better with the multilingual add-on.
Simply select your language from the list, and you can then write the input text in your own language.
This multilingual add-on is a free feature.
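The translate, process, translate-back flow of the multilingual module can be sketched as follows, for a task whose result is text. This is a hypothetical illustration, not NLP Cloud's implementation: `multilingual_call` and the `translate(text, source_lang, target_lang)` signature are assumptions, and the translation and task functions are passed in so any backend could be plugged in.

```python
from typing import Callable

# Hypothetical signature: translate(text, source_lang, target_lang) -> text
Translate = Callable[[str, str, str], str]


def multilingual_call(
    text: str,
    lang: str,
    translate: Translate,
    task: Callable[[str], str],
) -> str:
    """Run an English-only text task on input written in `lang`."""
    if lang == "en":
        return task(text)                       # no translation needed
    english_input = translate(text, lang, "en")  # 1. translate input into English
    english_output = task(english_input)         # 2. run the actual NLP operation
    return translate(english_output, "en", lang)  # 3. translate the result back
```

The two extra translation calls are what make requests slightly slower than calling the model directly.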
Voice Type
Determines the type of voice to use. Possible values are "woman" and "man". Defaults to "woman".
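A client can apply the documented default and reject unsupported values before sending anything. This is a minimal sketch: the allowed values and the `"woman"` default come from the text above, while the request field name `voice` and the `synthesis_payload` helper are assumptions for illustration.

```python
from typing import Optional

VOICES = ("woman", "man")   # documented values
DEFAULT_VOICE = "woman"     # documented default


def synthesis_payload(text: str, voice: Optional[str] = None) -> dict:
    """Build a request body, applying the default voice and validating the choice."""
    chosen = DEFAULT_VOICE if voice is None else voice
    if chosen not in VOICES:
        raise ValueError(f"voice must be one of {VOICES}, got {chosen!r}")
    # ASSUMPTION: the API expects the fields "text" and "voice".
    return {"text": text, "voice": chosen}
```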