Speech Synthesis (Text-To-Speech)

What Is Speech Synthesis?

Speech synthesis (also known as text-to-speech or voice synthesis) is about turning a piece of text into audio. Let's see how to perform speech synthesis with Microsoft Speech T5 on NLP Cloud.

Simply send a piece of text and the model generates the corresponding audio. Only English text is supported.

Here is an example. Let's generate audio from the following text:

This report summarizes a discussion between John and his doctor.

Here is the result:

You can also choose the type of voice you are using.
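As a rough illustration of sending text with an optional voice, here is a minimal sketch that builds the JSON body of a synthesis request. The field names `text` and `voice`, the voice identifiers in `KNOWN_VOICES`, and the helper `build_payload` are assumptions made for this example; the API documentation is the authority on the exact parameters.

```python
import json

# Hypothetical voice identifiers, used here for illustration only;
# check the API documentation for the values actually accepted.
KNOWN_VOICES = {"woman", "man"}


def build_payload(text, voice=None):
    """Return the JSON body for a speech synthesis request.

    The field names "text" and "voice" are assumptions.
    """
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": text.strip()}
    if voice is not None:
        if voice not in KNOWN_VOICES:
            raise ValueError(f"unknown voice: {voice!r}")
        payload["voice"] = voice
    return json.dumps(payload)


body = build_payload(
    "This report summarizes a discussion between John and his doctor.",
    voice="woman",
)
print(body)
```

Validating the text and the voice on the client side catches obvious mistakes before any request is sent.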

Why Use Text-To-Speech?

Text-to-speech is increasingly used as the final stage of an AI pipeline, where it turns generated text into spoken output. Here are 2 examples:

Virtual Assistant

When used together with speech-to-text (see the OpenAI Whisper model, for example) and generative models, text-to-speech makes it possible to build fully fledged virtual assistants that understand the human voice and respond to it.


Accessibility

Reading text aloud is very useful for people who cannot read or who have difficulty reading.

Speech Synthesis API

Building an inference API for speech synthesis is a necessary step as soon as you want to use speech synthesis in production. But building such an API is hard. First, you need to code the API itself (the easy part). Then you need to build a highly available, fast, and scalable infrastructure to serve your models under the hood (the hardest part). This is especially hard for machine learning models because they consume a lot of resources (memory, disk space, CPU, GPU...).

Such an API is useful because it is completely decoupled from the rest of your stack (microservice architecture), so you can scale it independently and access it from any programming language. Most machine learning frameworks are developed in Python, but you will likely want to access them from other languages like JavaScript, Go, Ruby...

NLP Cloud's Speech Synthesis API

NLP Cloud offers a text-to-speech API based on Microsoft Speech T5 that lets you perform blazing fast speech generation out of the box.
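To give an idea of what calling such an API over plain HTTP can look like, here is a minimal sketch using only the Python standard library. The endpoint URL, the `Token` authorization scheme, and the `url` field in the response are assumptions for this example and should be checked against the documentation; `synthesize` is a hypothetical helper, not part of any official client.

```python
import json
import urllib.request

# Assumed endpoint layout; confirm it against the API documentation.
API_URL = "https://api.nlpcloud.io/v1/gpu/speech-t5/speech-synthesis"


def synthesize(text, token, opener=urllib.request.urlopen):
    """POST the text and return the URL of the generated audio file.

    `opener` is injectable so the function can be exercised
    without network access.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            # Assumed authorization scheme.
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with opener(request, timeout=60) as response:
        result = json.loads(response.read().decode("utf-8"))
    # The "url" response field is an assumption.
    return result["url"]
```

Calling `synthesize("Hello world", "<your token>")` would then return a link to the generated audio, provided the assumptions above hold. Because the request is ordinary JSON over HTTP, the same call can be reproduced from any other language.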

For more details, see our documentation about speech synthesis. You can also easily test speech synthesis on our playground.