What Is Automatic Speech Recognition (Speech To Text)?
Automatic speech recognition (also known as speech to text) is about extracting text from an audio file. It is often a critical first step in an AI pipeline. Great progress has been made in the last few years, and it is now possible to extract text from an audio or video file with great accuracy.
For example, here is a chapter from a LibriVox audio book (The Metal Giants, by Edmond Hamilton), stored on Archive.org: https://ia801400.us.archive.org/10/items/metalgiants_2209_librivox/metalgiants_03_hamilton_64kb.mp3.
Once we perform automatic speech recognition on this file, we get the following text:
Chapter three of The Medal Giants by Edmound Hamilton. This Librivox recording is in the public domain. Read by Ben Tucker. Chapter three: Lanier arrived in Stockton early the next morning. His face was drawn and haggard, as it had been since he first read a certain humorous newspaper despatch, and in his mind was an immense perplexity, a vague, chilling fear. Until late in the afternoon, he tramped warily through the town, asking in all quarters the same question: do you know of anyone named Det Mold who lives in or around Stockton? A tall, strong man? And from all he questioned, he got no trace until he happened into the office of a small trucking and hauling company. None of them knew anything of Deadmold, but they had done some work for a certain Foster who corresponded exactly to Lanier's description. This man lived several miles from the city in a northeastern direction and had hired them to haul some boxes from the railroad to his home, an old farmhouse. A mighty bad road it was too, and this Foster had been very particular about the moving of his stuff. Yes, they could direct him to the place. He went out such and such a concrete road and turned up a ruddy lane, very steep. By the time the sun hung poised above the western horizon, Lanier was already ascending that steep, twisted road. More than once he glanced back at the city below. A city bathed in the golden afternoon sunlight. Its streets were filled now with workers returning home from the mills, tired and blackened, calling out to the friends they met for the latest news on that "Morgan criter" as they termed it. A quiet serenity, a dreamy contented peace, pervaded Stockton, contrasting with the tense excitement of the preceding night. In a thousand homes the evening meal was being prepared, and the day's gossip related. In the west, the sun sank lower and lower, and all around, beyond the encircling hills, death marched toward the city with crashing giant strides. End of chapter three.
This is a high-quality extraction: apart from a few misrecognized proper nouns (for example "Medal" instead of "Metal" and "Edmound" instead of "Edmond"), the words are transcribed correctly, and punctuation was added automatically.
Errors can still occur, though, for example when the speaker talks very fast or uses unusual vocabulary. The good news is that AI models keep improving, so such errors are becoming less and less frequent.
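How often such errors happen is commonly measured with the word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. The Python sketch below computes it with a word-level edit-distance table and applies it to the opening words of the transcript above. The reference sentence is an assumption: it supposes the reader actually said the book's title and author name as written on the LibriVox page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) table.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Assumed reference: what the reader most likely said (taken from the book title).
reference = "Chapter three of The Metal Giants by Edmond Hamilton"
# Opening of the automatic transcript above (final period dropped).
hypothesis = "Chapter three of The Medal Giants by Edmound Hamilton"

print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # prints WER: 0.222 (2 substitutions / 9 words)
```

This sketch compares words exactly as written; evaluation tools usually normalize case and punctuation before scoring. Because this short snippet is dense in proper names, its WER is likely much higher than it would be over the whole chapter.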