Automatic Speech Recognition (Speech to Text)

For long audio files (more than 200 seconds), you will need to use the asynchronous mode; see the documentation for details.
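
As a rough illustration of the 200-second limit, the sketch below measures the duration of a WAV file with Python's standard `wave` module and reports whether the asynchronous mode would be needed. The helper names `wav_duration_seconds` and `needs_async_mode` are hypothetical and not part of the API. The sketch assumes uncompressed WAV input; other formats would need a different way to read the duration.

```python
import wave

# Limit stated in the text above: files longer than this need asynchronous mode.
ASYNC_THRESHOLD_SECONDS = 200.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (path string or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def needs_async_mode(duration_seconds: float,
                     threshold: float = ASYNC_THRESHOLD_SECONDS) -> bool:
    """Hypothetical helper: True when the audio is longer than the threshold."""
    return duration_seconds > threshold
```

For example, `needs_async_mode(wav_duration_seconds("recording.wav"))` would indicate which mode to request for a local file named `recording.wav` (an illustrative file name).
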
The API also returns word-level timestamps you can use for subtitling.
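
To show how word-level timestamps could feed a subtitle file, here is a minimal sketch that groups words into SRT cues. The response format is not described here, so the input shape is an assumption: a list of dictionaries with hypothetical keys `word`, `start`, and `end` (times in seconds). Adapt the keys to whatever the API actually returns.

```python
from typing import Dict, List, Union

Word = Dict[str, Union[str, float]]


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group consecutive words into cues of at most max_words words each.

    Assumed (hypothetical) word shape: {"word": str, "start": float, "end": float}.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(chunk[0]["start"])  # first word opens the cue
        end = format_srt_time(chunk[-1]["end"])     # last word closes the cue
        text = " ".join(str(w["word"]) for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Grouping by a fixed word count keeps the sketch short; a real subtitler would more likely split on pauses or punctuation and cap the line length.
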
The API also accepts base64-encoded files instead of URLs.
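
Producing the base64 string needs only the Python standard library, as the short sketch below shows. The function names are hypothetical, and the request field that carries the encoded string is not specified here, so building the request itself is left out.

```python
import base64
from pathlib import Path


def encode_audio_bytes(data: bytes) -> str:
    """Return the standard base64 encoding of raw audio bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_audio_file(path: str) -> str:
    """Read a local audio file and return its contents base64-encoded."""
    return encode_audio_bytes(Path(path).read_bytes())
```

Base64 makes the payload about one third larger than the raw file, which may matter for longer recordings.
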

