Language Detection API

What is Language Detection?

Language detection is the task of automatically identifying the language a text is written in. When a text mixes several languages, it is also possible to detect each of them.

Let's say you have the following block of text:

NLP Cloud is an easy way to leverage Natural Language Processing in production. The API has been released early January 2021. Cette API est à la fois peu onéreuse et très robuste.

As you can see, this text contains 2 languages: English and French. Around 2/3 of the text is in English, and 1/3 is in French.

If we perform language detection on this text, we get 2 languages along with the proportion of the text in each language, something like this: english: 0.66 and french: 0.33.
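
To make the idea of per-language proportions concrete, here is a minimal sketch in Python. It is only an illustration: the stopword lists and the function names (guess_language, language_proportions) are hypothetical, and the sentence-by-sentence stopword count is a toy heuristic, not the method used by any real detection library. It splits the text into sentences, guesses a language for each one, and reports the share of sentences per language.

```python
import re
from collections import Counter

# Tiny stopword lists, purely illustrative. A real detector relies on much
# richer statistics than a handful of common words.
STOPWORDS = {
    "english": {"the", "is", "an", "in", "to", "has", "been", "and", "of"},
    "french": {"la", "le", "est", "et", "à", "très", "cette", "peu", "fois"},
}

SAMPLE = (
    "NLP Cloud is an easy way to leverage Natural Language Processing in "
    "production. The API has been released early January 2021. "
    "Cette API est à la fois peu onéreuse et très robuste."
)


def guess_language(sentence):
    """Return the language whose stopwords appear most often in the sentence."""
    words = re.findall(r"\w+", sentence.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    # On a tie (including zero hits) the first language in STOPWORDS wins.
    return max(scores, key=scores.get)


def language_proportions(text):
    """Return the share of sentences attributed to each language."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    counts = Counter(guess_language(s) for s in sentences)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


if __name__ == "__main__":
    print(language_proportions(SAMPLE))
```

On the sample text, two of the three sentences are attributed to English and one to French, which matches the rough 2/3 and 1/3 split described above.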

Why Use Language Detection?

Language detection is useful in many scenarios. Here are a few examples.

Multilingual Support

Companies that can afford it offer customer support in multiple languages. To route each incoming message to the right support agent, the language of the message must first be detected automatically.

Machine Translation

Language detection is often a first step in machine translation: in general you first need to detect the language of the text, and then translate the text with the right translation model.

First Step in a Natural Language Processing Workflow

It is often useful to perform language detection as a first step, in order to know which model to use later. For example, let's say that you have entity extraction (NER) models in several languages. Before choosing one of them, you need to know the language of your text.
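
The "detect first, then pick the model" step can be sketched as follows. This is a minimal illustration under stated assumptions: the two NER functions are placeholders standing in for real models, and select_model, MODELS, and the score dictionary format (language code mapped to a score) are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], List[str]]


def english_ner(text: str) -> List[str]:
    # Placeholder standing in for a real English NER model.
    return ["<english entities>"]


def french_ner(text: str) -> List[str]:
    # Placeholder standing in for a real French NER model.
    return ["<french entities>"]


# Hypothetical registry: language code -> model to run for that language.
MODELS: Dict[str, Model] = {"en": english_ner, "fr": french_ner}


def select_model(
    scores: Dict[str, float],
    registry: Dict[str, Model] = MODELS,
    fallback: str = "en",
) -> Tuple[str, Model]:
    """Pick the model for the most likely language that has a registered model.

    `scores` maps language codes to detection scores. If none of the detected
    languages is supported, the fallback language is used (it must be present
    in the registry).
    """
    for lang in sorted(scores, key=scores.get, reverse=True):
        if lang in registry:
            return lang, registry[lang]
    return fallback, registry[fallback]
```

For instance, select_model({"fr": 0.9, "en": 0.1}) returns the French model, while a text detected as German only would fall back to the English one. The same dispatch pattern applies to routing support messages or choosing a translation model.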

Language Detection with Python LangDetect

LangDetect is one of the most popular Python libraries dedicated to language detection. It is both fast and accurate, and it can detect several languages in the same text.
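
As a minimal sketch of what this looks like in practice, the snippet below uses the langdetect package (installed with pip install langdetect) and its detect_langs function, which returns candidate languages with a probability each. The helper names to_distribution and detect_distribution are hypothetical and only exist for this example. Note that the scores are probabilities estimated by the library, so they may not match the exact share of text in each language.

```python
def to_distribution(candidates):
    """Convert langdetect-style results (objects with .lang and .prob) to a dict."""
    return {c.lang: round(c.prob, 2) for c in candidates}


def detect_distribution(text, seed=0):
    """Detect the candidate languages of `text` and return {language: score}."""
    # Imported lazily so that to_distribution can be used without langdetect.
    from langdetect import DetectorFactory, detect_langs

    # langdetect is non-deterministic by default; fixing the seed makes
    # results reproducible across runs.
    DetectorFactory.seed = seed
    return to_distribution(detect_langs(text))
```

Calling detect_distribution on a text returns a dictionary keyed by language code (for example "en" or "fr"), which is convenient to serialize or to feed into later processing steps.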

Language Detection API

Building an API for language detection is often a necessary step as soon as you want to use language detection in production. But keep in mind that building such an API is not necessarily easy. First you need to code the API itself (the easy part), and then you need to build a highly available, fast, and scalable infrastructure to serve your language detection library under the hood (the hardest part).
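
To give an idea of the "easy part", here is a minimal sketch of such an API using only the Python standard library. Everything specific in it is an assumption made for illustration: the /detect path, the JSON request and response shapes, and the detect_stub placeholder, which you would replace with a call to a real detection library. It does not address availability or scaling, which is where the real work lies.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def detect_stub(text):
    """Placeholder detector; swap in a real library call in practice."""
    return [{"lang": "en", "score": 1.0}]


def handle_detect(raw_body, detector=detect_stub):
    """Pure request logic: raw bytes in, (status code, JSON bytes) out."""
    try:
        payload = json.loads(raw_body.decode("utf-8"))
        text = payload["text"]
        if not isinstance(text, str) or not text.strip():
            raise ValueError("empty text")
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON body with a non-empty 'text' string"}
        return 400, json.dumps(error).encode("utf-8")
    return 200, json.dumps({"languages": detector(text)}).encode("utf-8")


class DetectHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/detect":  # hypothetical endpoint path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_detect(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Run a single-process development server (not production-grade)."""
    HTTPServer((host, port), DetectHandler).serve_forever()
```

Keeping the request logic in a plain function (handle_detect) separate from the HTTP handler makes it easy to test without opening a socket, and easy to move behind a different web framework later.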

Leveraging such an API is very useful because it is completely decoupled from the rest of your stack (microservice architecture), so you can easily scale it independently and ensure high availability of your language detection module through redundancy. An API is also the way to go in terms of language interoperability. LangDetect is a Python library, but it's likely that you want to access it from other languages like JavaScript, Go, Ruby... In such a situation, an API is a great solution.

NLP Cloud's Language Detection API

NLP Cloud offers a language detection API that lets you perform language detection out of the box, based on Python LangDetect, with excellent performance.

For more details, have a look at the documentation about language detection here.

Testing language detection locally is one thing, but using it reliably in production is another. With NLP Cloud you can do both!