What is zero-shot learning, and how can it be applied effectively to text classification in Natural Language Processing with Hugging Face Transformers?
Thanks to recent state-of-the-art Transformer-based Natural Language Processing models, zero-shot learning has become very popular in the Natural Language Processing world. The idea is that a model can now recognize classes it was never explicitly trained on.
This is what human beings do naturally. For example, if your kid knows what a camel is, you just need to tell him that there is another animal called a dromedary, very similar to a camel except that it has 1 hump on its back instead of 2! The next time your kid sees a picture of a dromedary, he will know what it is, even though it is the first time he has seen one!
Zero-shot techniques connect observed and unobserved classes through some form of so-called "auxiliary" information, which encodes the distinguishing properties of objects. This technique has long been popular in computer vision, and it is now increasingly used in Natural Language Processing.
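To make the idea of auxiliary information concrete, here is a toy sketch of attribute-based zero-shot classification, in the spirit of the camel/dromedary example. The attributes, their values, and the `classify` helper are hypothetical: each class is described only by a small attribute vector, and a new input is assigned to the class whose description is closest to the attributes predicted for it. That includes a class for which no training example was ever seen.

```python
from math import sqrt

# Hypothetical auxiliary information: each class is described by attributes
# [number of humps, has a long neck, lives in the desert], not by examples.
CLASS_ATTRIBUTES = {
    "camel": [2.0, 1.0, 1.0],
    "horse": [0.0, 1.0, 0.0],
    # "dromedary" was never seen during training: we only know its description.
    "dromedary": [1.0, 1.0, 1.0],
}


def distance(a, b):
    """Euclidean distance between two attribute vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(predicted_attributes, class_attributes):
    """Return the class whose attribute description is closest to the prediction."""
    return min(
        class_attributes,
        key=lambda name: distance(predicted_attributes, class_attributes[name]),
    )


# Suppose an attribute detector (trained on seen classes only) looks at a new
# picture and predicts: 1 hump, long neck, desert background.
print(classify([1.0, 1.0, 1.0], CLASS_ATTRIBUTES))  # dromedary
```

The classifier never needs a dromedary picture: knowing how a dromedary differs from a camel is enough.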
Zero-shot learning works great for text classification. Text classification is about assigning one or more categories (space, business, sport, etc.) to a piece of text.
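Whether a text gets "one" or "one or more" categories is usually a decision made on top of per-category scores. The sketch below assumes the scores are already computed (the values are made up for illustration) and shows the two common rules: keep only the best category (single-label), or keep every category above a threshold (multi-label). The `assign_categories` name and the 0.5 threshold are hypothetical choices. In the multi-label case the scores are assumed to be independent per category, so they do not need to sum to 1.

```python
def assign_categories(scores, multi_label=False, threshold=0.5):
    """Turn per-category scores into one or more assigned categories.

    scores: dict mapping a category name to a score in [0, 1].
    Single-label: keep only the best-scoring category.
    Multi-label: keep every category whose score reaches the threshold.
    """
    if not multi_label:
        return [max(scores, key=scores.get)]
    # Sort so that the most likely categories come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [category for category, score in ranked if score >= threshold]


# Hypothetical scores for an article about a satellite launch company.
scores = {"space": 0.91, "business": 0.74, "sport": 0.03}
print(assign_categories(scores))                    # ['space']
print(assign_categories(scores, multi_label=True))  # ['space', 'business']
```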
Until recently, text classification models could only categorize pieces of text into a predefined set of candidate categories. These categories had to be set in advance, during training. This was painful: every time you wanted to add a category, you had to retrain your model with more examples.
Since the arrival of much bigger Natural Language Processing models (most of the time based on Transformers), it has become possible to train a model on a specific list of categories only, and then let users create new categories on the fly without retraining the model.
For example, let's say that your zero-shot text classification model was trained to recognize only 3 categories: space, nature, and sport. You can still use it to categorize texts into other categories, such as business, food, or science.
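One widely used way to get this "categories on the fly" behavior is the approach behind Hugging Face's `zero-shot-classification` pipeline when used with Natural Language Inference (NLI) models: each candidate category is turned into a hypothesis such as "This text is about business.", and the model scores how strongly the text entails it. The sketch below shows only that mechanism. `toy_entailment_logit` and its keyword lists are hypothetical stand-ins for a real Transformer, and the template wording is an assumption. Because the categories are passed at call time, "business", "food", or "science" can be used without retraining anything.

```python
import math

# Hypothetical keyword lists used by the toy scorer below. A real system would
# instead feed each (premise, hypothesis) pair to an NLI Transformer model.
KEYWORDS = {
    "business": {"revenue", "market", "startup", "investors"},
    "food": {"recipe", "pasta", "restaurant", "cooking"},
    "science": {"experiment", "physics", "researchers", "study"},
}


def toy_entailment_logit(premise, hypothesis):
    """Toy stand-in for an entailment logit: keyword overlap with the labels named in the hypothesis."""
    words = set(premise.lower().replace(".", "").split())
    return float(sum(len(words & kws) for label, kws in KEYWORDS.items() if label in hypothesis))


def zero_shot_classify(text, candidate_labels, score_fn, template="This text is about {}."):
    """Rank candidate labels chosen at call time, NLI-style."""
    # One (premise, hypothesis) pair per candidate label.
    logits = [score_fn(text, template.format(label)) for label in candidate_labels]
    # Softmax across labels (single-label setting): the scores sum to 1.
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(candidate_labels, probs), key=lambda pair: pair[1], reverse=True)


text = "The startup told investors that revenue doubled."
result = zero_shot_classify(text, ["business", "food", "science"], toy_entailment_logit)
print(result[0][0])  # business
```

Swapping `toy_entailment_logit` for a real NLI model changes the quality of the scores, not the overall flow.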
This is a very powerful technique that allows a lot of flexibility while still giving great results.
There are excellent open-source Natural Language Processing models out there, based on Hugging Face Transformers, that work really well for zero-shot text classification.
At NLP Cloud, we selected the following 2 models, which are, in our opinion, the best state-of-the-art models for zero-shot text classification at the moment:
Even though their accuracy is impressive and their latency is quite good, these 2 models are still computationally intensive, and latency can easily increase if the text you want to analyze gets too long or the number of candidate categories is too high. If accuracy is not your primary concern and you would prefer a faster, less resource-intensive model, you can easily select another one. For example, distilled versions of Bart, called "DistilBart", exist and are well suited to this.
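Why do long texts and many categories hurt latency? Under the NLI-style approach, every candidate category needs its own (text, hypothesis) forward pass through the model. If you also split long texts into chunks instead of letting them be truncated, the work multiplies. The rough cost model below is a sketch under those assumptions; the function name and the 1024-token chunk size are hypothetical, and real latency also depends on hardware, batching, and sequence length within each pass.

```python
import math


def estimate_forward_passes(num_tokens, num_labels, max_tokens=1024):
    """Rough cost model for NLI-style zero-shot classification.

    Assumes the text is split into chunks of at most `max_tokens` tokens and
    that every (chunk, candidate label) pair needs its own forward pass.
    """
    chunks = max(1, math.ceil(num_tokens / max_tokens))
    return chunks * num_labels


print(estimate_forward_passes(300, 3))    # 3
print(estimate_forward_passes(5000, 20))  # 100
```

Going from a short text with 3 categories to a long text with 20 categories multiplies the work by more than 30, which is where a lighter model such as a distilled one becomes attractive.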
Zero-shot learning and few-shot learning are modern techniques that appeared with the creation of big Natural Language Processing models (see more about few-shot learning here). They give a lot of flexibility and make Natural Language Processing more and more impressive!
Feel free to give zero-shot classification a try and see if you like it too. You can easily try it on NLP Cloud!
François
Full-stack engineer at NLP Cloud