History of Language AI


In this NLP Cloud course we highlight the important milestones in the history of language AI (also known as Natural Language Processing).


Hello, this is Julien Salinas from NLP Cloud, an advanced AI platform for your next AI project.

It is interesting for AI practitioners to understand the history of AI and see which important milestones led to the cutting-edge generative models we are all using today.

In this course, I will quickly go through the history of language models from the 20th century to today.

AI is not a new thing.

Engineers and linguists started working on artificial intelligence for text understanding around 1950.

This era of symbolic natural language processing lasted until the 1990s.

At the time, the main motivation was machine translation, and AI systems were based on sets of hand-written rules.

Improving an AI algorithm was mainly about adding more rules to the program.

Researchers were quite enthusiastic about their first results, and they thought that machine translation would be solved within a couple of years.

Well, it is still not totally solved today.

These rule-based systems also led to the first, very simple chatbot, called ELIZA.
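To make the idea of a rule-based system concrete, here is a minimal sketch in Python of a chatbot driven only by pattern-matching rules. The rules, the replies, and the `respond` function are hypothetical and written for illustration; they are not taken from the original program.

```python
import re

# Hypothetical rules for illustration only: each rule pairs a pattern
# with a reply template. The first matching rule wins.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def respond(message: str) -> str:
    """Return the reply of the first rule whose pattern matches the message."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the user's own words inside the canned reply.
            return template.format(*(group.lower() for group in match.groups()))
    return FALLBACK


if __name__ == "__main__":
    for line in ["I need a holiday.", "I am tired", "Hello"]:
        print(line, "->", respond(line))
```

In a sketch like this, the only way to improve the chatbot is to append more rules to the list, which is the limitation of the symbolic approach.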

Around 1990, we entered the era of statistical NLP.

Using AI with statistics instead of predefined rules meant that we could start building much more powerful systems without having to think about all the scenarios in advance.

This was made possible by progress in mathematical research, but also by the increased computing power provided by new CPUs.

Systems would learn from examples labeled by humans, which is known as supervised learning, and later even from raw data without any human intervention at all, which is known as unsupervised learning.

So it was possible to train interesting models based on the huge volume of unstructured data coming from the internet.

New businesses were actually using machine learning in production at the time and the most popular use case was named entity recognition, also known as entity extraction.

Neural networks are not new.

In the middle of the 20th century, some researchers already had the intuition to create an AI system made of neurons that would imitate the human brain.

But neural networks only started to give interesting results around 2010.

Thanks to GPUs, it was then possible to train much bigger neural networks.

This was the beginning of the so-called deep learning era.

The first impressive results came from computer vision thanks to convolutional neural networks, which allowed for advanced image classification.

Language really benefited from deep learning only a bit later.

Until 2010 and the rise of deep learning, language AI was essentially a research area and few businesses used natural language processing in their products.

Now let's see which recent breakthroughs led to the generative AI technology we all know today.

The real breakthrough for language models was in 2017, when some Google researchers released a paper called "Attention Is All You Need".

This paper described a new kind of neural network architecture called the transformer, based on a principle called self-attention.

The transformer architecture is at the heart of all the impressive language models we have seen since 2017.
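As a rough illustration of self-attention, here is a minimal sketch in plain Python of scaled dot-product attention. It makes one strong simplifying assumption: the queries, keys, and values are the input vectors themselves, whereas a real transformer learns a separate projection for each of them. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two vectors of the same length."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Simplification: queries, keys and values are the input vectors
    themselves. A real transformer learns a projection for each.
    """
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for query in vectors:
        # How much this position attends to every position, itself included.
        weights = softmax([dot(query, key) / scale for key in vectors])
        # The output is the weighted average of all value vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(len(query))
        ]
        outputs.append(mixed)
    return outputs


if __name__ == "__main__":
    print(self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Each output vector is a mix of all input vectors, so every word representation can take the whole sentence into account.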

Very quickly after that, Google trained one of the first models following the transformer architecture.

This model was called BERT.

BERT was the first production-grade language model that could be used for all sorts of use cases: summarization, entity extraction, question answering, translation, and more.

BERT was really interesting because, for the first time, a model was created that was good at transfer learning.

Basically, the model was pre-trained on a large set of unannotated data, and it was then able to quickly learn many sorts of use cases thanks to short fine-tuning runs requiring very little additional data.

OpenAI was initially a non-profit AI startup that released a new kind of architecture, GPT, based on the transformer.

When they released GPT-2 in 2019, everyone was impressed by the capabilities of this text generation model.

GPT-2 was the first production-grade generative model.

It was especially good for text completion.

For example, it was used by Microsoft for auto-completion in Microsoft Office.

It was trained on around 8 million web pages and contained 1.5 billion parameters, which is of course not much compared to the models we have today.

In 2020, OpenAI started a second revolution.

They became a for-profit company and released a powerful generative model called GPT-3.

GPT-3 was still based on the GPT architecture but trained on more content.

It contained 175 billion parameters and required thousands of GPUs to train for several months.

Although there is no official figure, researchers estimate that pre-training GPT-3 cost around 5 million dollars.

It was the first versatile generative model that was able to address all sorts of use cases.

In order to make the most of this model, fine-tuning was not even needed anymore.

Most of the time, few-shot learning was enough and it actually even worked very well in zero-shot learning mode.
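To show the difference between zero-shot and few-shot learning, here is a minimal sketch in Python that only builds the prompt text. The `build_prompt` function and the "Text:" and "Label:" layout are hypothetical choices made for illustration; no model is called.

```python
def build_prompt(instruction, examples, query):
    """Assemble a text prompt for a generative model.

    With an empty list of examples this is a zero-shot prompt;
    with a handful of examples it becomes a few-shot prompt.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    # The prompt ends on an open label so the model completes it.
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


if __name__ == "__main__":
    instruction = "Classify the sentiment of the text."
    examples = [("I loved it", "positive"), ("Terrible service", "negative")]
    print(build_prompt(instruction, [], "Great movie"))        # zero-shot
    print("---")
    print(build_prompt(instruction, examples, "Great movie"))  # few-shot
```

In both cases the model itself is left unchanged: the examples live in the prompt, which is why no fine-tuning is involved.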

Then, in the same spirit, came ChatGPT and GPT-4.

Soon after that, OpenAI released other kinds of disruptive models.

Thanks to DALL-E, it was possible to generate beautiful images out of text.

And they dramatically raised the bar in the speech-to-text industry thanks to Whisper.

You might have noticed many different terms in this course.

Machine learning, deep learning, neural networks, natural language processing, AI, generative AI.

Some are specific technical terms, while others are simply trendy buzzwords.

I personally think that natural language processing is the right term for the language AI technology we are using today.

But this is not very important.

You now have a basic understanding of where our AI models come from.