GPT-3 and GPT-J give amazing results. They are the best AI models for chatbots as of this writing. Here are a few tips to help you build the best chatbot with GPT-3 or GPT-J. We also make a quick comparison with Blenderbot 2.
In May 2020, OpenAI introduced GPT-3 (see GPT-3's website). With 175 billion parameters in its largest version (Davinci), it was the biggest Natural Language Processing model ever created at the time. This AI model is not open-source: OpenAI offers it through a paid API, and Microsoft holds an exclusive license to the underlying model.
GPT-J was released in June 2021 by EleutherAI as an open-source model. It has 6 billion parameters and is very similar to GPT-3 Curie (see EleutherAI's website here).
Both give incredible results in terms of text generation, especially for chatbots and conversational AI. But it takes a bit of practice! First, these models need to be given a couple of examples before they behave as expected. Second, they are "stateless", meaning that they don't keep a history of your conversations. So you need to find a way around that!
If you naively send requests to these models without a bit of context and formatting, you will be disappointed by the responses. This is because these models are very versatile. They can act as chatbots, but they can also do many other things like question answering, summarization, paraphrasing, classification, entity extraction, product description generation, and more. So the first thing you need to do is tell the model which "mode" it should adopt.
Here is a request example you could send:
This is a discussion between a [human] and a [robot]. The [robot] is very nice and empathetic.
[human]: I broke up with my girlfriend...
[robot]:
In this example, note two things.
First, we added simple formatting so that the model understands it is in conversational mode ([human], [robot], ...).
Second, we added some context at the top to help the model understand what it is doing and the tone it should use. Don't overestimate the importance of this context though. Most of the AI's attitude and tone are going to come from the discussion itself. So if you really want your AI to sound "nice and empathetic", you should give some examples in your input that show a nice and empathetic character. Here is an example (if you want your model to be rude and sarcastic, simply pass a couple of rude and sarcastic examples instead):
This is a discussion between a [human] and a [robot]. The [robot] is very nice and empathetic.
[human]: Hello nice to meet you.
[robot]: Nice to meet you too.
###
[human]: How is it going today?
[robot]: Not so bad, thank you! How about you?
###
[human]: I am ok, but I am a bit sad...
[robot]: Oh? Why that?
###
[human]: I broke up with my girlfriend...
[robot]:
Here you can see that we start with a couple of examples before passing the actual human message. Three examples might be enough, but not always, so we encourage you to add more examples, especially for complex situations. Sometimes, even passing tons of examples is not enough. In that case, you have no choice but to fine-tune GPT-3 or GPT-J in order to get good results.
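As a rough illustration, a prompt like the one above can be assembled programmatically. This is only a sketch: the `build_prompt` function and its parameter names are hypothetical helpers invented for this example, and they are not part of any GPT-3 or GPT-J API.

```python
from typing import List, Tuple


def build_prompt(context: str, examples: List[Tuple[str, str]], message: str) -> str:
    """Assemble a few-shot chatbot prompt in the [human]/[robot] format.

    Hypothetical helper: each example is a (human, robot) pair and is
    followed by a "###" separator, as in the prompt shown above.
    """
    lines = [context]
    for human, robot in examples:
        lines.append(f"[human]: {human}")
        lines.append(f"[robot]: {robot}")
        lines.append("###")
    # The actual user message comes last, with an open [robot]: turn
    # that the model is expected to complete.
    lines.append(f"[human]: {message}")
    lines.append("[robot]:")
    return "\n".join(lines)


context = (
    "This is a discussion between a [human] and a [robot]. "
    "The [robot] is very nice and empathetic."
)
examples = [
    ("Hello nice to meet you.", "Nice to meet you too."),
    ("How is it going today?", "Not so bad, thank you! How about you?"),
    ("I am ok, but I am a bit sad...", "Oh? Why that?"),
]
prompt = build_prompt(context, examples, "I broke up with my girlfriend...")
print(prompt)
```

Changing the chatbot's tone then comes down to swapping the `examples` list for pairs that show the character you want.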
GPT-3 and GPT-J are "stateless" models, meaning that every request you make is new and the AI is not going to remember anything about the previous requests you made.
In many Natural Language Processing situations it's not a problem (summarization, classification, paraphrase...), but as far as chatbots are concerned it's definitely an issue because we do want our chatbot to memorize the discussion history in order to make more relevant responses.
For example, if you tell the AI that you're a programmer, you want it to remember that fact, because it should influence the responses that follow.
The best way to achieve this is to store every exchange (both the user's message and the AI's response) in a local database. For example, PostgreSQL stores long texts very efficiently thanks to its "text" type.
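Then, every time you make a new request to the chatbot, you should do the following: retrieve the conversation history from your database, prepend it to the new message using the formatting shown above ([human], [robot], ###), send the request to the model, and store the new message and the response back in the database.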
This is both a versatile and robust system that requires little effort, and perfectly leverages the power of GPT-3 and GPT-J.
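One way to sketch such a system: load the stored exchanges for a user, prepend them to the new message in the [human]/[robot] format, send the prompt, then store the new exchange. The example below makes several assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, the table layout and the `chat` function are hypothetical, and `fake_generate` is a placeholder for the real call to GPT-3 or GPT-J.

```python
import sqlite3
from typing import Callable, List, Tuple

CONTEXT = (
    "This is a discussion between a [human] and a [robot]. "
    "The [robot] is very nice and empathetic."
)


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) history table, one row per exchange."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "user_id TEXT NOT NULL, human TEXT NOT NULL, robot TEXT NOT NULL)"
    )


def load_history(conn: sqlite3.Connection, user_id: str) -> List[Tuple[str, str]]:
    """Return this user's past (human, robot) exchanges, oldest first."""
    rows = conn.execute(
        "SELECT human, robot FROM history WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [(human, robot) for human, robot in rows]


def chat(conn: sqlite3.Connection, user_id: str, message: str,
         generate: Callable[[str], str]) -> str:
    """Prepend the stored history to the message, call the model, store the result."""
    parts = [CONTEXT]
    for human, robot in load_history(conn, user_id):
        parts += [f"[human]: {human}", f"[robot]: {robot}", "###"]
    parts += [f"[human]: {message}", "[robot]:"]
    reply = generate("\n".join(parts)).strip()
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO history (user_id, human, robot) VALUES (?, ?, ?)",
            (user_id, message, reply),
        )
    return reply


def fake_generate(prompt: str) -> str:
    """Placeholder for the real model call: reports how much history it received."""
    return f"I can see {prompt.count('###')} earlier exchange(s)."


conn = sqlite3.connect(":memory:")
init_db(conn)
print(chat(conn, "alice", "I am a programmer.", fake_generate))
print(chat(conn, "alice", "What should I learn next?", fake_generate))
```

Keying each row on a user identifier keeps the histories of different users separate, and the model itself stays stateless: everything it knows about the conversation arrives in the prompt.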
Quick note: these models cannot handle requests bigger than 2048 tokens (roughly 1,500 words of English text), and that limit covers both your input and the generated response. So if your conversation history goes above this, you might want to either truncate the oldest part of the history, or only keep the most important parts of the discussion.
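Truncating the oldest part of the history could look like the sketch below. It relies on a loud assumption: `estimate_tokens` uses a rough rule of thumb of about 4 characters per token rather than the model's real tokenizer, so the budget should leave a safety margin. The function names and the reserved amount are hypothetical.

```python
from typing import List, Tuple

MAX_TOKENS = 2048  # limit mentioned above, shared by the input and the reply


def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about 4 characters per token.

    This is not a real tokenizer; use the model's tokenizer for exact counts.
    """
    return len(text) // 4 + 1


def truncate_history(history: List[Tuple[str, str]], budget: int) -> List[Tuple[str, str]]:
    """Keep the most recent exchanges whose estimated size fits within the budget."""
    kept: List[Tuple[str, str]] = []
    used = 0
    # Walk from the newest exchange to the oldest and stop at the first
    # one that no longer fits, so the oldest exchanges are dropped first.
    for human, robot in reversed(history):
        cost = estimate_tokens(f"[human]: {human}\n[robot]: {robot}\n###\n")
        if used + cost > budget:
            break
        kept.append((human, robot))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    ("a" * 400, "b" * 400),  # an old, very long exchange
    ("Hello", "Hi"),
    ("How are you?", "Fine"),
]

# Reserve room for the context line, the new message, and the reply
# (the 548 reserved tokens are an arbitrary illustrative value).
budget = MAX_TOKENS - 548
print(len(truncate_history(history, budget)))

# With a tiny budget, only the most recent exchanges survive.
print(truncate_history(history, 50))
```

Keeping "only the most important parts" instead of the most recent ones would need a different selection rule, for example one that always keeps exchanges containing facts about the user.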
Facebook recently released a chatbot called Blenderbot 2. It tries to address the conversation history issue by storing some data on the server side in order to keep track of the discussion history. It sounds appealing, but in practice it creates lots of new challenges. For example, how do you make sure that each user has their own history and that histories are not mixed together? How do you control the data retention policy? Is the data persistent in case of a server restart, and how do you back up the data in case of a server crash? Etc.
We tried Blenderbot 2 and we are not convinced that it's a better solution than GPT-3/GPT-J based chatbots. Not only is it hard to set a context for your chatbot (like we did above), but the server-side data storage also creates more challenges than simply maintaining the history on the client side.
GPT-3 and GPT-J really took chatbots and conversational AI to the next level. These huge models are very good at understanding your context and adapting to it. In most cases, setting the right context is enough, but for advanced use cases the best solution is to train/fine-tune your own AI model. The good news is that you can achieve great results with few examples (a 500-example dataset is a very good start).
On NLP Cloud you can easily try GPT-J. You can also fine-tune it and deploy your own private GPT-J model in one click. If you haven't tried it yet, feel free to give it a go.
If you have questions about GPT-3/GPT-J or chatbots in general, please don't hesitate to contact us!
François