Text Generation API with GPT-Neo and GPT-J (GPT-3 alternatives)

What is Text Generation?

Text generation consists of providing a piece of starting text and letting the Natural Language Processing model create the rest of the text for you, in the spirit of your initial input. It is up to you to decide how long you want the generated text to be.

Let's say you have the following piece of text:

GPT Neo is a powerful Natural Language Processing model

Now, let's say you want to generate about 250 words from the above text. Just give your text to the model and it will generate the rest:

GPT Neo is a powerful Natural Language Processing model that does what every model should do: predict your future behavior. It's very effective at detecting patterns in your real-time input, predicting upcoming events and responses that you may not even consider. GPT's power consists of a set of internal neural processes that are extremely accurate. But at the same time that results are good, they need to be fast. So GPT Neo is designed to run on the server. To speed up your results, you may need to use the cloud. Here's why: you may be storing your business data in the cloud. And using the cloud may mean that you can't get your data into the Natural Language Processing process because you have to transfer everything over the cloud.

Pretty smart, isn't it? As you can see, the text generated by the model says interesting things about GPT Neo. Most text generation models are not deterministic: every time you send the same text to the model, you will get different results.
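Here is a minimal sketch of this workflow with the Hugging Face transformers library. The 125M-parameter GPT-Neo checkpoint is used only to keep the download small; the larger GPT-J and GPT-NeoX models produce far better text but need several gigabytes of memory (ideally a GPU).

```python
# Minimal text generation sketch with Hugging Face transformers.
# EleutherAI/gpt-neo-125M is the smallest GPT-Neo checkpoint, used
# here purely for illustration.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

prompt = "GPT Neo is a powerful Natural Language Processing model"

# do_sample=True makes the output different on every call (the
# nondeterminism mentioned above); fixing the random seed makes a
# given run reproducible.
set_seed(42)
result = generator(prompt, max_length=60, do_sample=True)

print(result[0]["generated_text"])
```

Increase `max_length` (counted in tokens, prompt included) to get a longer completion, at the cost of a longer response time.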

You can achieve almost any Natural Language Processing use case thanks to text generation, as long as you are leveraging a huge and versatile model: sentiment analysis, grammar and spelling correction, question answering, code generation, machine translation, intent classification, paraphrasing... and more!
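For example, sentiment analysis can be phrased as a pure text generation problem with a few-shot prompt: a handful of solved examples show the model the pattern, and it is asked to complete the last line. The messages and labels below are illustrative only.

```python
# Sentiment analysis phrased as text generation: the model is expected
# to continue the final "Sentiment:" line following the pattern set by
# the examples. All examples here are made up for illustration.
few_shot_prompt = """Classify the sentiment of each message.

Message: I love this product, it works perfectly!
Sentiment: positive

Message: The delivery was late and the box was damaged.
Sentiment: negative

Message: The interface is clean and very easy to use.
Sentiment:"""

# Sending this prompt to a large generative model (GPT-J, GPT-NeoX...)
# should make it complete the last line with "positive".
print(few_shot_prompt)
```

The same trick works for the other use cases listed above: change the examples and you get a translator, a paraphraser, a code generator, and so on.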

Why Use Text Generation?

Text generation is a great way to automatically create content. Here are a couple of examples.

Marketing Content Generation

Content creation is crucial for SEO today, but it's also a tedious job. Why not leave it to a dedicated Natural Language Processing model, and then focus on something more important?


Chatbots

An interesting way of making chatbots sound more human is to add non-essential "chit-chat" to the core discussion. Text generation can help in this situation.

Fuzz Testing

Fuzz testing (or fuzzing) is a technique used by programmers to test their applications with random or unexpected inputs. Generating new content for every test is a convenient way to perform fuzz testing.
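A toy sketch of the idea in plain Python: feed many random strings to the function under test and check that it never crashes and always honors its contract. In practice, a text generation model can supply more realistic-looking inputs than `random` does; the CSV parser below is just a stand-in target.

```python
# Toy fuzz test: hammer a function with random inputs and check its
# invariants. The parser is a made-up example target.
import random
import string

def random_text(max_len=50):
    """Generate a random printable string to use as fuzzing input."""
    length = random.randint(0, max_len)
    return "".join(random.choice(string.printable) for _ in range(length))

def parse_csv_line(line):
    """Function under test: split a CSV line into stripped fields."""
    return [field.strip() for field in line.split(",")]

# The contract: never raise, always return a list of strings.
for _ in range(1000):
    fields = parse_csv_line(random_text())
    assert isinstance(fields, list)
    assert all(isinstance(f, str) for f in fields)
```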

Mock-up Creation

Before releasing a new application, it is often necessary to create mock-ups in order to get user feedback. Filling the blanks of these mock-ups with generated text is a good way to make them look as real as possible.

Text Generation with GPT-Neo and GPT-J, the open-source versions of GPT-3

Hugging Face transformers is an amazing library that has been recently released. It is based on either PyTorch or TensorFlow, depending on the model you're using. Transformers have clearly helped deep learning Natural Language Processing make great progress in terms of accuracy. However, this accuracy improvement comes at a cost: transformers are extremely demanding in terms of resources.

Hugging Face is a central repository regrouping all the newest open-source Natural Language Processing transformer-based models. Two of them, EleutherAI's GPT-NeoX 20B and GPT-J, are perfectly suited for text generation in many languages. They are open-source equivalents of the impressive GPT-3 model from OpenAI.

Text Generation Inference API

Building an inference API for text generation is a necessary step as soon as you want to use text generation in production. But keep in mind that building such an API is not necessarily easy. First, because you need to code the API (the easy part), but also because you need to build a highly available, fast, and scalable infrastructure to serve your models under the hood (the hardest part). Machine learning models consume a lot of resources (memory, disk space, CPU, GPU...), which makes it hard to achieve high availability and low latency at the same time.

Leveraging such an API is very interesting because it is completely decoupled from the rest of your stack (microservice architecture), so you can easily scale it independently and ensure high availability of your models through redundancy. But an API is also the way to go in terms of language interoperability. Most machine learning frameworks are developed in Python, but it's likely that you want to access them from other languages like JavaScript, Go, Ruby... In such a situation, an API is a great solution.
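The API side really is the easy part. Here is a minimal sketch of a text generation endpoint with Flask; the `generate()` stub stands in for a real model call (e.g. a transformers pipeline), since loading GPT-J here would need several gigabytes of RAM.

```python
# Minimal inference API sketch with Flask. The model call is faked:
# a real server would load GPT-J / GPT-Neo once at startup and call
# it inside generate().
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate(prompt: str, max_length: int = 50) -> str:
    # Placeholder standing in for the actual model inference.
    return prompt + " ... (generated text)"

@app.route("/generation", methods=["POST"])
def generation():
    payload = request.get_json(force=True)
    text = payload.get("text", "")
    result = generate(text, payload.get("max_length", 50))
    return jsonify({"generated_text": result})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

The hard part, as explained above, is everything around this snippet: provisioning GPUs, keeping the model loaded in memory, load balancing, and surviving traffic spikes.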

NLP Cloud's Text Generation API

NLP Cloud proposes a text generation API that gives you the opportunity to perform text generation out of the box, based on EleutherAI's GPT-NeoX 20B and GPT-J models served with Hugging Face transformers, with good accuracy. Due to the extremely complex computations needed for such a task, the response time (latency) is high, though, so we do recommend a GPU plan for these models. You can either use the pre-trained models, upload your own GPT-J and GPT-Neo custom models, or fine-tune GPT-J on the platform so the model is perfectly tailored to your use case.

For more details, see our documentation about text generation with GPT-J here.
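Calling a hosted endpoint like this boils down to one HTTP request. The sketch below uses the `requests` library; the URL and payload follow the pattern of NLP Cloud's documented GPT-J generation endpoint, but check the documentation for the exact parameter names before relying on them.

```python
# Sketch of calling a hosted GPT-J generation endpoint over HTTP.
# URL and payload fields follow NLP Cloud's documented pattern but
# should be verified against the current documentation.
import os
import requests

API_TOKEN = os.environ.get("NLPCLOUD_TOKEN")  # your API token
URL = "https://api.nlpcloud.io/v1/gpu/gpt-j/generation"

payload = {
    "text": "GPT Neo is a powerful Natural Language Processing model",
    "max_length": 250,
}
headers = {"Authorization": f"Token {API_TOKEN}"}

# Only send the request if a token is actually configured.
if API_TOKEN:
    response = requests.post(URL, json=payload, headers=headers)
    print(response.json()["generated_text"])
```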

Testing text generation locally is one thing, but using it reliably in production is another. With NLP Cloud you can do both!