Effectively Using Text To Image With Stable Diffusion, The DALL-E 2 / MidJourney Alternative

AI can generate images from text, a task known as text to image. Stable Diffusion, by Stability AI, is the best open-source AI model for image generation and a great alternative to DALL-E 2 or MidJourney. Properly using this model takes some practice, though, so this article shows you how to generate amazing images with Stable Diffusion!

DALL-E 2 And MidJourney

DALL-E 2, released by OpenAI, is a powerful text to image model, but as of this writing it is still in closed beta, meaning that you need to request special access to use it.

With DALL-E 2 you can generate either realistic images that look like real photographs, or more abstract images that look like drawings, paintings, or computer-generated images.

MidJourney is also a great candidate for text to image, and it is especially popular for AI art generation.

How do you generate such images? Simply by writing a text instruction in natural language. Here are a couple of examples:

Concept art of a futuristic city during sunset.

Concept art of a futuristic city during sunset, generated by Stable Diffusion

Photograph of a gorilla in the street.

Photograph of a gorilla in the street, generated by Stable Diffusion

Stable Diffusion

Stable Diffusion is an open-source text to image model created by the research company Stability AI.

Stable Diffusion is the first open-source AI model to reach the same performance as DALL-E 2 and MidJourney. It returns accurate results while keeping response times quite low.

Stable Diffusion is now available on NLP Cloud! But making the most of this great AI model takes some practice, and your first results might be disappointing.

This is why we thought it would be interesting to give you more details about how to use these text to image models.

The Naive Approach

At first, you might be tempted to use very simple instructions like "a car" or "a lion". These do not necessarily return amazing results. Here are some examples:

A car

A car, generated by Stable Diffusion

A lion

A lion, generated by Stable Diffusion

This is not bad but we can do much better!

Choose A Technique

The easiest and most impressive improvement you can make is to choose a creation technique for your image: for example oil painting, pencil drawing, concept art, or photograph. Let's try some examples:

A pencil drawing of a lion

A pencil drawing of a lion, generated by Stable Diffusion

An oil painting of a lake in Winter

An oil painting of a lake in Winter, generated by Stable Diffusion

A concept art of a cyberpunk car

A concept art of a cyberpunk car, generated by Stable Diffusion

It is impressive how quickly and easily you can generate art in a specific style, isn't it?
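The examples above all follow the same pattern: a creation technique, then "of", then the subject. The following minimal Python sketch captures that pattern. It only builds the instruction string; `with_technique` is a hypothetical helper name, and sending the text to Stable Diffusion is left out.

```python
def with_technique(subject: str, technique: str) -> str:
    """Prefix a subject with a creation technique, e.g. 'pencil drawing'.

    Hypothetical helper: it only builds the text instruction.
    """
    # Simple vowel check for "A"/"An"; it ignores exceptions like "hour".
    article = "An" if technique[0].lower() in "aeiou" else "A"
    return f"{article} {technique} of {subject}"


print(with_technique("a lion", "pencil drawing"))
print(with_technique("a lake in Winter", "oil painting"))
print(with_technique("a cyberpunk car", "concept art"))
```

This reproduces the three instructions used above, and swapping the `technique` argument is a quick way to compare styles for the same subject.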

Choose A Style

Sometimes a technique is not enough to describe the kind of image you would like to generate. In that case, specifying an artist can help! Here are some examples:

A tulip field made by Claude Monet

A tulip field made by Claude Monet, generated by Stable Diffusion

An oil painting of a woman made by Rembrandt

An oil painting of a woman made by Rembrandt, generated by Stable Diffusion

This is also a good opportunity to learn about artists you don't know yet.
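Adding an artist is just a suffix on the instruction, which makes it easy to try the same scene with several artists. Here is a small Python sketch; `with_artist` is a hypothetical helper that only builds the text, and the artist list is just an example to explore.

```python
def with_artist(instruction: str, artist: str) -> str:
    """Append an artist's name to steer the style (hypothetical helper)."""
    return f"{instruction} made by {artist}"


# Try the same scene with several artists to compare their styles.
for artist in ["Claude Monet", "Rembrandt"]:
    print(with_artist("A tulip field", artist))
```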


The techniques above are very useful, but you can do even better by adding some specific keywords. The Stability AI team recommends trying some of the following keywords in your instructions:

Highly detailed, surrealism, trending on ArtStation, triadic color scheme, smooth, sharp focus, matte, elegant, the most beautiful image ever seen, illustration, digital paint, dark, gloomy, octane render, 8k, 4k, washed colors, sharp, dramatic lighting, beautiful, post processing, picture of the day, ambient lighting, epic composition.
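One convenient way to use these keywords is to append them, comma-separated, after the main instruction. Placing them at the end is an assumption made for this sketch, not a rule stated by Stability AI. `with_keywords` is a hypothetical helper that only builds the text.

```python
def with_keywords(instruction: str, keywords: list[str]) -> str:
    """Append comma-separated style keywords to an instruction.

    Assumption: keywords go after the main instruction.
    """
    # Drop empty entries and stray whitespace so the prompt stays clean.
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        return instruction
    return instruction + ", " + ", ".join(cleaned)


prompt = with_keywords(
    "A concept art of a cyberpunk car",
    ["highly detailed", "dramatic lighting", "8k"],
)
print(prompt)
# prints: A concept art of a cyberpunk car, highly detailed, dramatic lighting, 8k
```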

No doubt you will also discover instructions that nobody has tried before and that create amazing results!
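To experiment systematically, you can generate every combination of subjects, techniques, and keyword sets, then try them one by one. Here is a sketch using `itertools.product` from the Python standard library; the lists are only examples, and `variants` is a hypothetical helper name.

```python
from itertools import product

subjects = ["a lion", "a futuristic city during sunset"]
techniques = ["pencil drawing", "oil painting", "concept art"]
keyword_sets = [[], ["highly detailed", "sharp focus"], ["dark", "gloomy"]]


def variants(subjects, techniques, keyword_sets):
    """Yield one prompt per subject x technique x keyword-set combination."""
    for subject, technique, keywords in product(subjects, techniques, keyword_sets):
        prompt = f"{technique.capitalize()} of {subject}"
        if keywords:
            prompt += ", " + ", ".join(keywords)
        yield prompt


prompts = list(variants(subjects, techniques, keyword_sets))
print(len(prompts))  # 2 * 3 * 3 = 18
for p in prompts[:3]:
    print(p)
```

The number of prompts grows multiplicatively with each list, so it is worth keeping the lists short when each generation takes time.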

Also, feel free to write longer instructions: you don't have to stick to one sentence, and you can use a whole paragraph instead.


As you can see, image generation is a very impressive technique that has been democratized by models like DALL-E 2, MidJourney, and Stable Diffusion.

Once you master the text to image techniques, you can easily generate tons of amazing images in the blink of an eye.

We hope you found this useful! If you have questions about how to make the most of Stable Diffusion, please don't hesitate to ask us.

Julien Salinas
CTO at NLP Cloud