
Effectively Using Text To Image With Stable Diffusion, The DALL-E 2 / MidJourney Alternative

September 2, 2022

AI can generate images from text, a task known as text to image. Stable Diffusion, by Stability AI, is the best open-source AI model for image generation, and a great alternative to DALL-E 2 or MidJourney. But using this model properly takes some practice, so let's see how to generate amazing images with Stable Diffusion!

DALL-E 2 And MidJourney

DALL-E 2, released by OpenAI, is a powerful AI model for text to image. But it is still in closed beta as of this writing, meaning that you need to request special access to use it.

With DALL-E 2 you can either generate realistic images that look like real photographs, or generate more abstract images that look like drawings, paintings, or computer-generated images.

MidJourney is also a great candidate for text to image, and it is especially popular for AI art generation.

How do you generate such images? Simply by creating a text instruction in natural language. Here are a couple of examples:

Concept art of a futuristic city during sunset.

Concept art of a futuristic city during sunset, generated by Stable Diffusion

Photograph of a gorilla in the street.

Photograph of a gorilla in the street, generated by Stable Diffusion

Stable Diffusion

Stable Diffusion is an open-source text to image model, created by Stability AI, a company of researchers.

Stable Diffusion is the first open-source AI model to reach the same performance as DALL-E 2 and MidJourney. It returns accurate results while keeping response times low.

Stable Diffusion is now available on NLP Cloud! But making the most of this great AI model takes some practice and you might be disappointed by your first results.

This is why we thought it would be interesting to give you more details about how to use these text to image models.

The Naive Approach

At first, you might be tempted to use very simple instructions like "a car" or "a lion". These do not necessarily return amazing results. Here are some examples:

A car

A car, generated by Stable Diffusion

A lion

A lion, generated by Stable Diffusion

This is not bad but we can do much better!

Choose A Technique

The easiest and most impressive improvement you can make is to select a creation technique for your image. For example, it could be an oil painting, a pencil drawing, concept art, or a photograph. Let's try some examples:

A pencil drawing of a lion

A pencil drawing of a lion, generated by Stable Diffusion

An oil painting of a lake in Winter

An oil painting of a lake in Winter, generated by Stable Diffusion

A concept art of a cyberpunk car

A concept art of a cyberpunk car, generated by Stable Diffusion

Impressive to see how easy it is to generate some art following a specific style in no time, isn't it?
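To compare techniques quickly, you can generate the instructions programmatically. Here is a minimal sketch in Python. The technique_prompt helper is hypothetical (it is not part of Stable Diffusion or of any API), and it only builds the text that you would then send to the model.

```python
def technique_prompt(technique: str, subject: str) -> str:
    """Build an instruction of the form '<A/An> <technique> of <subject>'."""
    technique = technique.strip()
    subject = subject.strip()
    if not technique or not subject:
        raise ValueError("technique and subject must not be empty")
    # Rough heuristic: choose the article from the first letter of the technique.
    article = "An" if technique[0].lower() in "aeiou" else "A"
    return f"{article} {technique} of {subject}"


# Same subject, several techniques: handy for comparing results side by side.
for technique in ["pencil drawing", "oil painting", "photograph"]:
    print(technique_prompt(technique, "a lion"))
# A pencil drawing of a lion
# An oil painting of a lion
# A photograph of a lion
```

Keeping the subject fixed while changing only the technique makes it easier to see what each technique brings to the image.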

Choose A Style

Sometimes a technique is not enough to describe the kind of image you would like to generate. In that case, specifying an artist can help! Here are some examples:

A tulip field made by Claude Monet

A tulip field made by Claude Monet, generated by Stable Diffusion

An oil painting of a woman made by Rembrandt

An oil painting of a woman made by Rembrandt, generated by Stable Diffusion

It is a good opportunity to do some research on artists you don't know yet.
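The same idea works for artists. The sketch below is only an illustration: styled_prompt is a hypothetical helper, and joining several names with "and" simply mirrors the wording used in the longer example instructions in this article.

```python
from typing import Sequence


def styled_prompt(description: str, artists: Sequence[str]) -> str:
    """Append 'made by <artist> and <artist> ...' to a description."""
    description = description.strip()
    # Ignore empty or whitespace-only names.
    names = [name.strip() for name in artists if name.strip()]
    if not names:
        return description
    joined = " and ".join(names)
    return f"{description} made by {joined}"


print(styled_prompt("A tulip field", ["Claude Monet"]))
# A tulip field made by Claude Monet
print(styled_prompt("An oil painting of a woman", ["Rembrandt"]))
# An oil painting of a woman made by Rembrandt
```

Passing a list makes it easy to try one artist at a time, then mix two or three and see how the styles blend.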

Explore

The above examples may be very useful, but you can still do better by using some specific keywords. The Stability AI team recommends that you try some of the following keywords in your instructions:

Highly detailed, surrealism, trending on art station, triadic color scheme, smooth, sharp focus, matte, elegant, the most beautiful image ever seen, illustration, digital paint, dark, gloomy, octane render, 8k, 4k, washed colors, sharp, dramatic lighting, beautiful, post processing, picture of the day, ambient lighting, epic composition.

No doubt you will discover special instructions that nobody has tried before and that create amazing results!

Also, feel free to create longer instructions. You don't have to stick to one sentence: a whole paragraph works too.
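If you experiment with many keywords, a small helper can keep your instructions tidy. The following is a minimal sketch, assuming that keywords are simply appended to the base instruction as a comma-separated list, as in the examples in this article. The add_keywords function is hypothetical, and dropping repeated keywords is a tidiness choice here, not a rule of the model.

```python
from typing import Iterable


def add_keywords(prompt: str, keywords: Iterable[str]) -> str:
    """Append comma-separated keywords to a prompt, skipping repeats."""
    parts = [prompt.strip()]
    seen = set()
    for keyword in keywords:
        cleaned = keyword.strip()
        key = cleaned.lower()  # compare case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            parts.append(cleaned)
    return ", ".join(parts)


base = "Concept art of a futuristic city during sunset"
print(add_keywords(base, ["highly detailed", "sharp focus", "Highly detailed", "8k"]))
# Concept art of a futuristic city during sunset, highly detailed, sharp focus, 8k
```

You can then change one keyword at a time and compare the generated images to find out which keywords really matter for your subject.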

If you need ideas, here are some interesting examples:

highly detailed futuristic Apple iGlass computer glasses on face of human, cyberpunk, hand tracking, concept art, character art, studio lightning, bright colors, intricate, masterpiece, photorealistic, hyperrealistic, sharp focus, high contrast, Artstation HQ, DeviantArt trending, 8k UHD, Unreal Engine 5

A detailed manga illustration character full body portrait of a dark haired cyborg anime man who has a red mechanical eye, trending on artstation, digital art, 4 k resolution, detailed, high quality, sharp focus, hq artwork, insane detail, concept art, character concept, character illustration, full body illustration, cinematic, dramatic lighting

a cyberpunk zulu warrior sitting on a cliff watching a meteor fall to earth from a distance, by alena aenami and android jones and greg rutkowski, Trending on artstation, hyperrealism, elegant, stylized, highly detailed digital art, 8k resolution, hd, global illumination, ray tracing, radiant light, volumetric lighting, detailed and intricate cyberpunk ghetto environment, rendered in octane, oil on canvas, wide angle, dynamic portrait

Machine god rebuilding itself, fantasy, d & d, intricate, detailed, whimsical, detailed, trending on artstation, trending on artstation, smooth

Old wise Monk guiding a Lost Soul through Limbo, in the Style of Tomer Hanuka and Atey Ghailan, vibrant colors, trending on artstation

paul bettany as angel with wings is covered in vines and flowers and moss and standing in front of a beautiful cottage, a digital painting by thomas canty and thomas kincade and ross tran, art nouveau, atmospheric lighting, trending on artstation

concept art for a car huge sharp spikes, painted by syd mead, high quality

Anxious good looking pale young Indian doctors wearing American clothes outside a hospital, portrait, elegant, intricate, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha

skull god, close - up portrait, powerfull, intricate, elegant, volumetric lighting, scenery, digital painting, highly detailed, artstation, sharp focus, illustration, concept art, ruan jia, steve mccurry

ukrainian girl with blue and yellow clothes near big ruined plane, concept art, trending on artstation, highly detailed, intricate, sharp focus, digital art, 8 k

terrifying unholy crying ghost, very detailed face, detailed features, fantasy, circuitry, explosion, dramatic, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by Gustave Dore, octane render

Beautiful and playful lady liberty portrait, art nouveau, fantasy, holding a vase by Rene Lalique , elegant, highly detailed, sharp focus, art by Artgerm and Greg Rutkowski and WLOP

a portrait of a woman that is a representation of argentinian culture, buenos aires, fantasy, intricate, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha

Painting by Greg Rutkowski, at night a big ceramic jug with gold ornaments flies high in the night dark blue sky above a small white house under a thatched roof, stars in the sky, rich picturesque colors

pizza party at a theme park, light dust, magnificent, close up, details, sharp focus, elegant, highly detailed, illustration, by Jordan Grimmer and greg rutkowski and PiNe(パイネ) and 薯子Imoko and 香川悠作 and wlop and maya takamura, intricate, beautiful, Trending artstation, pixiv, digital Art

Studio photograph of hyperrealistic accurate portrait sculpture of timothy dalton, beautiful symmetrical!! face accurate face detailed face realistic proportions, made of pink frosted glass on a pedestal by ron mueck and matthew barney and greg rutkowski, hyperrealism cinematic lighting shocking detail 8 k

Conclusion

As you can see, image generation is a very impressive technique that has been democratized by models like DALL-E 2, MidJourney, and Stable Diffusion.

Once you master the text to image techniques, you can easily generate tons of amazing images in the blink of an eye.

We hope you found this useful! If you have any questions about how to make the most of Stable Diffusion, please don't hesitate to ask us.

François
Full-stack engineer at NLP Cloud