Artificial intelligence has advanced at a blistering pace over the past few years, with few areas being as visibly transformed as AI image generation. When DALL-E 1 was first unveiled by OpenAI in January 2021, it felt like a revelation — an AI system that could create unique and often surreal images just from a single prompt. While primitive by today’s standards, DALL-E 1 opened the world’s eyes to the creative potential of generative AI.
Fast forward to 2024, and OpenAI has since released DALL-E 3, the latest evolution of its groundbreaking text-to-image model. The question is, how exactly does it compare to its previous iterations?
In this article, we’ll take a deep dive into how DALL-E has evolved from its first iteration to its current version. Stay tuned!
What is DALL-E?
DALL-E is an AI model created by OpenAI (the same company behind ChatGPT) that can generate images from text descriptions or prompts. It uses machine learning techniques to understand the semantics of your input and generate corresponding visuals. It’s currently in its third iteration, which we’ve already reviewed in-depth in this article.
DALL-E is a significant milestone in the AI space because it’s one of the first mainstream text-to-image models. It’s also one of the first to prioritize contextual understanding of prompts, text generation, and native integration with AI chatbots such as ChatGPT.
How Has It Improved Over The Last Three Years?
To fully appreciate how DALL-E evolved over the years, we must first talk about the improvements it made in terms of features. Here’s a quick rundown of DALL-E’s new features, along with ones that were discontinued but that we hope return in the future:
- Creativity and Nuance: This has been a steady point of improvement across all DALL-E models; with each new version, creativity is the one constant upgrade. We also tested DALL-E 3 against the popular text-to-image AI models, and we’re confident in saying that none of them match its nuance.
- Higher Resolution Images: DALL-E 2 can generate images at much higher resolutions, up to 1024 x 1024 pixels, compared to the original DALL-E’s 256 x 256 pixel limit. DALL-E 3 also gives you control over the image’s aspect ratio.
- Image Editing Capabilities: DALL-E 2 can not only generate images from scratch but also edit and modify (inpainting and outpainting) existing images based on text prompts. Unfortunately, this has been discontinued in DALL-E 3.
- Integration with ChatGPT: Since its third iteration, DALL-E can now be used natively with ChatGPT, allowing you to use conversations as context or even prompts.
- Text Generation: DALL-E 3 is among the first AI image generators able to render legible text at a near-accurate level. GPT-4o has only improved on this, and the model can now handle longer passages of text with far fewer errors.
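If you want to experiment with these features yourself, DALL-E 3 is also available through OpenAI’s Images API, where the aspect-ratio control above is exposed as a fixed set of size strings. Here’s a minimal Python sketch using the official `openai` SDK; the helper names (`size_for`, `generate_image`, `DALLE3_SIZES`) are our own, and you’ll need an `OPENAI_API_KEY` in your environment to actually generate anything:

```python
# Minimal sketch of generating a DALL-E 3 image via OpenAI's Images API.
# Only `client.images.generate` comes from the SDK; the helpers are ours.

# DALL-E 3 accepts exactly three sizes, which is how aspect-ratio
# control is exposed in the API.
DALLE3_SIZES = {
    "square": "1024x1024",
    "landscape": "1792x1024",
    "portrait": "1024x1792",
}

def size_for(aspect: str) -> str:
    """Map a friendly aspect name to a DALL-E 3 size string."""
    return DALLE3_SIZES[aspect]

def generate_image(prompt: str, aspect: str = "square") -> str:
    """Request one image and return its URL. Requires OPENAI_API_KEY."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size=size_for(aspect),
        n=1,  # DALL-E 3 generates one image per request
    )
    return response.data[0].url
```

For example, `generate_image("An armchair in the shape of an avocado", aspect="landscape")` would recreate one of the showcase prompts below at a widescreen size.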
DALL-E 1 vs. DALL-E 3
As much as we’d love to compare models using our own prompts, there’s no way to use the original DALL-E in 2024. So, we had to improvise.
Fortunately, we still have access to OpenAI’s original DALL-E page, which features hundreds of image samples from the original model along with their corresponding prompts. So, here’s a quick comparison between some of the images from the original DALL-E showcase and their equivalents from DALL-E 3:
Prompt: An illustration of an eggplant in a tutu walking a dog.
Prompt: A male mannequin dressed in an orange and black flannel shirt and black jeans.
Prompt: A macro photograph of a brain coral.
Prompt: An armchair in the shape of an avocado.
Prompt: A professional high-quality emoji of a lovestruck cup of boba.
Thoughts?
It’s not even a question of which is better — DALL-E 3 is obviously the better model. But we need to talk about what has changed to make it so.
Think of it this way: DALL-E paved the way forward. Few people had heard of text-to-image generation before it was teased, so it’s clear why it captured the attention of the entire world, despite how crude the images look now. The first try is always the roughest, but it was a necessary step toward what we have today.
As you can see, DALL-E 3’s images are more creative and show a stronger grasp of context. That’s apparent not only in the subject of each image but also in the background. The level of detail, the whimsical elements, and the unexpected combinations of objects showcase a highly imaginative and creative approach. DALL-E 3 also produces sharper images thanks to the improvements OpenAI made in resolution.
DALL-E 2 vs. DALL-E 3
Prompt: A photo of Michelangelo’s sculpture of David wearing headphones djing.
Prompt: An oil pastel drawing of an annoyed cat in a spaceship.
Prompt: A Shiba Inu dog wearing a beret and black turtleneck.
Prompt: Two futuristic towers with a skybridge covered in lush foliage, digital art.
Prompt: A hand-drawn sailboat circled by birds on the sea at sunrise.
Prompt: A van Gogh style painting of an American football player.
Prompt: A computer from the 90s in the style of vaporwave.
Thoughts?
The best way I can describe the difference between DALL-E 2 and DALL-E 3 is that the latter is more complete.
DALL-E 2’s outputs are far more coherent and solid than DALL-E 1’s, but they’re still noticeably more abstract than DALL-E 3’s. Beyond creativity, the third version produces more structurally sound images that are consistent with what we know from real life: in DALL-E 3, keyboards have more keys than there are letters in the alphabet, Van Gogh’s obsession with spirals is more apparent, and there’s a clear separation between buildings and roads.
If you’re interested in learning more about their differences, we already compared DALL-E 2 and DALL-E 3 in-depth in this article.
The Bottom Line
We can’t fully understand how AI models improve without understanding their past. For DALL-E, it was a long road, but OpenAI has finally made a model that rivals Midjourney in creativity and is second to none in nuance.
If I were to describe these three models in a word or two each, I’d call the first version a pioneer, the second a stepping stone, and the third the culmination. We don’t have any word yet on whether OpenAI plans to create a fourth version, but if one arrives, it would have to be the pinnacle: its most advanced and refined iteration.
Interested in learning more about DALL-E? This article would be a good place to start. Have fun!