OpenAI unveiled the latest generation of its text-to-image model, DALL-E 3, on Wednesday. The model is built natively into the company’s chatbot, ChatGPT, and will become available to ChatGPT Plus and Enterprise customers in the coming weeks.

Currently, the image generator is in a research preview phase, with access limited to a very small number of users. Extensive user testing will therefore have to wait until the launch for ChatGPT Plus and Enterprise customers.

To provide a glimpse of DALL-E 3’s capabilities, OpenAI has published some of the results along with the prompts they used. Nick St. Pierre, a user on X (formerly Twitter), who frequently shares insights and learnings on AI, used identical prompts in Midjourney 5 (apparently version 5.2) and shared the results to provide a comparison between the two AI image generators.

He explained that, given Midjourney’s current lack of support for text or typography, he eliminated all text call-outs from the prompts when he used them in Midjourney.

Here are all the prompts and results.

1) “A modern architectural building with large glass windows, situated on a cliff overlooking a serene ocean at sunset.”

2) “A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon.”

3) “Photo of a lychee-inspired spherical chair, with a bumpy white exterior and plush interior, set against a tropical wallpaper.”

4) “In front of a deep black backdrop, a figure of middle years, her Tongan skin rich and glowing, is captured mid-twirl, her curly hair flowing like a storm behind her. Her attire resembles a whirlwind of marble and porcelain fragments. Illuminated by the gleam of scattered porcelain shards, creating a dreamlike atmosphere, the dancer manages to appear fragmented, yet maintains a harmonious and fluid form.”

5) “Close-up photograph of a hermit crab nestled in wet sand, with sea foam nearby and the details of its shell and texture of the sand accentuated.”

6) “A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.”

7) “Close-up photograph of a hermit crab nestled in wet sand, with sea foam nearby and the details of its shell and texture of the sand accentuated.”

Looking at these latest comparison results between DALL-E 3 and Midjourney 5.2, it is difficult to say definitively whether one generator is significantly better than the other. While DALL-E 3 may be producing higher-quality images in some cases, the difference does not appear to be large based on this limited sample.

YouTuber MattVidPro, who previously received some images reportedly generated by DALL-E 3 from an AI community member, commented at the time that the quality was unparalleled. In his analysis video, MattVidPro stated he had not seen image generations as good as what DALL-E 3 was producing and that “Midjourney could not compete at that level, and I don’t think even a Midjourney version 6 would be able to compete at that level.”

However, judging from these newest head-to-head results, Midjourney 5.2 appears more competitive with DALL-E 3 than MattVidPro’s previous comments suggested. We should reserve ultimate judgment on which generator performs better until DALL-E 3 has an official full launch and more extensive comparison testing can be done.

People who responded to Nick’s thread comparing the results of DALL-E 3 and Midjourney 5 have seemingly not reached a consensus. Some prefer Midjourney, while others prefer DALL-E 3. A poll running under the thread currently has DALL-E 3 in the lead. It will be interesting to see what these results look like once people have tested DALL-E 3 themselves.

The responses to Nick’s thread on X comparing DALL-E 3 and Midjourney 5.2 results do not seem to show a clear consensus either. Some commenters expressed a preference for Midjourney’s outputs, while others felt DALL-E 3 produced superior images. An ongoing poll associated with the thread currently shows DALL-E 3 in the lead, but the results are still early.

Getting meaningful feedback from a wider audience will likely require people to have hands-on access and do extensive testing of DALL-E 3 for themselves, once it launches publicly.

The differing opinions highlighted in the responses to Nick’s samples indicate reasonable people can disagree on which model performs best based on limited examples. A more definitive assessment of whether one generator outperforms the other may have to wait until DALL-E 3 is fully available and more users have meaningful experience generating and comparing images from both tools.