OpenAI has been secretly working on a new version of its imaging model, tech YouTuber MattVidPro has revealed. The new model, which is possibly Dall-E 3, is invite-only and is being tested through a preview inside ChatGPT 4 with extremely limited access. He stated that only 400 people have access to the latest version. One of them, a member of his Discord community, ran some prompts supplied by Matt and shared the results. The exact public launch date remains unclear, but the AI lab took six months from announcing Dall-E 2 to making it publicly available, so there’s a good chance that the latest version could launch before the end of the year.
In the video, Matt shared the images generated with those prompts and offered his feedback, calling the secret product the best AI image generator he has seen yet: “Today we are truly glimpsing into the future of AI image generation. I promise you you have not seen any AI image generations that are this good. Midjourney cannot compete at this level. I don’t think even Midjourney version 6 would be able to compete at this level.”
The last version, Dall-E 2, was announced in April 2022 and made publicly available without a waitlist in September 2022. It was the best AI image generator of its time but was soon overtaken by Stable Diffusion with modifications, Midjourney, and Bing Image Creator, Matt argued in the video.
Sharing the results of some prompts for generating images that contain text, he stated that most current AI image generation models cannot come close to spelling text this coherently: “Honestly this thing is pretty shockingly good. You’ll never get Midjourney to produce text this coherently. This is far beyond any capabilities we see from Midjourney or Stable Diffusion or any of the rest. This is very very impressive.”
The new iteration of OpenAI’s unreleased image generation model can also produce images in multiple aspect ratios. The YouTuber said that the user who shared these results told him he has zero interest in using Midjourney after trying OpenAI’s secret image generator.
He also shared a comparison prompt that was used to generate images with the May and July versions of what is supposedly Dall-E 3, as well as with Midjourney. The prompt, “Painting of a pink jester giving a high five to a panda while in a cycling competition. The bikes are made of cheese and the ground is very muddy. They are driving in a foggy forest. The panda is angry,” produced the following results.
Explaining the results of the May version of OpenAI’s image generator, he stated, “As you can see, it captures pretty much everything. They’re kind of giving a high-five although their fingers are warped. The panda is definitely not angry. The bikes are yellow but they don’t look very cheesy. The ground is a little bit muddy. They’re definitely in a foggy forest. It is definitely a painting and we could see another cycler in the background which kind of assumes that this is a competition of some sort, so this May version of Dall-E that produced this generation did quite a good job overall, better than any model that we see today.”
Matt wasn’t impressed with the result the same prompt produced in Midjourney 5.2: “[It] can’t even get close to including the same amount of specific details that are asked from the prompt. I don’t see a single bicycle. They are all motorcycles. None of them are made of cheese. The ground is not muddy. I guess they could be considered to be in a competition. I don’t see a single jester either on this. They are in a foggy forest so I’ll give it that but you can see it’s just missing way more of the prompt details here. It’s ridiculous. Midjourney isn’t even coming close.”
The July version of Dall-E 3 got almost everything right, “There’s a little bit more detail here. It is still a painting which is good. The bikes are way more made of cheese this time. That’s a perfect on the bike I would say. The bear definitely looks angry. The jester is still pink. They are hundred percent high fiving. Look how coherent that hand is and the ground is super muddy. It is not super clear whether they’re in a competition or not but I suppose if you wanted that more you could specify I want to see bikes in the background. I mean it can just be kind of assumed that they’re in a competition. Either way it captured every single aspect of the prompt. Very very impressive.”
The latest model, Matt showed with examples, did a much better job than the other image generators with hands, background, and even copyrighted materials – including logos and artwork.
He stated that OpenAI has removed all the safety features as the model is apparently not finalized yet: “You can generate blood, gore, full frontal nudity. It does not care. That’s all allowed. Completely accurate depictions of celebrity and full out depictions of extreme violence and apparently this extreme violence can just show up in your images even if you don’t ask for it in the prompts so they’ve clearly got some fine tuning and some safety work to do. I don’t know about the viewers but I don’t want blood and gore in my images if I don’t ask for it,” noted the YouTuber, warning OpenAI to be careful before releasing the new version to the public.
Sharing an image of a GTA 5 box for PS4 generated using the new model, he said, “This image right here shocked me to my core. It is literally just showing you something that you would see on Google. The Rockstar logo is almost perfect. The Grand Theft Auto 5 logo is near perfect and the rest of the case as well. It looks like the PS4 box of Grand Theft Auto 5. This can give you an idea of how powerful this thing is in terms of reproducing famous imagery. This is some pretty specific imagery.
“It might just crush everything out there. This is insanely good,” concluded Matt.
ChatGPT has truly been a breakthrough product. Despite Dall-E also being a super solid product, OpenAI hasn’t been able to replicate the success of ChatGPT with its image generator. Perhaps Dall-E 3 will change that.
(Here’s the full video.)