Stable Diffusion, the leading open-source AI image generation model by Stability AI, has just received a successor called Stable Cascade, which the company claims is both more powerful and faster. It is expected to be adopted by many text-to-image AI tools that already use Stable Diffusion as their base.

Stable Cascade can alter images it has already created and attempt to increase the resolution of an existing image. It also offers inpainting and outpainting, features found in several rival AI image generators, which let the model fill in or edit a specific part of an image (known as generative fill in Adobe Photoshop). It can also create new images using only the outlines of an existing picture, a feature called Canny Edge.

Four AI-rendered images of a penguin in a cafe.

The model is openly available on GitHub for researchers to use, but not for commercial use. It arrives as Google and Apple also release their own image generation models, giving users more options.

Stable Cascade uses three different models based on the Würstchen architecture, unlike Stable Diffusion, which relies on a single large model. Behind the scenes, the process begins with stage C, which condenses text prompts into latents (compact internal representations). These are then passed to stages A and B, which work together to decode them and fulfill the request.
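The staged flow described above can be sketched as a toy pipeline. This is only an illustration of the data flow, not Stability AI's implementation: the `Grid` type, the three `stage_*` functions, and every size and scale factor are hypothetical placeholders, and the B-then-A decoding order is an assumption about how the two decoding stages are chained.

```python
from dataclasses import dataclass


@dataclass
class Grid:
    """Hypothetical stand-in for a latent or image, described only by its shape."""
    height: int
    width: int
    channels: int

    def size(self) -> int:
        # Total number of values held in this grid.
        return self.height * self.width * self.channels


def stage_c(prompt: str) -> Grid:
    # Placeholder: a real stage C would run a text-conditioned generative model.
    # The 16x16x16 shape is an illustrative size, not a documented one.
    if not prompt:
        raise ValueError("prompt must not be empty")
    return Grid(height=16, width=16, channels=16)


def stage_b(latent: Grid, scale: int = 8) -> Grid:
    # Placeholder: expands the compact latent into a larger intermediate latent.
    return Grid(latent.height * scale, latent.width * scale, channels=4)


def stage_a(latent: Grid, scale: int = 4) -> Grid:
    # Placeholder: decodes the intermediate latent to RGB pixel dimensions.
    return Grid(latent.height * scale, latent.width * scale, channels=3)


def generate(prompt: str) -> Grid:
    # Stage C runs first; stages B and A then decode its output (assumed order).
    compact = stage_c(prompt)
    return stage_a(stage_b(compact))


if __name__ == "__main__":
    compact = stage_c("a penguin in a cafe")
    image = generate("a penguin in a cafe")
    print(f"compact latent holds {compact.size()} values")
    print(f"final image holds {image.size()} values")
```

The point of the sketch is that the expensive text-conditioned step operates on a grid far smaller than the final image, which is where the memory savings of a staged design would come from.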

Splitting requests into smaller stages not only reduces the memory requirement (and the extensive training hours on scarce GPUs) but also makes the system more efficient, reportedly achieving better results in both prompt accuracy and visual appeal. It also cuts the time taken to generate an image to about 10 seconds. In contrast, the current SDXL model takes about 22 seconds to create an image.
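To put the two timings in perspective, the short calculation below works out the speedup and the share of time saved. It uses only the approximate 10-second and 22-second figures quoted above; the function names are illustrative.

```python
def speedup(baseline_s: float, new_s: float) -> float:
    """How many times faster the new timing is than the baseline."""
    return baseline_s / new_s


def time_saved_fraction(baseline_s: float, new_s: float) -> float:
    """Fraction of the baseline time that is no longer needed."""
    return (baseline_s - new_s) / baseline_s


sdxl_seconds = 22.0      # approximate figure quoted for SDXL
cascade_seconds = 10.0   # approximate figure quoted for Stable Cascade

print(f"{speedup(sdxl_seconds, cascade_seconds):.1f}x faster")                 # 2.2x faster
print(f"{time_saved_fraction(sdxl_seconds, cascade_seconds):.0%} less time")  # 55% less time
```

In other words, the quoted figures amount to a bit more than a twofold speedup, or a little over half the generation time saved per image.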

Stability AI helped pioneer the Stable Diffusion technique, but it has also faced its fair share of hurdles, including a lawsuit from Getty Images in the UK over training its AI models on copyrighted data without permission. Some have also pointed out that the training data used for Stable Diffusion contained thousands of child abuse images.