Stability AI, the company behind the popular text-to-image AI model Stable Diffusion, has just announced the tool’s advancement into generative video. The company’s new product, called Stable Video Diffusion, can create an animation from a still image.
The company wrote: “This state-of-the-art generative AI video model represents a significant step in our journey toward creating models for everyone of every type.”
The recently launched tool comprises two image-to-video models, capable of generating 14 and 25 frames respectively, at customizable frame rates between 3 and 30 frames per second, all while maintaining a resolution of 576 × 1024.
The tool demonstrates multi-view synthesis from a single frame, and can additionally be fine-tuned on multi-view datasets. The video below shows the tool in action.
The sample video shows high-quality results, though like all other AI models, this one comes with certain limitations. For instance, it can only generate videos of up to 4 seconds, falls short of perfect photorealism, is capable only of slow camera pans, offers no text control, cannot render legible text, and may not generate people or faces properly.
Regardless, Stability AI claims that its latest image-to-video model is more capable than rival offerings such as Runway and Pika Labs.
“At the time of release in their foundational form, through external evaluation, we have found these models surpass the leading closed models in user preference studies.”
Stable Video Diffusion is not yet available for commercial use, as it is limited to research purposes. However, prospective users have the option to sign up and join a waitlist for access to an “upcoming web experience featuring a text-to-video interface,” as stated by Stability AI.
The tool is designed to highlight potential applications across various sectors, including advertising, education, entertainment, and more.
The tool underwent training on a dataset comprising millions of videos, followed by fine-tuning on a smaller set. Stability AI has said it used publicly available research-oriented video data, though it has not shared specific details about the dataset’s origin. That omission is particularly notable in light of Stability AI’s recent legal dispute with Getty Images over the scraping of its image archives.