OpenAI announces new text-to-video model called Sora

OpenAI’s ChatGPT has been limited to image generation powered by DALL·E 3 for a while, but this may change soon as the company has announced a new text-to-video model called Sora. The model can generate photorealistic videos up to a minute long that adhere to user prompts.

OpenAI claims that Sora can create “highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.” The announcement tweet from OpenAI’s official X account shows a sample video created by Sora. It also shows the prompt used to create the footage.

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf

— OpenAI (@OpenAI) February 15, 2024

More sample videos can be found in OpenAI’s blog post.

How It Works

Sora is a diffusion model: it starts video creation from a sequence that looks like static noise and gradually refines it by removing the noise over many iterations. It can generate a complete video in a single pass or extend an existing video to make it longer.
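To make the idea of iterative denoising concrete, here is a minimal toy sketch in Python. It is not Sora's implementation, and OpenAI has not published code for it. The names `toy_denoise` and `mean_abs_error` are hypothetical, and the "predicted noise" is computed from a known target purely for illustration; a real diffusion model would use a trained neural network conditioned on the text prompt to estimate the noise.

```python
import random


def mean_abs_error(video, target):
    """Average absolute difference between two equally shaped videos."""
    diffs = [abs(v - t) for vf, tf in zip(video, target) for v, t in zip(vf, tf)]
    return sum(diffs) / len(diffs)


def toy_denoise(target, steps=50, seed=0):
    """Refine random noise toward `target` over `steps` iterations.

    ASSUMPTION: `target` stands in for what a trained, prompt-conditioned
    network would steer toward. A real model does not know the answer;
    it only predicts the noise to remove at each step.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise with the same shape as the "video":
    # a list of frames, each a flat list of pixel values.
    video = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in target]
    for step in range(steps):
        # Fraction of the remaining noise removed at this step.
        # With 1 / (steps - step), the final step removes whatever is left.
        rate = 1.0 / (steps - step)
        for f, frame in enumerate(video):
            for p, value in enumerate(frame):
                # Hypothetical stand-in for the network's noise estimate.
                predicted_noise = value - target[f][p]
                frame[p] = value - rate * predicted_noise
    return video


# Two tiny 3-pixel "frames" serve as the clean video.
target = [[0.0, 0.5, 1.0], [0.1, 0.6, 0.9]]
result = toy_denoise(target, steps=50)
print(mean_abs_error(result, target))
```

Each pass removes a fraction of the remaining noise, so the output moves from static toward the clean frames step by step, which is the general behavior the paragraph above describes.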

Sora can produce videos directly from text prompts or turn a still image into a video by animating the details within the image. In addition to extending an existing video’s duration, it can fill in missing frames.

Safety Measures

OpenAI assures users that it will be taking several steps for safety before launching Sora into its products. This confirms that Sora will indeed launch as a product, but it remains to be seen if it will be a part of ChatGPT. The company says: “We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who are adversarially testing the model.”

OpenAI will also be building tools to help detect misleading content online. One example is a detection classifier that can tell whether a video was created by Sora. To that end, OpenAI plans to include C2PA metadata in content created by Sora to help identify its origin.

Additionally, safety methods that are already at work with DALL·E 3 will apply to Sora as well. This means Sora will be able to reject prompts that go against OpenAI’s policies.