Meta has unveiled Fairy, a video-to-video synthesis model aimed at AI-driven video editing.
Developed by Meta’s GenAI team, Fairy is designed to be faster and more temporally consistent than existing models, which should give editors more creative freedom.
Fairy’s practical applications include character and object replacement, artistic stylization, and the generation of long-form videos. As with other AI editing models, the user drives these edits with a simple text prompt such as “in the style of van Gogh.”
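[Real-time video editing is now possible!?]
Meta has announced “Fairy,” which can edit video in real time using nothing but text input.
Striking footage: a 120-frame video generated in just 14 seconds!
It is reportedly 44 times faster than before. pic.twitter.com/IrrUElK9Vd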
— そねち | AIクリエイター (@AIsonesone) January 7, 2024
Keeping a synthesized video visually coherent is difficult, because a single prompt can be realized in many different ways and each frame may drift toward a different result. Fairy addresses this with cross-frame attention, a technique that shares information between frames so the edit stays visually consistent throughout the video.
Meta describes it as: “A mechanism that implicitly propagates diffusion features across frames, ensuring superior temporal coherence and high-fidelity synthesis.”
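To make the idea more concrete, here is a minimal toy sketch of cross-frame attention in plain Python. It is not Meta’s implementation: it assumes a single attention head, no learned projections, and a hypothetical set of “anchor” frames whose features serve as shared keys and values. The names `cross_frame_attention`, `frame_tokens`, and `anchor_frames` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_attention(frame_tokens, anchor_frames):
    """Let every token of one frame attend over tokens from other frames.

    frame_tokens:  list of feature vectors for the frame being edited.
    anchor_frames: list of frames, each a list of feature vectors
                   (hypothetical reference frames; an assumption here).
    """
    # Pool the tokens of all anchor frames into one shared key/value set.
    shared = [token for frame in anchor_frames for token in frame]
    dim = len(shared[0])
    scale = 1.0 / math.sqrt(dim)

    output = []
    for query in frame_tokens:
        # Scaled dot-product similarity between this token and every shared token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in shared]
        weights = softmax(scores)
        # Weighted average of the shared features: this is what gets propagated.
        mixed = [sum(w * value[i] for w, value in zip(weights, shared))
                 for i in range(dim)]
        output.append(mixed)
    return output
```

Because every frame draws its features from the same shared pool, two different frames end up with related outputs, which is the intuition behind the temporal consistency the quote describes.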
Fairy is also efficient. It can produce a 120-frame video at 512 x 384 pixels, which is 4 seconds of footage at 30 frames per second, in just 14 seconds. According to Meta, that makes Fairy at least 44 times faster than earlier approaches. Fairy builds on Meta’s Emu video models and is underpinned by a diffusion model originally designed for images, which has been extended and optimized for video editing.
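The figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this article; the helper name `render_stats` is made up for illustration.

```python
def render_stats(frames, playback_fps, render_seconds):
    """Derive clip length and rendering throughput from the quoted figures."""
    clip_seconds = frames / playback_fps
    return {
        # Length of the finished clip when played back.
        "clip_seconds": clip_seconds,
        # How many output frames are produced per second of compute.
        "rendered_fps": frames / render_seconds,
        # Seconds of compute needed per second of finished video.
        "compute_per_video_second": render_seconds / clip_seconds,
    }


stats = render_stats(frames=120, playback_fps=30, render_seconds=14)
print(stats["clip_seconds"])               # 4.0
print(round(stats["rendered_fps"], 2))     # 8.57
print(stats["compute_per_video_second"])   # 3.5
```

In other words, by these numbers Fairy needs roughly 3.5 seconds of compute for each second of output video, so it is fast but not yet faster than playback.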
Fairy also edits the source video without temporal downsampling or frame interpolation. It preserves the original aspect ratio while resizing the output to a width of 512 pixels. Running on Nvidia A100 GPUs, it can create a 27-second video clip in over 70 seconds.
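As a rough illustration of the resizing rule, the sketch below computes an output size that fixes the width at 512 pixels and keeps the aspect ratio. The function name `output_size` and the choice to round the height to the nearest pixel are assumptions; the article does not say how Fairy rounds.

```python
def output_size(src_width, src_height, target_width=512):
    """Scale a frame to a fixed width while keeping its aspect ratio."""
    if src_width <= 0 or src_height <= 0:
        raise ValueError("source dimensions must be positive")
    # Same scale factor on both axes preserves the aspect ratio.
    scale = target_width / src_width
    # Rounding to the nearest pixel is an assumption made for this sketch.
    return target_width, round(src_height * scale)


print(output_size(1920, 1080))  # (512, 288)
print(output_size(640, 480))    # (512, 384)
```

A 4:3 source such as 640 x 480 comes out at 512 x 384, which matches the resolution quoted for Fairy’s 120-frame example.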
Like any AI model, Fairy has limitations. It currently struggles to render environmental effects such as rain, fire, and lightning, which either look out of place in the scene or contain visual errors.
The researchers behind Fairy attribute this to the model’s strong focus on temporal consistency, which causes dynamic effects like lightning or fire to appear static rather than fluid.
Even so, the researchers believe Fairy is a significant step forward for AI video editing, since it maintains temporal consistency while producing high-quality output.
Via: The Decoder