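In the early hours of February 16, Beijing time, OpenAI officially released Sora, its text-to-video product. The results put the rest of the field to shame, and one competitor commented under Sam Altman's tweet: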
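"A startup terminator. There is just no stopping this, is there?"
Compared with products such as Runway Gen-2 and Pika, Sora's performance really is dazzling. "Crushing" is a fair word for it.
First, Sora can generate a single video up to 60 seconds long. That is striking at a time when competitors offer only around 10 to 15 seconds per clip, and it makes it plausible to use AI to generate footage that can be edited into longer videos.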
Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.
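Second, the people, characters, and backgrounds in the videos stay remarkably stable and consistent. Compared with videos from Runway Gen-2 and Pika, Sora's output looks more natural. One industry expert put it bluntly: "Sora is the only work I have seen so far that breaks out of generating empty scenery shots and does real video generation."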
Prompt: A grandmother with neatly combed grey hair stands behind a colorful birthday cake with numerous candles at a wood dining room table, expression is one of pure joy and happiness, with a happy glow in her eye. She leans forward and blows out the candles with a gentle puff, the cake has pink frosting and sprinkles and the candles cease to flicker, the grandmother wears a light blue blouse adorned with floral patterns, several happy friends and family sitting at the table can be seen celebrating, out of focus. The scene is beautifully captured, cinematic, showing a 3/4 view of the grandmother and the dining room. Warm color tones and soft lighting enhance the mood.
Third, Sora's control over fine detail is startling. In the Sora-generated video below, for example, the fur texture is rendered so finely that it is jaw-dropping.
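Keep in mind that Pixar spent tens of millions to create the extremely complex fur of the monsters in motion in Monsters, Inc. Now Sora achieves a comparable look with ease.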
Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.
Beyond that, Sora's handling of camera position, shooting angle, and cuts is also impressive. Following the camera, you feel as if you are watching the work of an experienced director, without the stiffness typical of other AI-generated videos. This performance comes from the AI's understanding of the world. In OpenAI's own words:
"Sora is a diffusion model, which generates a video by starting off with one that looks like static noise and gradually transforms it by removing the noise over many steps."
"Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI."
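To make the quoted description of a diffusion model a little more concrete, here is a toy sketch of "start from noise, then remove noise over many steps". It is not Sora's implementation and nothing in it comes from OpenAI: the function name `toy_denoise`, the short list of numbers standing in for a video, and the "oracle" that already knows the clean target are all assumptions made for illustration. A real diffusion model replaces that oracle with a trained neural network that predicts the noise.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and remove part of the remaining noise each step.

    `target` is a hypothetical stand-in for the clean sample. A real diffusion
    model does not know it; a trained network estimates the noise instead.
    """
    rng = random.Random(seed)
    # The sample starts out as static-like Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    history = []
    for step in range(steps):
        # Oracle "noise prediction": whatever separates the sample from the target.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove a growing share of the remaining noise; the last step removes all of it.
        fraction = 1.0 / (steps - step)
        x = [xi - fraction * ni for xi, ni in zip(x, predicted_noise)]
        # Track the largest remaining deviation from the target.
        history.append(max(abs(xi - ti) for xi, ti in zip(x, target)))
    return x, history


if __name__ == "__main__":
    sample, errors = toy_denoise([0.0, 0.5, 1.0, 0.5])
    print("error after first step:", errors[0])
    print("error after last step:", errors[-1])
```

Running the script prints the remaining error after the first and the last step, which shows the sample moving from noise toward the clean signal a little at a time rather than in one jump.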
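To me this looks like a completely different approach from how other large models generate video. It is as if Sora understands and generates video the way a 3D engine would, while other models are still at the stage of understanding images. One is thinking in three dimensions while the others are still in two. There is a generation gap between Sora and the other products, and this is what a strike from a higher dimension looks like.
Impressive as it is, Sora is still not flawless. Generated frames still show problems such as missing fingers, misplaced layers, and wrong directions of motion. This is because the AI's understanding of the world is not yet deep enough.
In addition, consistency across separately generated clips cannot currently be controlled, which limits the possibility of stitching Sora-generated clips into a long video. I believe these problems can be solved as research progresses.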
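None of this, however, stops Sora from changing how short-video platforms such as Douyin work. Text-to-video output this smooth is no worse than what ordinary people shoot on their phones, so the short-video industry is about to see a new boom, and creators can put more of their energy into ideas.
Sora is not yet open for large-scale testing, presumably because content safety is still under further review. Once it officially opens, its demand for compute will also reach a frightening level, given the size of OpenAI's user base.
Since OpenAI gained an overwhelming lead in text generation, revolutionary follow-up products have kept coming. This is the effect of accumulated advantage. GPT is its nuclear-powered heart, the lead is now widening further, and it will be even harder for other startups to catch up.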
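Here is OpenAI's product timeline:
2022.11.30 ChatGPT (GPT-3.5) launched (DALL-E 2 had already been released earlier in 2022)
2023.3 GPT-4 launched
2023.11 GPT-4 (1106) update, multimodal support, DALL-E 3, and GPTs launched
2024.2 Sora released
I wonder what surprises GPT-5, which is said to be coming this year, will bring us. We are getting closer and closer to real AGI. I can't wait!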
Feel free to add me and chat.