OpenAI’s new — and first! — video-generating model, Sora, can pull off some genuinely impressive cinematographic feats. But the model’s even more capable than OpenAI initially made it out to be, at least judging by a technical paper published this evening.
The paper, titled “Video generation models as world simulators” and co-authored by a host of OpenAI researchers, peels back the curtain on key aspects of Sora’s architecture, revealing, for instance, that Sora can generate videos at arbitrary resolutions and aspect ratios (up to 1080p). Per the paper, Sora can also perform a range of image and video editing tasks, from creating looping videos to extending videos forwards or backwards in time to changing the background in an existing video.
But most intriguing to this writer is Sora’s ability to “simulate digital worlds,” as the OpenAI co-authors put it. In an experiment, OpenAI set Sora loose on Minecraft and had it render the world — and its dynamics, including physics — while simultaneously controlling the player.
[Image: Sora controlling a player in Minecraft while rendering the video game world as it does so. The graininess was introduced by a video-to-GIF converter tool, not Sora. Image Credits: OpenAI]
So how’s Sora able to do this? Well, as observed by senior Nvidia researcher Jim Fan (via Quartz), Sora’s more of a “data-driven physics engine” than a creative tool. It’s not just generating a single photo or video, but determining the physics…