ChatGPT maker OpenAI introduces Sora text-to-video gen AI model

OpenAI's Sora is capable of generating entire videos all at once or extending generated videos to make them longer from text or a still image.

By Shubham Verma | Published: Feb 16, 2024, 03:57 PM IST


Highlights

  • OpenAI has unveiled its new text-to-video generator called Sora.
  • Sora can even generate a photorealistic video from a still photo.
  • The new generator is available to red teamers for now.

ChatGPT maker OpenAI has unveiled Sora, a new text-to-video model that can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Sora is a diffusion model: it generates a video by starting with one that looks like static noise and gradually transforming it by removing the noise over many steps. While Sora is not yet available to the public, OpenAI has shared several videos generated by the model, and they look so realistic that some viewers are unnerved by what a generative-AI-driven future might look like.
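Sora itself is not publicly available, but the denoising process OpenAI describes can be illustrated with a toy sketch. In a real diffusion model, a trained neural network predicts the noise at each step; here, `fake_denoiser`, the `target` "frame," and all the numbers are hypothetical stand-ins chosen only to show the iterative noise-removal loop.

```python
import numpy as np

# Toy sketch of reverse diffusion: start from pure static noise and
# remove a little noise at each step. In a real model (like Sora),
# the noise estimate comes from a trained network; `fake_denoiser`
# below is a stand-in that "knows" the clean target frame.

rng = np.random.default_rng(0)
target = np.linspace(0.0, 1.0, 16)   # hypothetical "clean" frame
x = rng.standard_normal(16)          # step 0: pure static noise
num_steps = 100

def fake_denoiser(sample, clean):
    """Stand-in for a trained network: estimate the noise in `sample`."""
    return sample - clean            # "noise" = current sample minus clean signal

for t in range(num_steps):
    predicted_noise = fake_denoiser(x, target)
    x = x - 0.1 * predicted_noise    # remove a fraction of the noise each step

# After many small steps, x has converged close to the clean frame.
print(float(np.abs(x - target).mean()))
```

The key idea the loop captures is that generation is many small corrections, not one jump: each iteration shrinks the remaining noise by a fixed fraction, so the sample converges smoothly from static toward a coherent output.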

“Sora is capable of generating entire videos all at once or extending generated videos to make them longer,” the company said. In addition to being able to generate a video solely from text instructions, the model is also able to take an existing still image and generate a video from it, “animating the image’s contents with accuracy and attention to small detail.”

Similar to GPT models, Sora uses a transformer architecture, unlocking superior scaling performance. The company said it is making Sora available to “red teamers (domain experts) to assess critical areas for harms or risks.”

“We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals,” OpenAI said in a statement.

Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” according to OpenAI. Sora can also create multiple shots within a single generated video that accurately portray characters and visual style.

The company, however, admitted that the current model has its weaknesses. “It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark,” explained OpenAI.

The company further said that it will take important safety steps ahead of making Sora available in OpenAI’s products. “We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model. We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora,” the company explained.

— Written with inputs from IANS