OpenAI Launches Its Video-Generating AI ‘Sora’ And It Seems Like It Was Trained On Twitch Streams

OpenAI has made headlines by releasing its video-generating AI, Sora. However, the model appears to have been trained at least in part on Twitch streams and other game footage, which could be a major cause for concern.

Sora was unveiled Monday, and gamers have been testing it ever since. It accepts text prompts and images as input and can generate videos up to 20 seconds long in a range of aspect ratios and resolutions.

OpenAI first revealed Sora in February, when it mentioned training the model on Minecraft content. That has left many wondering what else was included in the training sets.

Sora can generate videos that resemble Super Mario Bros. and even produce gameplay footage in the style of Call of Duty. It also seems to have a good grasp of what a Twitch stream looks like, which suggests it has seen plenty of them. As a further hint, its output has shown a likeness to a top Twitch streamer, Raul Genes, known online as Auronplay.

The question remains open because OpenAI has never shared in detail where its training data came from. When asked during a high-profile interview with the WSJ, the company’s CTO did not outright deny that Sora was trained on content from YouTube and Meta’s Facebook and Instagram. The company has said only that it uses publicly available data and licensed material from stock media libraries such as Shutterstock.

OpenAI did not initially respond to a request for comment, but soon after the story was published, a PR representative said they would check with the team and get back. If game content really is part of the training data, the AI giant could find itself in legal hot water, especially if it plans to expand into more interactive experiences.

Companies that train models on unlicensed footage from video game playthroughs take on significant risk. Training a model on such material involves copying it, and when the material is game playthroughs, it is overwhelmingly likely that copyrighted content ends up in the training set.

Generative AI models like this one are probabilistic. They learn patterns from large amounts of training data and use those patterns to make predictions. This is how they pick up useful knowledge, such as how the world works, but it also means they can sometimes reproduce parts of the data they were trained on.

