A new report from Proof News is shedding light on how tech giants have been training AI models on YouTube content without consent.
A whopping 173k videos from the popular video-sharing app were taken without the company's knowledge to train AI models, the report highlighted. The dataset features transcripts scraped directly from nearly 48k channels on the app.
The tech giants accused of using the data include Apple, Anthropic, and even NVIDIA, among many others, so it’s a big deal. The findings also expose an uncomfortable truth about the world of AI: the tech is largely built on data taken from creators without permission or compensation.
The dataset does not include any video content from the app, but it does include transcripts from big global creators like MrBeast and media powerhouses like NYT, BBC, and ABC. A host of the subtitles also came from tech media outlets.
Notably, iPhone maker Apple is also on the list. The company is believed to have sourced data for training its AI models from a range of firms, and YouTube transcripts were among that data. This is likely to remain a serious issue for years to come.
In earlier comments cited by Engadget, YouTube’s CEO warned that training models on the app’s data without consent was a clear violation of its terms of service. So far, none of the tech giants named in the report has responded to questions on the matter.
It’s quite evident from this news that AI firms are not transparent about the data they use to train their models. Over the past month, plenty of artists and photographers slammed Apple for not disclosing the sources of the training data behind Apple Intelligence, the company’s own take on generative AI across millions of Apple devices.
YouTube is undoubtedly a goldmine of video, audio, and even images, and is therefore said to be a top source of training data.
At the beginning of 2024, the head of OpenAI sidestepped questions about whether the company used data from YouTube to train AI models like Sora. Even when pressed on the subject, they declined to give a direct answer, saying only that any information used was publicly available.
On that note, Sundar Pichai of Alphabet warned that using YouTube for this purpose would violate its rules.
This is shaping up to be a long and interesting battle, and we’re curious to see what steps Alphabet takes against those violating its terms of service.
Image: DIW-Aigen