NVIDIA Accused Of Scraping Troves Of YouTube’s Copyrighted Data For AI Training

Tech giant NVIDIA is in hot water following a new report that accuses it of scraping copyrighted video to train its AI models.

According to a recent report by 404 Media, NVIDIA allegedly used copyrighted content from YouTube to train its AI models. The report details how the company asked employees to download content from popular video streaming sites, including Netflix and YouTube, to develop commercial AI products.

NVIDIA, a multi-trillion-dollar firm, has adopted a “move fast and break things” strategy to claim a top spot in the AI race. That approach extends to training AI models for products such as the Omniverse 3D world generator and various self-driving car systems.

The company defends its practices. A representative said the research complies with copyright law, emphasizing that copyright protects specific expressions, not data, ideas, or facts. The representative compared this to an individual’s right to learn from various sources and create their own expressions.

However, YouTube does not accept this justification. Spokesperson Jack Malon pointed out that the company’s CEO previously stated that using YouTube content for AI training violates its terms of service. This is consistent with a Bloomberg report from April that criticized OpenAI for similar practices.

In the past month, Runway AI was also reported to have engaged in similar activities. NVIDIA employees who raised concerns were reportedly told that the actions had executive approval.

The situation resembles Facebook’s earlier approach of rapid, sometimes privacy-invasive moves. Both NVIDIA and Facebook reportedly directed their teams to train AI on movie trailers, gaming libraries, and GitHub datasets.

While NVIDIA claims that its use of the data was for academic purposes, it reportedly used virtual machines with rotating IP addresses to evade detection by YouTube. Employees shared details of how these machines were used to continuously obtain new IP addresses.

The 404 Media report argues that NVIDIA’s practices likely violate the law. The situation highlights a significant ethical and legal debate about data scraping and intellectual property rights.

The controversy raises questions about the practices of major tech companies and their approach to copyright and data use. What are your thoughts on NVIDIA's actions?

Image: DIW-Aigen

Dr. Hura Anwar