Wikipedia is reportedly paying a steep price for the AI boom. The encyclopedia giant is struggling with rising costs as bots scrape its content to train AI models.
The problem is not only financial; the scraping also puts a real strain on the platform's bandwidth.
On Tuesday, the Wikimedia Foundation, the nonprofit that hosts Wikipedia, warned that automated requests for its content keep growing exponentially. The load disrupts the site and forces the Foundation to add capacity, which in turn drives up its data center bills.
The infrastructure is built to absorb surges in human traffic during high-interest events, but the traffic generated by scraper bots is unpredictable, and it keeps showing up as higher costs and higher risk.
The Foundation shared that bandwidth used for downloading multimedia content has grown 50%. That traffic is not coming from human readers but from automated programs that download openly licensed images to feed into AI models.
Another serious issue is bots bulk-collecting data from Wikipedia's less popular articles. A closer look showed that nearly 65% of the most resource-intensive traffic comes from bots, a disproportionate share given that bots account for only about 35% of overall pageviews.
These bots also scrape critical systems in the developer infrastructure, such as the code review platform, which puts further strain on resources. In response, the site's reliability team imposes case-by-case rate limits on AI crawlers, or blocks them outright.
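For context, the usual first line of defense on the crawler side is robots.txt plus self-imposed throttling, which many AI scrapers reportedly ignore. Below is a minimal sketch of what a polite scraper looks like, using Python's standard urllib.robotparser; the bot identity and the one-request-per-second pacing are illustrative assumptions, not Wikimedia policy.

```python
import time
import urllib.robotparser
import urllib.request

BASE = "https://en.wikipedia.org"
USER_AGENT = "ExampleResearchBot/0.1 (contact@example.org)"  # hypothetical bot identity

# Fetch and parse the site's robots.txt once, up front.
rp = urllib.robotparser.RobotFileParser()
rp.set_url(f"{BASE}/robots.txt")
rp.read()

def polite_fetch(path: str) -> bytes | None:
    """Fetch a page only if robots.txt allows it, then pause."""
    url = f"{BASE}{path}"
    if not rp.can_fetch(USER_AGENT, url):
        return None  # disallowed by robots.txt: skip rather than scrape
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    time.sleep(1.0)  # self-imposed rate limit (illustrative value)
    return body

if __name__ == "__main__":
    page = polite_fetch("/wiki/Special:Random")
    print("fetched" if page else "blocked by robots.txt")
```

The point of the sketch is the asymmetry Wikimedia is describing: compliance is voluntary, so when crawlers skip these checks, the cost of enforcement shifts to the host.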
To address the issue further, the Wikimedia Foundation says it is drawing up a "Responsible Use of Infrastructure" plan aimed at identifying and reining in the unsustainable network strain caused by AI scraper bots.
The Foundation hopes to gather community feedback on how best to tackle the problem: how to identify traffic coming from these scrapers and filter it out. The plan may include requiring authentication for high-volume scrapers and API usage.
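If authentication for heavy API users does materialize, client code might look something like the sketch below, which calls the existing MediaWiki Action API with an optional Bearer token. The token requirement and the WIKI_API_TOKEN variable are assumptions for illustration, not a documented Wikimedia mandate.

```python
import json
import os
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # existing MediaWiki Action API
TOKEN = os.environ.get("WIKI_API_TOKEN", "")  # hypothetical credential for high-volume access

def fetch_extract(title: str) -> str:
    """Query a plain-text intro extract for one article, identifying ourselves."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "exintro": 1,
        "titles": title,
        "format": "json",
    })
    headers = {"User-Agent": "ExampleResearchBot/0.1 (contact@example.org)"}
    if TOKEN:  # attach credentials only if the operator has been issued them
        headers["Authorization"] = f"Bearer {TOKEN}"
    req = urllib.request.Request(f"{API}?{params}", headers=headers)
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    page = next(iter(data["query"]["pages"].values()))
    return page.get("extract", "")

print(fetch_extract("Wikipedia")[:200])
```

The design choice being floated is simple: once high-volume callers must present credentials, traffic becomes attributable, and rate limits can be enforced per operator rather than guessed at from anonymous request patterns.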
Wikipedia understands the core threat: its content is free, but its infrastructure is not. The Foundation has to act now to restore a healthier balance.
Reddit faced something similar in 2023. Microsoft, among others, scraped Reddit's content for AI features without notifying the company. Reddit then blocked Microsoft from crawling its pages, and its CEO openly criticized the software giant.
Reddit went further, charging third-party developers for access to its API. That decision triggered a developer revolt, protest blackouts across the site, and the shutdown of some of the platform's leading third-party clients.
Image: DIW-Aigen
Read next: YouTube in a Position To Become The Leader of Video Streaming in 2025