OpenAI’s expansion into a range of services owes a great deal to ChatGPT. The AI chatbot has transformed how people think about AI in general, and OpenAI’s latest attempt to capitalize on that success involves a web crawler called GPTBot. The crawler is meant to expand the dataset available to ChatGPT and thereby make its responses more accurate, but it turns out that many websites are blocking it.
So far, 69 of the top 1,000 websites in the world have blocked GPTBot. That’s just under 7%, and among the top 100 websites the proportion rises to 15%. Websites are doing this to prevent GPTBot from scraping their content without their knowledge.
The websites blocking it include heavy hitters such as Amazon, Quora, Shutterstock, the New York Times and CNN. These are just some of the websites that have blocked GPTBot so far, and their number is increasing by 5% each week.
Another web crawler that is getting blocked is CCBot, the crawler launched by Common Crawl. OpenAI relies on this crawler as well to harvest data that can be used to train its AI more effectively, and analysis has shown that 62 of the top 1,000 websites on the internet have blocked it so far.
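The article does not say how these blocks are put in place. A common mechanism, assumed here, is a robots.txt rule that disallows a crawler by its user-agent token ("GPTBot" and "CCBot"). The sketch below uses Python’s standard urllib.robotparser to check a robots.txt file for such rules. The sample file contents, the function name blocked_crawlers and the example.com URL are hypothetical and purely illustrative, not taken from any of the websites named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_crawlers(robots_txt, crawlers=("GPTBot", "CCBot"),
                     url="https://example.com/"):
    """Return the crawlers that robots_txt disallows from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, url)]


if __name__ == "__main__":
    print(blocked_crawlers(SAMPLE_ROBOTS_TXT))
```

A survey like the one described here could, in principle, run a check of this kind against each site’s robots.txt file, although the source does not describe its actual methodology.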
This does not bode well for the future of the industry, since companies specializing in AI will be reliant on these datasets. Most website owners don’t want their data scraped, and it will be interesting to see where things go from here. OpenAI may be forced to start purchasing data instead of just scraping it, which would cut into its revenue.
H/T: OriginalityAI