To provide a decent service through its flagship product ChatGPT, OpenAI needs to scrape a vast quantity of data from across the internet. It does this with a web crawler known as GPTBot. However, over a quarter of the top 100 websites in the world have now blocked the bot from scraping their data.
To be more specific, 26 of the top 100 websites have shut their doors to GPTBot, making it harder for OpenAI to get the data it requires. Widening the scope to the top 1,000 websites, 242 have barred GPTBot entirely. The proportion is therefore roughly the same, about a quarter (26% versus 24.2%), irrespective of how many websites are included.
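The article does not spell out how a site bars the crawler. The usual mechanism is a robots.txt rule naming the `GPTBot` user agent, which OpenAI says its crawler respects. The sketch below is only an illustration under that assumption: the sample robots.txt contents, the `is_blocked` helper and the `example.com` URL are hypothetical, and it uses Python's standard `urllib.robotparser` rather than whatever tooling the study relied on.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bars GPTBot site-wide, allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


gptbot_blocked = is_blocked(SAMPLE_ROBOTS_TXT, "GPTBot")
other_blocked = is_blocked(SAMPLE_ROBOTS_TXT, "SomeOtherBot")
print(f"GPTBot blocked: {gptbot_blocked}")
print(f"SomeOtherBot blocked: {other_blocked}")
```

Running a check like this against the robots.txt of each site on a ranked list is one plausible way to arrive at counts such as those quoted above, though robots.txt is a voluntary convention and only works if the crawler honours it.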
Just a month ago, only 69 of the top 1,000 websites had taken this drastic step. Going from 69 to 242 is an increase of roughly 250% in the number of websites that are no longer willing to comply. GPTBot is also being blocked at a much higher rate than other scrapers such as CCBot and Anthropic AI.
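For readers who want to check the arithmetic, the figures quoted so far can be reproduced in a few lines. The helper names `share` and `pct_increase` are made up for this sketch; the inputs are the counts reported in the article.

```python
def share(blocked: int, total: int) -> float:
    """Percentage of sites in a ranking that block the crawler."""
    return 100 * blocked / total


def pct_increase(old: int, new: int) -> float:
    """Percentage growth from old to new."""
    return 100 * (new - old) / old


top_100_share = share(26, 100)     # 26 of the top 100 sites
top_1000_share = share(242, 1000)  # 242 of the top 1,000 sites
increase = pct_increase(69, 242)   # 69 sites a month ago, 242 now

print(f"Top 100 share:   {top_100_share:.1f}%")
print(f"Top 1,000 share: {top_1000_share:.1f}%")
print(f"Month-on-month increase: {increase:.1f}%")
```

The two shares come out at 26.0% and 24.2%, and the increase at about 250.7%, in line with the "roughly the same proportion" and "250%" claims.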
Some of the biggest brands in the world are on this list. They include the social media platform Pinterest, news websites belonging to The Guardian, USA Today, the Washington Post and CBS News, along with popular sites like WebMD and Dictionary.com.
Websites are doing this because ChatGPT does not provide references or sources for the information it provides. This can be harmful because it could deny these websites the attribution they require for the information they create themselves.
H/T: OriginalityAI