Inside the Rise of Websites Blocking Google-Extended Crawling


The number of websites blocking Google-Extended crawling has risen sharply in recent months, as brands and businesses worry that AI companies are using their content to generate profit and compete against them. According to research shared exclusively with Search Engine Land by the Detailed.com team, the number of websites blocking Google-Extended jumped by 180% in the past month alone. This article covers the reasons behind the trend, the websites that have implemented the block, and the implications for content creators and AI companies.

The Debate Surrounding Blocking Bots

The decision to block bots that crawl content used to train large language models (LLMs) has sparked considerable debate. Only a minority of sites block these bots so far, but the number is rising steadily. Content creators worry that AI companies are using their content to gain a competitive edge that could undermine the creators' own businesses, which has led to a growing desire to protect that content from exploitation.

The Growth of Websites Blocking Google-Extended Crawling

As of November 19, 252 of a set of 3,000 popular websites (8.4%) had blocked Google-Extended crawling, up from only 89 sites just over a month earlier. That jump of roughly 180% highlights the growing concern among content creators.
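The headline figure can be checked from the counts reported above. The short Python sketch below recomputes the increase and the share of the sample; the function name percent_increase is only illustrative, and the inputs are the 89, 252, and 3,000 figures from the research.

```python
def percent_increase(before: int, after: int) -> float:
    """Relative growth from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100


# Counts reported in the research: sites blocking Google-Extended
# about a month earlier, sites blocking as of November 19, and sample size.
blocked_before = 89
blocked_after = 252
sample_size = 3000

growth = percent_increase(blocked_before, blocked_after)
share = blocked_after / sample_size * 100

print(round(growth, 1))  # 183.1
print(round(share, 1))   # 8.4
```

The computed growth of about 183% is consistent with the rounded 180% cited in the research, and the block is still limited to roughly one in twelve of the sites sampled.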

Notable Websites Blocking Google-Extended

Several prominent websites have joined the movement to block Google-Extended crawling. These websites include:

  1. Ziff Davis properties: Ziff Davis, the parent company of popular websites such as PC Mag and Mashable, has implemented the block.
  2. Vox properties: Websites under the Vox umbrella, including The Verge and NYMag, have also chosen to block Google-Extended crawling.
  3. The New York Times: One of the most respected news organizations, The New York Times, has taken steps to prevent Google-Extended from accessing its content.
  4. Condé Nast: Condé Nast, the publisher of renowned magazines such as GQ, Vogue, and Wired, has blocked Google-Extended crawling on 22 of its sites.
  5. Yelp: A frequent critic and legal opponent of Google, Yelp has also decided to block Google-Extended crawling.

Opting Out of Google-Extended Crawling

Google-Extended can be blocked in the robots.txt file, but doing so does not prevent content from appearing in Google’s Search Generative Experience (SGE) or from being used to train SGE. Fully opting out requires blocking Googlebot, which also removes the website from Google Search. Alternatively, sites can opt out of SGE overviews using the nosnippet tag.
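To make the robots.txt option concrete, the sketch below uses Python's standard urllib.robotparser module to show how a rule aimed at the Google-Extended user agent leaves Googlebot unaffected. The robots.txt contents, the example.com URL, and the allowed helper are assumptions made for illustration, not a configuration taken from any of the sites named above. The parser only models how a compliant crawler would read the rules; it does not describe what Google does with content it is still allowed to fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only the Google-Extended token,
# leave every other crawler (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/article"  # placeholder URL
print(allowed("Google-Extended", url))  # False
print(allowed("Googlebot", url))        # True
```

Because Googlebot remains allowed under these rules, the site stays in Google Search, which is also why this rule alone does not keep content out of SGE.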

Implications for Content Creators and AI Companies

The rise in websites blocking Google-Extended crawling has significant implications for both content creators and AI companies. For content creators, it provides a sense of control over their content and safeguards against potential exploitation. By blocking Google-Extended, they can limit the use of their content by AI companies and protect their own businesses.

On the other hand, AI companies may face challenges in accessing sufficient training data if more websites continue to block Google-Extended. This could hinder the development and improvement of AI models that rely on a diverse range of data sources. AI companies may need to find alternative methods for acquiring training data or negotiate partnerships with content creators to gain access to their content.

See first source: Search Engine Land

FAQ

1. Why are websites blocking Google-Extended crawling?

Websites are blocking Google-Extended crawling because of concerns that AI companies are using their content to generate profit and compete against them. Content creators are worried about the potential exploitation of their content by those companies.

2. How much has the number of websites blocking Google-Extended crawling increased recently?

The number of websites blocking Google-Extended crawling has increased by 180% in the past month, according to research by the Detailed.com team. As of November 19, 252 out of a set of 3,000 popular websites had implemented the block.


3. Which notable websites have implemented the block on Google-Extended crawling?

Several prominent websites that have implemented the block include Ziff Davis properties (e.g., PC Mag and Mashable), Vox properties (e.g., The Verge and NYMag), The New York Times, Condé Nast (e.g., GQ, Vogue, and Wired), and Yelp. These websites have taken steps to prevent Google-Extended from accessing their content.

4. How can websites block Google-Extended crawling?

Websites can block Google-Extended crawling by adding a rule to their robots.txt file. However, blocking Google-Extended this way does not prevent content from appearing in Google’s Search Generative Experience (SGE) or from being used to train SGE. Fully opting out requires blocking Googlebot, which also removes the website from Google Search. An alternative is to opt out of SGE overviews using the nosnippet tag.
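As an illustration of the nosnippet option, the sketch below checks whether a page declares it. It assumes the tag is written as a robots meta tag, for example <meta name="robots" content="nosnippet">; the RobotsMetaParser class, the has_nosnippet helper, and the sample HTML are hypothetical names and data used only for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            # The content attribute holds a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_nosnippet(html: str) -> bool:
    """Return True if the page's robots meta tags include nosnippet."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nosnippet" in parser.directives


page = '<html><head><meta name="robots" content="nosnippet"></head></html>'
print(has_nosnippet(page))  # True
```

A check like this only confirms that the tag is present in the page markup; how Google applies it to SGE overviews is up to Google.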

5. What are the implications of websites blocking Google-Extended crawling for content creators and AI companies?

For content creators, blocking Google-Extended provides control over their content and safeguards against potential exploitation by AI companies. They can limit the use of their content and protect their businesses. However, AI companies may face challenges in accessing sufficient training data if more websites continue to block Google-Extended, which could hinder the development and improvement of AI models that rely on diverse data sources. AI companies may need to find alternative methods for acquiring training data or negotiate partnerships with content creators to gain access to their content.

Featured Image Credit: Photo by Mika Baumeister; Unsplash – Thank you!
