How Cloudflare’s move to block AI crawlers changes the game for content creators – and could scupper synthetic research
Cloudflare’s move to block AI crawlers by default has massive implications, says Melbourne Business School Associate Professor Nico Neumann. The winners? Content creators. The likely losers? Those relying on synthetic research. The big unknown? How publishers will price a crawl – because a single crawl can generate outputs seen by millions.
On July 1, 2025, infrastructure provider Cloudflare dropped a bombshell: AI crawlers will now be blocked by default on websites using its services unless they have explicit permission. This was one of the most significant announcements for the media ecosystem since ChatGPT became publicly accessible. Let’s review why this matters.
What do AI crawlers do? And what is the problem?
AI crawlers (sometimes called bots or scrapers) scan websites across the internet to collect data – text, images, code, and other content – that is then used to train large language models such as ChatGPT or Gemini. A growing concern among content providers is that the rapid rise of generative AI tools over the past two years has cut traffic to their sites, as users increasingly receive answers directly from AI systems that bypass, and fail to compensate, the original sources.
Who is Cloudflare? And why is the announcement so crucial?
Cloudflare runs a Content Delivery Network (CDN): a network of servers that sits between visitors and websites, helping sites load faster and stay secure. About half of the top 1 million websites globally use some form of CDN technology, and the proportion is even higher among the top 100,000 websites.
Most importantly, according to various sources, Cloudflare holds the largest market share in this space – estimated at around 30%–60%, depending on the source, the counting method, and the sample of websites – making it the market leader. Other notable players are Google Cloud CDN, Akamai, Amazon CloudFront, Microsoft Azure CDN, and Fastly.
Now Cloudflare has taken the strongest stand by making blocking the default. It is not alone in seeing this as an important future role for CDNs, however: Fastly introduced bot management options that also allow blocking AI crawlers back in April.
Many of the other CDNs offer firewall and bot management features too. Of course, how strongly they promote or extend such functions may also depend on conflicting investments in their own GenAI solutions, which themselves require AI crawling.
Have websites not tried to block AI crawlers before?
Yes, publishers have tried blocking AI crawlers. By the end of 2023, 48% of the most widely used news websites across ten countries were already blocking OpenAI’s crawlers, according to research by the Reuters Institute for the Study of Journalism at the University of Oxford.
However, before the recent CDN-level enforcement options, the primary method for blocking AI crawlers was the website’s robots.txt file. A robots.txt file gives instructions to crawlers on what they are or aren’t allowed to access. While simple and easy to set up, this method relies entirely on bots following the rules. It is not enforceable.
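For illustration, here is a minimal sketch of what such a file can look like, using the crawler names that OpenAI, Common Crawl and Google publicly document (a real file would list every bot the site wants to exclude):

```
# robots.txt: ask AI training crawlers to stay away from the whole site
User-agent: GPTBot            # OpenAI's training crawler
Disallow: /

User-agent: CCBot             # Common Crawl, a common source of AI training data
Disallow: /

User-agent: Google-Extended   # control token that opts content out of Gemini training
Disallow: /
```

Nothing technically stops a crawler from ignoring these lines, which is exactly the weakness described above.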
In contrast, the network-level blocking now offered by CDNs, and made the default by Cloudflare, is far more effective. Because CDNs sit at a critical point in the internet’s infrastructure, between users and websites, they are uniquely positioned to enforce such restrictions. Think of it as Apple’s App Tracking Transparency (ATT) for the open web: a default barrier that can drastically reshape who gets access to what.
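To make the mechanism concrete, here is a minimal sketch of the idea in Python. It illustrates the concept only – not Cloudflare’s actual implementation, since real CDNs also verify bots via IP ranges and cryptographic signatures rather than trusting user-agent strings:

```python
# Minimal sketch of edge-level AI-crawler blocking, the way a CDN sits
# between users and websites. Conceptual illustration only, not
# Cloudflare's actual implementation.

# Hypothetical deny list of AI crawler user-agent substrings.
AI_CRAWLER_SIGNATURES = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

def handle_request(headers: dict[str, str], origin_fetch) -> tuple[int, str]:
    """Runs at the edge for every request, before the origin is contacted."""
    user_agent = headers.get("User-Agent", "")
    if any(sig in user_agent for sig in AI_CRAWLER_SIGNATURES):
        # The crawler never reaches the origin: enforcement happens in the
        # network path, so ignoring robots.txt does not help the bot.
        return 403, "AI crawlers are blocked by default on this site."
    return origin_fetch()  # normal visitors pass through to the website

# Example: a known crawler is refused, a regular browser is served.
blocked = handle_request({"User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"},
                         lambda: (200, "<html>article</html>"))
allowed = handle_request({"User-Agent": "Mozilla/5.0 (Windows NT 10.0)"},
                         lambda: (200, "<html>article</html>"))
print(blocked)  # (403, 'AI crawlers are blocked by default on this site.')
print(allowed)  # (200, '<html>article</html>')
```

Because the check happens in the network path rather than on the bot’s honour, a crawler cannot simply opt out of it.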
What are the consequences for the media landscape?
The likely losers: companies relying on synthetic research
Synthetic research relies on GenAI tools to summarise insights from web data. If more sites become inaccessible to AI crawlers, the freshness and reliability of that research decline. Research on trends or recent topics becomes particularly problematic.
The winners: content creators
Yes, this is a major shift in power. With the new blocking efforts by CDNs, website owners can effectively control whether their content is scraped by AI crawlers. That’s why so many publishers have publicly supported Cloudflare’s announcement.
And it’s not just about protecting and regaining traffic. Both Cloudflare and Fastly, for example, have introduced pay-per-crawl options that enable content owners to charge AI crawlers for access. This could become a promising new monetisation stream.
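Cloudflare’s pay-per-crawl design is built around the long-dormant HTTP 402 “Payment Required” status code: an unpaid crawl gets a price quote instead of content. The sketch below illustrates that flow; the header names are my own illustrative assumptions, not the actual specification:

```python
# Conceptual sketch of a pay-per-crawl exchange. The HTTP 402 status is
# what Cloudflare's announcement describes; the header names below are
# illustrative assumptions, not the exact specification.

PRICE_PER_CRAWL_USD = 0.05  # hypothetical price set by the publisher

def handle_crawl(headers: dict[str, str]) -> tuple[int, dict[str, str], str]:
    """Edge logic for a request already identified as an AI crawler."""
    if "X-Crawler-Payment-Token" in headers:  # assumed header name
        # Verifying the token and settling payment is the CDN's job.
        return 200, {}, "<html>full article content</html>"
    # No payment attached: quote a price instead of serving the page.
    return 402, {"X-Crawl-Price-USD": str(PRICE_PER_CRAWL_USD)}, ""

# An unpaid crawl receives a quote; a paid retry receives the content.
print(handle_crawl({"User-Agent": "GPTBot/1.0"}))
print(handle_crawl({"User-Agent": "GPTBot/1.0",
                    "X-Crawler-Payment-Token": "tok_123"}))
```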
The only challenge: How do you price something like this?
Unlike a human reading an article once, an AI crawler might use that content to generate outputs seen by thousands or even millions. A single crawl isn’t a one-to-one interaction. It’s a one-to-many amplification.
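A back-of-the-envelope calculation shows why this is so hard. The numbers below are purely hypothetical assumptions for illustration: if a page earns a small amount per human view through ads, a revenue-neutral crawl price has to scale with how many AI-generated answers a single crawl ends up feeding.

```python
# Back-of-the-envelope pricing sketch. All numbers are hypothetical
# assumptions for illustration, not observed market rates.

revenue_per_human_view = 0.005     # assumed ad revenue per pageview, USD
answers_fed_by_one_crawl = 50_000  # assumed reach of one crawled article

# A crawl priced like a single pageview ignores the amplification:
naive_price = revenue_per_human_view

# A revenue-neutral price scales with the one-to-many reach:
amplified_price = revenue_per_human_view * answers_fed_by_one_crawl

print(f"Per-view price:       ${naive_price:.3f}")
print(f"Reach-adjusted price: ${amplified_price:,.2f}")  # $250.00
```

The gap between those two prices is exactly the one-to-many amplification a publisher has to decide how to capture.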
How publishers respond will be fascinating to watch.
Sources:
[1] https://trends.builtwith.com/CDN/Content-Delivery-Network
[2] Best Content Delivery Network (CDN) Software in 2025 | 6sense
[3] https://www.fastly.com/blog/take-back-control-make-ai-bots-play-by-your-rules
[4] https://reutersinstitute.politics.ox.ac.uk/how-many-news-websites-block-ai-crawlers#header--0
[5] https://www.fastly.com/blog/how-to-control-and-monetize-ai-bot-traffic-using-fastly-and-tollbit