Block, Allow, or Monetize AI Crawlers on Your Pakistani Website

Last updated: July 2026. By Abdul Rehman, WeProms Digital.

Cloudflare now splits AI traffic into Search, Agent, and Training bots, and 51.8% of AI crawler requests are for training rather than search. Pakistani site owners who blanket-block GPTBot forfeit citations in ChatGPT and Google AI Mode, while those who let every bot through pay the server bill for someone else’s model. The fix is selective rules paired with real measurement.

Start here: the question Pakistani business owners ask in 2026 is no longer whether AI bots visit their site, because they already do in volume. The question is which bots to welcome, which to restrict, and which to bill. Get that decision wrong in either direction and you either vanish from the answers ChatGPT and Google AI Mode give your customers, or you quietly fund a foreign AI lab’s training data on a Karachi hosting plan that cannot afford it.

What exactly are AI crawlers, and why are they suddenly a Pakistani problem?

AI crawlers are automated programs run by companies like OpenAI, Anthropic, Google, and Microsoft that read your web pages to power their AI products. GPTBot belongs to OpenAI and feeds ChatGPT. ClaudeBot belongs to Anthropic and feeds Claude. PerplexityBot feeds Perplexity. Google-Extended feeds Gemini, and BingBot already feeds Microsoft’s new Web IQ index that grounds both Copilot and ChatGPT answers.

The reason this is now a Pakistani problem is scale. DataReportal’s Digital 2025 Pakistan report counts 116 million internet users in the country, and a population-normalized tracking study published on arXiv in late 2025 found that 33% of connected Pakistanis already use AI tools, with roughly 28% using them daily. Gallup Pakistan’s own survey puts ChatGPT ahead of every rival at 66% of Pakistani AI users, followed by Microsoft Copilot at 23% and Google’s Gemini at 17%.

That means your customers are asking ChatGPT for recommendations before they ever type a query into Google. If GPTBot cannot read your site, ChatGPT cannot cite you, and the answer it gives points elsewhere.

Should a Pakistani business block GPTBot and ClaudeBot, or let them in?

The short answer is let the search and agent bots in, and manage the training bots deliberately. Blanket-blocking every AI crawler is the most common mistake we see, and it is usually copied from a viral robots.txt snippet without anyone checking the consequence.

Cloudflare’s June 2026 Radar data, reported by Forbes, shows bots now account for 57.5% of all HTTP requests to HTML content, with humans at 42.5%. Within that bot traffic, AI bots make up 33.8% while traditional search-engine crawlers account for 26.8%. Those AI bots are no longer a rounding error. Blocking them all removes your pages from the fastest-growing answer layer on the internet.

The tradeoff is that not all AI bots repay the visit. A bot that reads your page to generate a live answer and cite you is an asset. A bot that reads your page to train a competing model is a cost. Telling them apart is the entire job.

What is the difference between Search, Agent, and Training bots?

Cloudflare introduced finer controls in 2026 that let site owners distinguish three bot purposes instead of treating AI traffic as one block. Understanding this split is what turns a guess into a decision.

Bot purpose | What it does | Example bots | Should a Pakistani site allow it?
Search bots | Feed an index used to answer queries, often with a citation back to you | BingBot (Web IQ), Google-Extended | Yes, allow
Agent bots | Read your live page to answer a specific user question right now | GPTBot (ChatGPT fetch), PerplexityBot | Yes, allow
Training bots | Scrape your content to improve a model, with no citation returned | Various bulk scrapers, some training runs | Restrict selectively

The picture is starker than the table suggests. Cloudflare’s May 2026 breakdown, summarized by WorkOS, shows that 51.8% of AI crawler requests are for training purposes and only 9.3% are for search. More than half of the AI traffic hitting your server builds someone else’s product and sends you nothing in return.

Infographic: Three AI bot categories split training at 51.8 percent, agent at 38.9 percent, and search at 9.3 percent of requests

So what? If you run a Lahore news site or a Karachi ecommerce store on a shared Cloudways or local hosting plan, those training crawls consume your bandwidth and CPU during peak hours, and the only party that benefits is the AI company. Think of it as running a shop in Liberty Market: you happily let in the Daraz rider collecting an order, but you also unknowingly let in someone photographing your entire stock to rebuild your catalogue at a rival stall. One visit makes you money. The other costs you money.
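To see how the three categories split on your own server, a rough sketch like the one below can classify user-agent strings pulled from your logs and report each purpose's share of AI bot requests. The mapping follows the table in this article and is an assumption to verify against each vendor's published crawler documentation; ExampleTrainingBot is a hypothetical placeholder for whichever bulk scraper you actually find.

```python
from collections import Counter

# Assumption: purposes follow this article's table. Check every token against
# the vendor's own crawler documentation before acting on the result.
BOT_PURPOSE = {
    "bingbot": "search",
    "gptbot": "agent",
    "perplexitybot": "agent",
    "exampletrainingbot": "training",  # hypothetical placeholder name
}


def classify(user_agent: str) -> str:
    """Return 'search', 'agent', 'training', or 'other' for one user-agent string."""
    ua = user_agent.lower()
    for token, purpose in BOT_PURPOSE.items():
        if token in ua:
            return purpose
    return "other"


def purpose_shares(user_agents) -> dict:
    """Percentage of recognised AI bot requests per purpose, ignoring 'other'."""
    counts = Counter(classify(ua) for ua in user_agents)
    ai_total = sum(n for purpose, n in counts.items() if purpose != "other")
    if ai_total == 0:
        return {}
    return {
        purpose: round(100 * n / ai_total, 1)
        for purpose, n in counts.items()
        if purpose != "other"
    }


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; bingbot/2.0)",
        "ExampleTrainingBot/0.1",
        "Mozilla/5.0 (Windows NT 10.0) Chrome/126.0",
    ]
    print(purpose_shares(sample))
```

User-agent strings can be spoofed, so treat the output as a first estimate and lean on Cloudflare's verified bot classification for enforcement.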

How do you set up selective AI crawler rules without breaking Google rankings?

The safest path uses your robots.txt file for the broad policy and Cloudflare’s AI bot controls for enforcement, because robots.txt is voluntary and stops nothing determined to scrape.

First, explicitly allow the bots that cite. GPTBot respects a robots.txt directive, and Anthropic publishes ClaudeBot’s user-agent so you can permit it. BingBot is already the foundation of Microsoft’s Web IQ, which Microsoft says runs at 164 ms P95 latency and now powers grounding for both Copilot and ChatGPT. Blocking BingBot to spite AI would also damage your traditional Bing rankings, which is self-defeating.

Then restrict the training-heavy scrapers through Cloudflare’s Pay Per Crawl or block rules rather than a blanket AI block. Cloudflare reports having blocked over 416 billion AI bot requests since July 2025, so the tooling is proven and widely deployed. The goal is not to stop AI from reading you; it is to stop non-citing bulk training crawls from running up your bill.
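The allow-then-restrict policy can be written down as a small script that generates the robots.txt text. This is a minimal sketch, not a definitive file: the allow list uses bot names mentioned in this article, ExampleTrainingBot is a hypothetical placeholder for a scraper you have decided to restrict, and the function name is ours.

```python
def build_robots_txt(allow, block) -> str:
    """Build robots.txt text: explicit Allow groups, explicit Disallow groups,
    then a catch-all group that leaves every other crawler unaffected."""
    lines = []
    for agent in allow:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    for agent in block:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Catch-all: anything not named above (including Googlebot) stays allowed.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    policy = build_robots_txt(
        allow=["GPTBot", "ClaudeBot", "PerplexityBot", "Bingbot"],
        block=["ExampleTrainingBot"],  # hypothetical placeholder name
    )
    print(policy)
```

Because robots.txt is only a request, the block list here documents intent. The actual enforcement for bots that ignore it still has to happen in Cloudflare.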

One concrete check: confirm that your site is not accidentally blocking Googlebot while tightening AI rules. Googlebot reached 11.6% of unique web pages globally in 2025 versus GPTBot’s 3.6%, according to DigitalApplied’s crawler analysis, and Google’s traditional index still drives the majority of Pakistani commercial traffic. Protecting the citation layer must not starve the ranking layer.
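That check can be automated with Python's standard-library robots.txt parser. The sketch below lists which of the crawlers you care about would be refused by a given robots.txt; the helper name and the sample rules are illustrative assumptions, and real crawlers may resolve conflicting rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, agents, path: str = "/") -> list:
    """Return the agents that this robots.txt text would stop from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    # Illustrative rules; ExampleTrainingBot is a hypothetical placeholder.
    rules = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""
    must_stay_open = ["Googlebot", "Bingbot", "GPTBot"]
    problems = blocked_agents(rules, must_stay_open)
    print("Accidentally blocked:", problems or "none")
```

Running it after every rule change is a cheap way to catch a stray "Disallow: /" under the wildcard group before it costs you rankings.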

Which AI citations actually send Pakistani businesses traffic or leads?

Citations matter more than crawl counts because a citation is the moment your brand appears in front of a buyer. Microsoft confirmed the split with its own product launch: Bing Webmaster Tools now holds two separate reports, traditional Search Performance and a new AI Performance dashboard, because ranking a page and citing a passage turned out to be different jobs.

Bing’s AI Performance report, which entered public preview in February 2026 and added a Citation Share metric in June 2026, shows how often your pages are cited as grounding on Microsoft surfaces. Google rolled out a Generative AI report inside Search Console that surfaces impressions from its AI features. Neither platform gives you the full cross-engine picture, but together they cover the two largest answer engines a Pakistani buyer is likely to use.

The reading that actually tells you where you stand is the one that spans surfaces rather than living inside a single platform. As Duane Forrester wrote in Search Engine Journal about the rank-to-citation delta: “The company that owns a major web index just documented, in its own product, that ranking a page and citing a passage are different jobs.”

How much does unmanaged AI crawler traffic cost a Pakistani site?

The cost is real and it compounds quietly. GPTBot’s share of AI bot HTTP requests grew from 4.7% in July 2024 to 11.7% in July 2025, and ClaudeBot climbed from roughly 6% to nearly 10% over the same window, per Cloudflare’s crawl-to-click analysis. Traffic that was negligible two years ago is now a meaningful line item.

Infographic: GPTBot share grew from 4.7 percent in July 2024 to 11.7 percent in July 2025 while ClaudeBot reached 10 percent

For a mid-sized Pakistani ecommerce site doing a few hundred thousand page views a month, uncontrolled training crawls can add measurable bandwidth and CPU overhead on entry-level hosting, and during a peak like a Ramadan sale or an 11.11 event they compete with real shoppers for server resources. The fix is rarely to upgrade hardware; it is to stop serving scrapers that never cite you.
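To put a number on that overhead for your own site, one option is to total the response bytes each bot pulled from your access log. The sketch below assumes the common "combined" log format used by Apache and Nginx; the sample lines, addresses, and function name are made up for illustration, and your host's log format may differ.

```python
import re
from collections import defaultdict

# Assumes combined log format:
# ... "REQUEST" STATUS BYTES "REFERER" "USER-AGENT"
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<ua>[^"]*)"'
)


def bytes_by_agent(lines, tokens) -> dict:
    """Sum response bytes per bot token; everything unmatched goes to 'other'."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        size = 0 if match["bytes"] == "-" else int(match["bytes"])
        ua = match["ua"].lower()
        label = next((t for t in tokens if t.lower() in ua), "other")
        totals[label] += size
    return dict(totals)


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Mar/2026:10:00:00 +0500] "GET /shop HTTP/1.1" 200 5000 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '198.51.100.7 - - [10/Mar/2026:10:00:01 +0500] "GET /cart HTTP/1.1" 200 2000 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/126.0"',
    ]
    print(bytes_by_agent(sample, ["GPTBot", "ClaudeBot", "PerplexityBot"]))
```

Comparing those totals for a normal week against a sale week shows whether crawlers are in fact competing with shoppers, or whether the load is coming from somewhere else.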

This is also where monetization enters the conversation. Cloudflare’s Pay Per Crawl framework lets publishers charge AI companies for crawl access, which is a genuine option for Pakistani publishers with original reporting or proprietary product data that AI engines want. Most SMEs will not reach that threshold, but any business sitting on a proprietary dataset should at least know the option exists before giving it away.

How do you track whether blocking or allowing a bot changed your AI citations?

Measurement is what separates a decision from a hope. The practical sequence is to baseline first, change one rule at a time, and watch the citation reports rather than the crawl logs.

Set up Bing Webmaster Tools and watch the AI Performance report for Citation Share movement after you allow BingBot fully. Enable the Generative AI report in Google Search Console to see whether your pages surface in Google’s AI answers. For ChatGPT specifically, run a fixed set of buyer-intent prompts monthly, such as “best SEO agency in Lahore” or “where to buy [your product] in Karachi,” and record whether your brand appears and whether it links to you. You can read more on why Pakistani brands struggle to track this in our field note on AI search visibility.

The signal you are looking for is direction over a 60 to 90 day window, not a daily jump. If you allow the citing bots and Citation Share rises while your hosting overhead stays flat, the decision worked. If you blocked a bot and your citations dropped, you blocked the wrong bot. For a deeper read on separating real visitors from automated traffic in your analytics, see our note on why bot traffic beats human traffic in Pakistani GA4 setups and our warning about AI visibility tools that waste Pakistani SME budgets.
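The monthly prompt check is easy to keep honest if you log it in a fixed structure and compute the direction rather than eyeballing it. The sketch below is one possible shape: the record fields, the 5-point tolerance, and the function names are our assumptions, and the results are something you enter by hand after running each prompt.

```python
from dataclasses import dataclass


@dataclass
class PromptCheck:
    month: str             # e.g. "2026-07", so months sort chronologically
    prompt: str            # the fixed buyer-intent prompt you ran
    brand_mentioned: bool  # did the answer name your brand?
    linked: bool           # did the answer link to your site?


def mention_rate_by_month(checks) -> dict:
    """Fraction of prompts per month in which the brand was mentioned."""
    totals, hits = {}, {}
    for check in checks:
        totals[check.month] = totals.get(check.month, 0) + 1
        hits[check.month] = hits.get(check.month, 0) + int(check.brand_mentioned)
    return {month: hits[month] / totals[month] for month in totals}


def direction(rates: dict, tolerance: float = 0.05) -> str:
    """Compare the first and last month; small moves count as 'flat'."""
    months = sorted(rates)
    if len(months) < 2:
        return "insufficient data"
    delta = rates[months[-1]] - rates[months[0]]
    if delta > tolerance:
        return "up"
    if delta < -tolerance:
        return "down"
    return "flat"


if __name__ == "__main__":
    log = [
        PromptCheck("2026-07", "best SEO agency in Lahore", False, False),
        PromptCheck("2026-09", "best SEO agency in Lahore", True, True),
    ]
    print(direction(mention_rate_by_month(log)))
```

Keeping the prompt set identical from month to month matters more than the exact tolerance, because a changed prompt list makes the trend meaningless.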

The decision a Pakistani site owner actually needs to make

The work reduces to three moves. Allow the bots that cite you and feed the answer indexes, because 33% of connected Pakistanis already use AI tools and many of them now ask those tools for recommendations. Restrict the training bots that take without returning a citation, because they cost you money and build a competitor. Then measure citation share across Bing, Google, and ChatGPT so you know the rules are working rather than guessing.

Read next: our guide to auditing AI Mode traffic loss for Pakistani SMEs and our teardown of agentic SEO services for Pakistani SMEs.

At WeProms Digital, we run this as a technical SEO audit that maps every AI bot hitting your domain, classifies it as Search, Agent, or Training, writes the robots.txt and Cloudflare rules, and sets up the citation tracking that shows whether your brand appears inside the answers your customers actually read. If you want that mapped on your site, reach us at hello@weproms.com or on WhatsApp at +92 300 0133399, or start at weproms.com/contact-us.

Sources & References

  1. Forbes — Bots Now Outnumber Humans Online — June 2026
  2. Cloudflare Blog — The crawl-to-click gap: AI bots, training, and clicks — 2026
  3. Cloudflare Blog — From Googlebot to GPTBot: who’s crawling your site in 2025 — 2025
  4. WorkOS — AI agent web traffic: what developers need to change — 2026
  5. DigitalApplied — AI Crawler & Bot Traffic Statistics 2026 — 2026
  6. DataReportal — Digital 2025: Pakistan — February 2025
  7. Gallup Pakistan — AI chatbot usage in Pakistan — 2025
  8. Search Engine Journal — Duane Forrester on the rank-to-citation delta and Microsoft Web IQ — 2026
  9. arXiv — A Population-Normalized Metric for Tracking Global AI Usage — November 2025
