How to track AI bot traffic on your site (without a third-party pixel)
If you're not measuring AI crawler traffic, you're flying blind on a channel that grew from negligible to meaningful in 18 months. Tracking AI bots in your existing stack is straightforward: you just need to know which user-agents to look for and what to count.
The user-agents to count
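- Training: GPTBot, ClaudeBot, CCBot, Meta-ExternalAgent, Bytespider, cohere-ai. (Google-Extended and Applebot-Extended also govern training use, but they are robots.txt control tokens, not request user-agents. That crawling arrives as Googlebot and Applebot, so neither name will show up in your logs.)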
- Answer-time: ChatGPT-User, OAI-SearchBot, Claude-Web, PerplexityBot, Meta-ExternalFetcher, DuckAssistBot, MistralAI-User.
- Research: GoogleOther, Diffbot, anthropic-ai.
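If you want those three lists in code, here is a minimal sketch of a classifier. The names `AI_BOT_CATEGORIES` and `classifyAiBot` are made up for illustration, and the matching is a plain case-insensitive substring check, which assumes each bot includes its token somewhere in the User-Agent header.

```javascript
// Bot tokens grouped by purpose, taken from the lists above.
// Google-Extended and Applebot-Extended are omitted: they are robots.txt
// control tokens and are not sent as request user-agents.
const AI_BOT_CATEGORIES = {
  training: ["GPTBot", "ClaudeBot", "CCBot", "Meta-ExternalAgent", "Bytespider", "cohere-ai"],
  answerTime: ["ChatGPT-User", "OAI-SearchBot", "Claude-Web", "PerplexityBot", "Meta-ExternalFetcher", "DuckAssistBot", "MistralAI-User"],
  research: ["GoogleOther", "Diffbot", "anthropic-ai"],
};

// Hypothetical helper: returns { bot, category } for the first token found
// in the user-agent string, or null when no known AI bot token is present.
function classifyAiBot(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  for (const [category, bots] of Object.entries(AI_BOT_CATEGORIES)) {
    for (const bot of bots) {
      if (ua.includes(bot.toLowerCase())) return { bot, category };
    }
  }
  return null;
}
```

Call `classifyAiBot(req.headers["user-agent"])` wherever you already see requests. Substring matching does not verify that the request really comes from the named vendor, so treat the result as a label, not proof.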
Three places to look
1. Server access logs
The cleanest source. Every request to your server has a user-agent — grep your logs for the bot names. Aggregate by bot per day. This gives you the absolute count, which beats sampled analytics for bot traffic.
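# Nginx access logs (current and rotated): total GPTBot hits.
# The time span covered depends on your log rotation settings.
zgrep -h "GPTBot" /var/log/nginx/access.log* | wc -l
# All AI bots in one pass. -o prints only the matched bot name, so the counts are per bot.
# Google-Extended and Applebot-Extended are left out: they are robots.txt tokens, not user-agents.
zgrep -h -o -E "GPTBot|ClaudeBot|PerplexityBot|Bytespider|CCBot|ChatGPT-User" /var/log/nginx/access.log* | \
  sort | uniq -c | sort -rn
2. Cloudflare / fronting CDN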
If you're on Cloudflare, the bot management dashboard shows AI crawler hits with built-in classification. Other CDNs (Fastly, AWS CloudFront with Lambda@Edge, Vercel Edge) can do the same with a small middleware function.
3. Analytics (with a custom event)
GA4 and PostHog filter out bots by default — useful for human metrics, harmful for AI bot tracking. Add a server-side event when a known AI UA hits your origin and forward it as a custom analytics event tagged "ai_bot_visit" with a bot name property.
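// Next.js middleware example (middleware.js at the project root)
import { NextResponse } from "next/server";

// Google-Extended and Applebot-Extended are robots.txt tokens, never sent as a UA, so they are not listed.
const AI_BOTS = [/GPTBot/i, /ClaudeBot/i, /PerplexityBot/i, /Bytespider/i, /CCBot/i, /ChatGPT-User/i];

export function middleware(req, event) {
  const ua = req.headers.get("user-agent") || "";
  const matched = AI_BOTS.find((re) => re.test(ua));
  if (matched) {
    // waitUntil lets the POST finish after the response is returned,
    // so logging neither delays the page nor gets cut off early.
    event.waitUntil(
      fetch("https://your-analytics.example.com/ai-bot-hit", {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ bot: matched.source, path: req.nextUrl.pathname, ts: Date.now() }),
      }).catch(() => {}) // an analytics failure should never surface as a site error
    );
  }
  return NextResponse.next();
}
What to actually measure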
- Total AI bot hits per day, by bot.
- Top URLs each bot crawls — tells you what your highest-AI-visibility content is.
- URLs no bot has hit — your AI-invisibility list.
- Trend: if a bot's hit rate drops from 1000/day to 10/day, something on your site started blocking it or returning errors to it.
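The first three measurements can come out of one pass over your access log lines. The sketch below assumes the standard nginx/Apache "combined" log format and a list of your own site paths (from a sitemap, say). `summarizeBotHits`, `LOG_RE`, and the short `BOTS` list are illustrative names, not part of any library.

```javascript
// Assumes the "combined" log format; adjust LOG_RE if you use a custom log_format.
const LOG_RE = /^\S+ \S+ \S+ \[(\d{2})\/(\w{3})\/(\d{4}):[^\]]*\] "\S+ (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"/;
const MONTHS = { Jan: "01", Feb: "02", Mar: "03", Apr: "04", May: "05", Jun: "06",
                 Jul: "07", Aug: "08", Sep: "09", Oct: "10", Nov: "11", Dec: "12" };
const BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "ChatGPT-User", "CCBot", "Bytespider"];

// lines: raw access log lines; sitePaths: every path you expect bots to reach.
function summarizeBotHits(lines, sitePaths) {
  const hitsPerDay = {}; // e.g. { "2024-10-10": { GPTBot: 3 } }
  const urlCounts = {};  // e.g. { GPTBot: { "/pricing": 2 } }
  const crawled = new Set();
  for (const line of lines) {
    const m = LOG_RE.exec(line);
    if (!m) continue; // skip lines that do not match the assumed format
    const [, dd, mon, yyyy, rawPath, ua] = m;
    const bot = BOTS.find((b) => ua.toLowerCase().includes(b.toLowerCase()));
    if (!bot) continue;
    const path = rawPath.split("?")[0]; // ignore query strings when grouping
    const day = `${yyyy}-${MONTHS[mon]}-${dd}`;
    hitsPerDay[day] = hitsPerDay[day] || {};
    hitsPerDay[day][bot] = (hitsPerDay[day][bot] || 0) + 1;
    urlCounts[bot] = urlCounts[bot] || {};
    urlCounts[bot][path] = (urlCounts[bot][path] || 0) + 1;
    crawled.add(path);
  }
  // Top 10 URLs per bot, most-crawled first.
  const topUrls = {};
  for (const [bot, counts] of Object.entries(urlCounts)) {
    topUrls[bot] = Object.entries(counts).sort((a, b) => b[1] - a[1]).slice(0, 10);
  }
  // Paths no listed bot requested: the AI-invisibility list.
  const uncrawled = sitePaths.filter((p) => !crawled.has(p));
  return { hitsPerDay, topUrls, uncrawled };
}
```

Feed it lines read from your log files and chart `hitsPerDay` over time to get the trend view.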
The blind spot
Counting bot hits tells you what's getting crawled. It doesn't tell you what's getting cited in user-facing AI answers. Those are correlated but not identical — a URL can be crawled and then ignored in answer synthesis. The fully-instrumented version requires both: bot tracking + citation tracking (manually checking ChatGPT / Perplexity / Claude.ai for your URL).
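Once you have both lists, joining them is simple. This sketch assumes you keep crawled paths from your logs and a hand-maintained list of absolute URLs you saw cited while checking AI answers. `compareCrawlToCitations` is a hypothetical helper name.

```javascript
// crawledPaths: paths seen in bot log hits, e.g. ["/pricing", "/blog"].
// citedUrls: absolute URLs recorded by hand from AI answers (assumed input).
function compareCrawlToCitations(crawledPaths, citedUrls) {
  const crawled = new Set(crawledPaths);
  // Reduce each cited URL to its path so it lines up with log paths.
  const cited = new Set(citedUrls.map((u) => new URL(u).pathname));
  return {
    crawledAndCited: [...crawled].filter((p) => cited.has(p)),
    crawledNotCited: [...crawled].filter((p) => !cited.has(p)),
    citedNotCrawled: [...cited].filter((p) => !crawled.has(p)),
  };
}
```

The `crawledNotCited` bucket is the blind spot described above: pages the bots fetched that you have not yet seen in an answer.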
Why this matters operationally
- Detect blocks. If GPTBot was hitting you 500x/day and dropped to 0, the most likely cause is an accidental block: a robots.txt change, a WAF rule, or a CDN bot setting.
- Measure new-content uptake. Did the AI bots crawl your new post within a week of publish?
- Audit robots.txt drift. New page templates accidentally blocking bots show up as drops in crawl counts.
- Identify high-leverage content. Pages bots hit repeatedly are the likeliest candidates for citation, though crawling alone does not guarantee it. Improve those first.
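Block detection in particular is easy to automate. Here is a rough sketch that compares each bot's latest daily count with its average over the preceding days. The function name `detectCrawlDrops` and the thresholds (a 7-day window, a drop below 20% of baseline, a minimum baseline of 20 hits) are assumptions to tune for your own traffic.

```javascript
// dailyCounts: { botName: [hits per day, oldest first] }.
// All thresholds are illustrative defaults, not recommended values.
function detectCrawlDrops(dailyCounts, { window = 7, dropRatio = 0.2, minBaseline = 20 } = {}) {
  const alerts = [];
  for (const [bot, counts] of Object.entries(dailyCounts)) {
    if (counts.length < 2) continue; // need at least one prior day to compare
    const latest = counts[counts.length - 1];
    // Up to `window` days immediately before the latest day.
    const prior = counts.slice(-(window + 1), -1);
    const baseline = prior.reduce((sum, n) => sum + n, 0) / prior.length;
    // Ignore bots that were barely crawling in the first place.
    if (baseline >= minBaseline && latest < baseline * dropRatio) {
      alerts.push({ bot, latest, baseline });
    }
  }
  return alerts;
}
```

Run it once a day on your per-bot counts and send any alerts to whatever channel you already use for ops notifications.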
Counting AI bot hits is descriptive — it tells you what's happening. Run our AI bot auditor on your domain to check the prescriptive layer — what your robots.txt and llms.txt are telling them to do.