SEO + AI Lab

Monitor AI Citations with Crawl4AI and Claude Code: GEO Tracking Guide

Claudio Novaglio
10 min read
GEO monitoring with Crawl4AI and Claude Code: tracking AI citations

75 million daily active users on Google AI Mode. Up to 48% of US searches return an AI answer before organic results. But no native tool tells you if your site gets cited.

Google Search Console doesn't track citations in AI Overviews. Google Analytics logs the click but doesn't know if it came from a generative answer or a traditional organic result. Commercial GEO monitoring tools start at $29/month and still require repeated samples to account for AI response non-determinism.

I built an alternative pipeline: DataForSEO for SERP data and AI mentions, Crawl4AI for analyzing cited content, and Claude Code as the orchestrator linking everything via MCP. Not a finished product, but a working system that monitors where and how my site appears in Google's AI responses.

The topic is GEO SEO—Generative Engine Optimization: optimizing content not just for traditional organic results but for being cited in AI-generated responses.

This article closes the GEO trilogy that started with the guide to GEO, AI Overviews, and content strategy 2026 (strategy) and continued with GEO keyword research and query fan-out (research). Here we move to monitoring: measuring the results of what you've optimized.

The problem: monitoring visibility in Google AI Mode

In traditional SEO, monitoring ranking is straightforward. Tools like Semrush, Ahrefs, or Search Console itself tell you exactly where you rank for each keyword. With AI responses, the situation is different for three structural reasons.

1. AI responses are non-deterministic

Ask Google AI Mode the same question five times in a row and you get five different answers. The cited sources change, the order changes, the text changes. According to various analyses, the overlap between sources cited in repeated answers to the same query is below 10%. A significant share of the domains cited by AI platforms rotates week to week, especially outside top brands.

This means a single manual check has no statistical value. If Google AI Mode cites your site today for "SEO consultant Rome," it might not tomorrow. You need repeated sampling to find the "stable mode" of the distribution: the answer that emerges most often.
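The repeated-sampling idea can be sketched in a few lines of Python. The helper names are mine, and the 50% stability threshold is an illustrative choice, not a standard:

```python
from collections import Counter

def citation_frequency(samples: list[set[str]], domain: str) -> float:
    """Share of repeated samples in which `domain` appears among cited sources."""
    hits = sum(1 for cited in samples if domain in cited)
    return hits / len(samples)

def stable_sources(samples: list[set[str]], threshold: float = 0.5) -> set[str]:
    """Sources cited in at least `threshold` of the samples: the 'stable mode'."""
    counts = Counter(d for cited in samples for d in cited)
    return {d for d, c in counts.items() if c / len(samples) >= threshold}
```

With five samples where your domain shows up four times, `citation_frequency` returns 0.8; a competitor cited once in five would fall out of `stable_sources` entirely.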

2. Google doesn't expose AI citation data

Search Console doesn't distinguish between a click from a traditional organic result and a click from an AI Overview. The referral parameter is the same: google.com/search. There's no "AI Overviews" filter in performance reports. Google Analytics 4 logs traffic but not the specific source.

The missing data: for which queries does your site appear in the AI response? In what position? How often compared to competitors? How does it change over time? None of this is available natively.

3. Manual monitoring doesn't scale

You can open Google AI Mode, type 10 keywords, note if your site appears. But with response variability, you'd need to do it 5–10 times per keyword for a statistically useful sample. For 50 target keywords, that's 250–500 manual queries. Every week.

Commercial solutions: what they do and cost

Before building anything custom, I evaluated the alternatives.

| Tool | Price | Platforms monitored | Notes |
| --- | --- | --- | --- |
| Otterly AI | From $29/month | Google AIO, Perplexity, ChatGPT | Entry-level, 20,000+ users |
| Rankscale | From $20/month | ChatGPT, Claude, Perplexity, AIO | Credit-based, most affordable |
| Semrush AI Toolkit | From $99/month | ChatGPT, AI Mode, Gemini, Perplexity | 239M+ prompts in database, AI crawler audit |
| Ahrefs Brand Radar | From ~$199/month | Brand mentions in LLMs | Integrated into Ahrefs plans |
| Profound | From $499/month | 10+ AI engines | Enterprise, compliance, hallucination detection |

For a freelancer or SMB, Semrush AI Toolkit at $99/month is the most complete. But it adds to existing SEO subscriptions. And importantly: all these tools return trend indicators, not absolute truth, precisely because of the non-determinism problem.

The question becomes: can I build something that gives me at least a baseline indication, at almost zero cost?

The solution: a DataForSEO + Crawl4AI + Claude Code pipeline

What is Crawl4AI

Crawl4AI is an open-source crawler (Apache 2.0) designed specifically to produce output readable by AI models. In one year it exceeded 63,000 stars on GitHub, making it one of the fastest-growing Python open-source projects of 2025–2026.

Compared to BeautifulSoup or Scrapy, Crawl4AI natively integrates JavaScript rendering via Playwright, automatic conversion to clean Markdown, and three extraction strategies: CSS/XPath (no LLM cost), LLM-based (any model), and adaptive crawling with confidence scoring. Practically, you give it a URL and it returns structured Markdown ready for LLM analysis.
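As a sketch of that workflow: the pure helper below summarizes the structural features GEO analysis cares about in a Markdown page, and `stats_for_url` shows how it could be fed by Crawl4AI's `AsyncWebCrawler` (interface as documented by the project at the time of writing; verify against the current release before relying on it).

```python
import re

def outline_stats(markdown: str) -> dict:
    """Count headings, total words, and average paragraph length in Markdown."""
    headings = re.findall(r"^#{1,6}\s+.+$", markdown, flags=re.MULTILINE)
    paragraphs = [p for p in re.split(r"\n\s*\n", markdown)
                  if p.strip() and not p.lstrip().startswith("#")]
    avg = (sum(len(p.split()) for p in paragraphs) / len(paragraphs)) if paragraphs else 0
    return {"headings": len(headings),
            "words": len(markdown.split()),
            "avg_paragraph_words": round(avg, 1)}

async def stats_for_url(url: str) -> dict:
    # Lazy import so outline_stats stays usable without Crawl4AI installed.
    from crawl4ai import AsyncWebCrawler  # pip install crawl4ai
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return outline_stats(str(result.markdown))
```

Run `stats_for_url` over each cited page and you get comparable structure metrics without touching raw HTML.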

Why Crawl4AI alone isn't enough

Important clarification: Crawl4AI cannot reliably do direct scraping of Google's AI Overviews. AI responses are dynamically generated with heavy JavaScript, Google has aggressive anti-bot protections, and the layout changes frequently. Anyone promising AI citation monitoring with just a scraper is lying or hasn't tried.

You need a dedicated API. DataForSEO is my choice: it has both classic SERP APIs (with AI Overviews in the results) and a dedicated AI Optimization module that tracks mentions of your domain on ChatGPT and Google AI directly. All integrable in Claude Code via MCP server, so queries launch from the same interface where you write code.

If you don't know DataForSEO MCP, I wrote a complete guide to integration with Claude Code covering setup and use cases.

The pipeline architecture

The system has three components working together.

  1. DataForSEO queries Google for target keywords (live SERP API) and extracts citations from AI Overviews. In parallel, the AI Optimization module tracks your domain mentions on ChatGPT and Google AI with aggregated metrics
  2. Crawl4AI analyzes cited pages (competitors and yours) to understand the structure of content being cited: headings, length, format, structured data
  3. Claude Code orchestrates everything via MCP: launches DataForSEO queries, passes URLs to Crawl4AI, compares citations against the target domain, generates trend reports over time
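The three steps above reduce to a single orchestration loop. `fetch_citations` is a hypothetical stand-in for whatever returns the set of domains cited for a keyword (in this pipeline, a DataForSEO call via MCP); only the aggregation logic is shown.

```python
from typing import Callable

def run_monitoring(keywords: list[str],
                   fetch_citations: Callable[[str], set[str]],
                   target_domain: str,
                   repeats: int = 5) -> dict[str, float]:
    """Sample the AI answer `repeats` times per keyword and record how often
    `target_domain` appears among the cited sources."""
    report = {}
    for kw in keywords:
        hits = sum(1 for _ in range(repeats)
                   if target_domain in fetch_citations(kw))
        report[kw] = hits / repeats
    return report
```

Swapping the stub for a real fetcher leaves the report logic untouched, which also makes the loop easy to test offline.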

Integration is native on both sides. DataForSEO has an official MCP server, Crawl4AI works as an MCP server via an open-source Claude Code skill. Claude Code calls both as tools without leaving the workspace.

What it monitors in practice

| Data extracted | Source | Recommended frequency |
| --- | --- | --- |
| AI Overview present for keyword? | DataForSEO SERP API | Weekly |
| Your domain cited in AI Overviews? | DataForSEO SERP API | Weekly |
| Domain mentions on ChatGPT/Google AI | DataForSEO AI Optimization | Weekly |
| Competitor domains cited | DataForSEO SERP API | Weekly |
| Content structure of cited pages | Crawl4AI | Monthly |
| Citation variation over time | Claude Code (aggregation) | Weekly trend |

Practical demo: test on claudio-novaglio.com

I tested the pipeline on 15 keywords relevant to my site. For each, I ran 5 queries via DataForSEO SERP API over 3 days to account for AI response variability.

Results: where I get cited

Of 15 keywords tested, 11 generate an AI Overview in Italian. My site appears as a cited source in 3 of these 11, with frequency varying between 20% and 60% of repeated queries.

  • High-frequency citations (60%): "screaming frog mcp claude code"—technical article with original data and code
  • Medium-frequency citations (40%): "google analytics mcp claude code"—practical guide with step-by-step configuration
  • Low-frequency citations (20%): "ai agent workflow patterns for seo"—competitive content, many alternative sources

The cited articles are the Screaming Frog MCP for Claude Code, the Google Analytics 4 MCP for Claude Code, and the AI agent workflow patterns for SEO.

For the 8 keywords where I don't get cited, the common pattern is that the content Google does cite all has: first-hand data (case studies, original tests, benchmarks), an answer-first structure in the opening paragraphs, and recent updates (within the last 3–6 months).

The patterns that emerge

Analyzing pages cited with Crawl4AI, three recurring structural characteristics appear.

  1. Citable chunks of 100–200 words: the passages Google tends to extract are self-sufficient. They don't require surrounding context to make sense. My pages that get cited already have this structure; those that don't have long, interconnected paragraphs.
  2. Direct answer in first 200 words: cited pages answer the query within the first heading. No generic "why SEO is important" intros. Answer first, context after.
  3. Complete structured data: all highly cited pages have Article schema, FAQPage, and Organization. Consistent with research by BrightEdge: +44% AI citations for pages with updated structured data.

GEO optimization based on data: what I changed

After analyzing results, I modified 4 of my articles, applying the patterns the pipeline identified.

Restructuring into citable chunks

I rewrote long paragraphs, breaking them into 100–200 word blocks, each with a clear statement in the first sentence and supporting data in the rest. The goal: every block should be extractable and insertable into an AI response without losing meaning.
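A minimal checker for this restructuring pass, assuming paragraphs are separated by blank lines; the 200-word ceiling mirrors the chunk range described above, and the function name is mine:

```python
def flag_long_chunks(text: str, max_words: int = 200) -> list[tuple[int, int]]:
    """Return (paragraph_index, word_count) for paragraphs exceeding the
    citable-chunk ceiling: candidates for splitting into standalone blocks."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [(i, len(p.split())) for i, p in enumerate(paragraphs)
            if len(p.split()) > max_words]
```

Running it over an article's Markdown gives a quick worklist of paragraphs to break up before any manual rewriting.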

Answer-first in early paragraphs

For informational articles, I moved the direct answer to the query into the first 150 words, before any context. If someone searches "what is GEO," the operational definition needs to be in paragraph one, not paragraph three.

Updating structured data

I verified and updated JSON-LD schema on all target pages, following the guidance in the guide to structured data and Schema.org. In particular, I added current dateModified fields and verified consistency between FAQs in content and FAQPage in schema.
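A minimal sketch of the Article JSON-LD in question, reduced to the `dateModified` point made above; a production schema would also carry author, publisher, image, and datePublished fields per Schema.org.

```python
import json
from datetime import date

def article_schema(headline: str, url: str, modified: date) -> str:
    """Minimal Article JSON-LD with a current dateModified field."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)
```

The output drops into a `<script type="application/ld+json">` tag; regenerating it on each publish keeps `dateModified` honest instead of hand-edited.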

Preliminary results

Two weeks after modifying, I re-ran the pipeline on the same keywords. Two of the 4 modified articles moved from "not cited" to "cited with low frequency" (20–30%). The data isn't conclusive: two weeks is short and AI response variability makes isolating causality hard. But the direction aligns with observed patterns.

Limitations of this approach

It would be dishonest to present this pipeline as a complete solution. Here are the real limits.

  • Cost per query: DataForSEO is pay-per-use: each live SERP call costs about $0.002. 50 keywords repeated 5 times each = 250 calls = about $0.50/week. Cheap, but for a site with 500+ pages the cost scales. Dedicated tools offer unlimited monitoring at flat price.
  • Non-determinism unsolved: 5 queries per keyword beat 1, but aren't statistically robust. Enterprise tools run 50–100 repetitions per prompt. My pipeline returns indicators, not certainty.
  • Google AI Overviews only: the pipeline doesn't monitor ChatGPT, Perplexity, Gemini, or Claude as search engines. Each platform has preferred sources (ChatGPT favors Wikipedia at 47.9%, Perplexity cites Reddit heavily at 46.7%).
  • Not a product: it requires technical skills to configure DataForSEO MCP, Crawl4AI, and Claude Code. Not something you install and run. It's a working prototype for people with the skills to use it.
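The per-query cost math above as a one-liner, using the article's own $0.002-per-call figure (your DataForSEO rate may differ):

```python
def weekly_cost(keywords: int, repeats: int, price_per_call: float = 0.002) -> float:
    """Weekly spend for the repeated-sampling scheme: keywords * repeats calls."""
    return round(keywords * repeats * price_per_call, 2)
```

At 50 keywords sampled 5 times, that's the $0.50/week cited above; at 500 pages with the same scheme it's already $5/week, which is where flat-price tools start to win.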

For those with budget, commercial tools are the pragmatic choice. Otterly AI at $29/month covers more platforms with larger samples. This pipeline makes sense for those who want to understand the mechanism, experiment without fixed costs, or integrate monitoring into an existing Claude Code workflow.

Google AI Mode in Italy: what to know in April 2026

AI Mode has been available in Italy since October 2025. Italian language is supported. Access is built into Google Search for logged-in users 18+ (GDPR/DMA restrictions). Google AI Plus, the premium tier, costs €7.99/month in Italy.

Global numbers in April 2026: 75 million daily active users on AI Mode (confirmed by Google's Nick Fox, December 2025). Over 2 billion monthly users reach AI Overviews. The March 2026 Core Update, rolling out from March 27, appears to reward content with human editorial oversight and penalize AI-generated content without added value.

A critical data point for GEO (Seer Interactive study, September 2025): brands cited in AI Overviews gain 35% more organic clicks and 91% more paid clicks than those not cited. Visibility in AI responses isn't just brand awareness: it generates measurable traffic.

Next steps

  1. Automate the pipeline to run weekly with a cron job and generate week-on-week comparative reports
  2. Add Perplexity monitoring via their API (Perplexity cites sources deterministically, making tracking more reliable than Google)
  3. Build a minimal dashboard to visualize citation trends by keyword and domain over time
  4. Integrate cited content analysis with the article generation pipeline to produce content already optimized for AI citability

To understand the complete GEO strategy before building monitoring, start with the guide to GEO, AI Overviews, and content strategy 2026.

For the AI-response-focused keyword research phase, read GEO keyword research and query fan-out.

If you want to build more complex AI pipelines for SEO, my article on AI agent workflow patterns for SEO covers orchestration patterns.

For advice on adapting your SEO strategy to AI responses, get in touch for a consultation.

Frequently Asked Questions

How can I monitor whether my site is cited in AI Overviews?

Google Search Console doesn't track AI Overview citations. Monitor with commercial tools like Otterly AI (from $29/month) or Semrush AI Toolkit ($99/month), or build a DIY pipeline with DataForSEO (SERP API + AI Optimization module) to extract cited sources in AI responses and track mentions on ChatGPT and Google AI. Any method requires repeated queries because AI responses are non-deterministic.

What is Crawl4AI and what is it for in GEO?

Crawl4AI is an open-source crawler with 63,000+ GitHub stars, designed to produce Markdown output readable by AI models. For GEO, use it to analyze the structure of content cited in AI responses (headings, length, format) and identify patterns that favor citation. Don't use it for direct AI Overview scraping—that requires dedicated APIs like DataForSEO.

How much do GEO monitoring tools cost?

Options range from free to premium. HubSpot AI Search Grader offers free initial diagnostics. Paid tools: Otterly AI from $29/month, Semrush AI Toolkit from $99/month, Profound from $499/month for enterprise. A DIY DataForSEO pipeline (pay-per-use, ~$0.50/week for 50 keywords) + open-source Crawl4AI is most affordable but requires technical skills.

Why do AI responses change with every query?

Language models are non-deterministic: the same prompt produces different responses each time. The overlap between sources cited in repeated answers to the same query is below 10%. This is why AI citation monitoring requires repeated sampling (5–10 queries per keyword) to find the stable pattern.

Is Google AI Mode available in Italy?

Yes, since October 2025. Italian is supported. Access requires a Google account age 18+ (GDPR/DMA restrictions). Google AI Plus, the premium tier with advanced features, costs €7.99/month in Italy. AI Mode is available in over 200 countries.

What content gets cited most often in AI responses?

Frequently cited content has three characteristics: self-sufficient passages of 100–200 words (citable chunks), direct answer to the query in the first 150 words (answer-first format), and complete structured data (Article, FAQPage schema). Brands cited in AI Overviews gain 35% more organic clicks.

About the author

Claudio Novaglio

SEO Specialist, AI Specialist, and Data Analyst with over 10 years of experience in digital marketing. I work with companies and professionals in Brescia and across Italy to grow organic visibility, optimize advertising campaigns, and build data-driven measurement systems. Specialized in technical SEO, local SEO, Google Analytics 4, and integrating artificial intelligence into marketing processes.

Want to improve your online results?

Let's talk about your project. The first consultation is free, no commitment.