How Googlebot Crawling Works in 2026: 2 MB Limits, Bytes, and Web Rendering Service

Googlebot stops downloading an HTML page at 2 MB, HTTP headers included. Gary Illyes of Google confirmed this on March 31, 2026, in the official blog post "Inside Googlebot" and in episode 105 of the Search Off the Record podcast.
The median HTML page without compression is about 20 kilobytes: the vast majority of sites will never approach that limit. But if your HTML contains base64-encoded images, embedded blocks of CSS/JavaScript, or mega-menus with thousands of links, text content and structured data at the bottom of the page might never be read. For Google, they don't exist.
In this article, I analyze the technical data Google published, what it means for technical SEO, and concrete actions you can take today to make sure Googlebot reads all the content that matters.
Googlebot is not a single program
Gary Illyes clarifies a historical misconception. In the early 2000s, Google had one product (search) and one crawler. The name "Googlebot" has stuck, but today it describes only one client of a centralized crawling platform.
When you see Googlebot in your server logs, you're looking at Google Search. But the same infrastructure serves dozens of other services: Google Shopping, AdSense, and many others. Each of these clients uses its own user-agent, its own robots.txt rules, and its own byte limits per URL. The complete list is documented on Google's official crawler page.
On March 20, 2026, Google added another distinction: Google-Agent, a crawler that activates only on user action, unlike autonomous crawlers like Googlebot that discover and index content independently.
The 2 MB limit: what happens to your page's bytes
Each client on the crawling platform sets its own thresholds. For Googlebot (Google Search), the limit is 2 MB per single URL. This includes HTTP response headers.
| Crawler | Limit | Notes |
|---|---|---|
| Googlebot (Search) | 2 MB | Includes HTTP headers. Main limit for SEO |
| PDF Crawler | 64 MB | PDF files only |
| Image/Video Crawler | Variable | Varies by product (favicon fetches have low limits, Image Search has higher ones) |
| Any other Google crawler | 15 MB | Default for those not specifying a limit |
What happens when an HTML page exceeds 2 MB:
- Googlebot stops the download at exactly 2 MB. It doesn't reject the page, it truncates it.
- The first 2 MB are passed to indexing systems and the Web Rendering Service (WRS) as if they were the complete file.
- Bytes beyond the threshold are never downloaded, rendered, or indexed. For Google, they don't exist.
- Resources referenced in the HTML (CSS and JavaScript; media and fonts are excluded) are downloaded separately by the WRS, each with its own independent byte counter. They don't count toward the page's 2 MB limit.
The truncation is silent: Google sends no warning, no Search Console error, no notification. The only way to catch it is to measure your HTML weight and verify that critical content falls within the first 2 MB.
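A quick way to run that check from the command line: download the page, keep only the first 2,097,152 bytes (2 MB), and search for a string that should appear in your critical content. This is a minimal sketch; the URL and the search string (here, a JSON-LD marker) are placeholders for your own. Note that Google's count includes HTTP headers, so this body-only check is slightly optimistic.

```bash
# Keep only the first 2 MB of the response body, then check whether
# a critical string (here: JSON-LD markup) survives the cut.
# URL and search string are placeholders.
curl -s https://yoursite.com/page \
  | head -c 2097152 \
  | grep -q 'application/ld+json' \
  && echo "critical content falls within the first 2 MB" \
  || echo "WARNING: not found in the first 2 MB"
```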
How much does a web page weigh in 2026
To put the limit in context, we need real data. According to the HTTP Archive Web Almanac 2025 (the most complete public source on web sizes), the median web page weighs about 2.4 MB total, but that includes all resources: images, JavaScript, CSS, fonts.
| Metric | Desktop | Mobile |
|---|---|---|
| Total median weight (all resources) | 2,412 KB | 2,164 KB |
| Total weight 90th percentile | 9,179 KB | 8,337 KB |
| Median image weight | 1,054 KB | similar |
| Median JavaScript weight | 613 KB | similar |
The figure that matters here is different: the median weight of the HTML document alone, uncompressed, is about 20 kilobytes. At the 90th percentile it rises to 392 KB, still far from 2 MB. In practice, only a small percentage of web pages reach that threshold with HTML alone.
Who risks exceeding it: e-commerce sites with mega-menus containing thousands of category links, pages with base64-encoded images embedded directly in the HTML, single-page applications that include enormous blocks of inline CSS and JavaScript, and CMS-generated pages stuffed with redundant markup.
The Web Rendering Service: how Google executes JavaScript
After the download, Googlebot passes the bytes to the WRS. Since 2019 (announced by Martin Splitt at Google I/O), the WRS has used an evergreen version of Chromium, updated periodically. "Evergreen" doesn't mean the latest release: there's always a gap of weeks or months between the WRS version and the current stable Chromium. The WRS processes JavaScript and CSS and executes XHR requests to rebuild the page's final state.
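If you want to see the difference between what the crawler downloads and what the WRS ends up with, you can approximate the comparison locally with headless Chromium. This is only a rough proxy, since your local Chromium version and the WRS version will differ, and the binary name depends on your install (chromium, chromium-browser, or google-chrome):

```bash
# Raw HTML as fetched over the network
curl -s https://yoursite.com/page > raw.html

# DOM after JavaScript execution, a rough stand-in for what the WRS sees
# (binary may be chromium or google-chrome depending on your system)
chromium --headless --dump-dom https://yoursite.com/page > rendered.html

# Content that only exists after rendering shows up in this diff
diff raw.html rendered.html | head
```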
Three characteristics of the WRS are relevant for technical SEO.
The WRS is stateless
Each request starts from scratch: local storage, session storage, and cookies are cleared between pages. If your site depends on session data to show content (for example, a login popup that hides the main text), Googlebot won't see that content.
The WRS doesn't download images and video
The WRS processes HTML, CSS, JavaScript, and XHR requests, but doesn't download media files (images and video). For each textual resource it downloads (JS file, CSS file), the per-URL limit applies separately.
Crawling and rendering operate on separate queues
After the download, pages enter a rendering queue. The WRS doesn't immediately process every downloaded page. For JavaScript-heavy sites, this means content might take days or weeks to be indexed after the first crawl. Crawling and rendering have independent budgets: a site can be crawled frequently but rendered rarely.
What about other engines? The Bing case
Google isn't the only one with size limits. Bing applies a soft limit of 1 MB for HTML code. Unlike Google, Bing reports the issue: if the page exceeds the threshold, Bing Webmaster Tools shows an "HTML size is too long" error. Google truncates silently.
Note that Bing's threshold is the stricter one: HTML that stays under Bing's 1 MB soft limit is automatically safe for Google's 2 MB limit, while the reverse isn't true. The practical advice remains the same: lean HTML, heavy resources in external files.
Five concrete actions to stay under the limit
1. Measure your HTML weight
Before optimizing, measure. Open the terminal and download the page:
```bash
curl -s -o /dev/null -w "%{size_download}" https://yoursite.com/page
```
Or use Screaming Frog: the "HTML Size" column shows the weight of uncompressed HTML for each URL. If any page exceeds 1.5 MB, intervene before it becomes a problem.
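To check an entire crawl list at once, a small loop is enough. This sketch assumes a urls.txt file with one URL per line (a hypothetical input file) and uses 1.5 MB as the warning threshold. Since curl doesn't request compression by default, %{size_download} here reflects uncompressed body bytes.

```bash
#!/usr/bin/env bash
# Flag every URL in urls.txt whose HTML body exceeds 1.5 MB.
# urls.txt (one URL per line) is a hypothetical input file.
THRESHOLD=1572864   # 1.5 MB in bytes
while read -r url; do
  size=$(curl -s -o /dev/null -w "%{size_download}" "$url")
  if [ "$size" -gt "$THRESHOLD" ]; then
    echo "WARN: $size bytes  $url"
  fi
done < urls.txt
```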
2. Move CSS and JavaScript to external files
Each external resource has its own separate byte counter. A 500 KB block of inline CSS counts toward the page's 2 MB; the same CSS in an external .css file doesn't. The same logic applies to JavaScript: if you use frameworks like Next.js or Nuxt, code splitting is already active. If you run a custom site with inline scripts, this is the first fix to make.
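To estimate how much inline CSS and JavaScript weighs in a page, a rough heuristic with awk is enough. It's not an HTML parser, so treat the numbers as approximations; page.html is a local copy you save first:

```bash
# Save a local copy of the page
curl -s https://yoursite.com/page -o page.html

# Approximate bytes of inline <style> blocks (awk range is a heuristic, not a parser)
awk '/<style/,/<\/style>/' page.html | wc -c

# Approximate bytes of <script> blocks (also counts tags that only reference external files)
awk '/<script/,/<\/script>/' page.html | wc -c
```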
3. Remove base64-encoded images from HTML
Images encoded as base64 directly in your HTML can add hundreds of kilobytes or megabytes to the document. Replace them with img tags pointing to external files: images are downloaded by the image crawler with separate thresholds, not the HTML crawler.
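To find the offenders in a saved copy (page.html, as above), you can list embedded base64 images by size. The regex is a simplification that covers common cases:

```bash
# List base64-embedded images by size, largest first (regex covers common cases)
grep -oE 'data:image/[a-z+]+;base64,[A-Za-z0-9+/=]+' page.html \
  | awk '{ print length($0) " bytes  " substr($0, 1, 32) "..." }' \
  | sort -rn \
  | head
```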
4. Put critical content high in the document
Google processes the first 2 MB as if it were the entire file. If your title tag, canonical, meta tags, structured data, and main text content sit at the bottom of the page, after thousands of lines of menus and sidebars, they risk falling beyond the threshold. Move them as high as possible in the source code.
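grep can report the byte offset of each match, which tells you exactly where a critical tag sits in the document. Anything past byte 2,097,152 is beyond the cut. A sketch against the same local copy:

```bash
# Byte offsets of critical tags in page.html;
# offsets past 2097152 (2 MB) fall beyond the truncation point
grep -bo '<title' page.html
grep -bo '<link rel="canonical"' page.html
grep -bo 'application/ld+json' page.html
```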
5. Monitor response times and crawl budget
If your server takes too long to serve bytes, Google's crawlers automatically reduce their crawl rate to avoid overloading your infrastructure. The result: fewer pages crawled per visit, less efficient crawl budget. In the technical SEO community, a Time to First Byte under 200 milliseconds is considered the optimal target; Google's Core Web Vitals mark 800 ms as "good" for server response time.
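You can spot-check TTFB from the command line with curl's timing variables. One request is a single sample, not a benchmark, so run it a few times and look at the spread:

```bash
# Single-request timing sample: TTFB (time to first byte) and connect time
curl -s -o /dev/null \
  -w "TTFB: %{time_starttransfer}s  connect: %{time_connect}s\n" \
  https://yoursite.com/page
```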
Crawl budget is the amount of resources Google dedicates to scanning your site in a given period. Heavier HTML pages require more time to generate and transmit, reducing the total number of URLs Googlebot can visit. For sites with thousands of pages, keeping HTML lean and eliminating redirect chains and duplicate parameterized URLs has a direct impact on index coverage.
Limitations of this analysis
- The limit isn't final: Gary Illyes explicitly writes that the 2 MB limit "isn't carved in stone" and could change as the web evolves.
- The data concerns Googlebot only: the limits for other crawlers on the same Google platform (Shopping, AdSense) aren't all publicly documented.
- We don't know how much traffic is lost when the limit is exceeded: Google doesn't publish data on the impact of truncation on rankings and indexing. Spotibo and DebugBear ran tests confirming the truncation, but not its effect on positioning.
- It's unclear whether the limit applies to compressed or decompressed bytes: Gary Illyes talks about "downloaded bytes," but independent tests (Spotibo, DebugBear) suggest truncation occurs on decompressed content. Until Google clarifies, the safe margin is keeping uncompressed HTML under 2 MB.
Next steps
90% of web pages have HTML under 400 KB (HTTP Archive 2025 data): for most sites, the 2 MB limit isn't a problem. Sites at risk are e-commerce with mega-menus, pages with dynamically generated content, and single-page apps with massive inline JavaScript. The first step is measuring HTML weight and intervening where needed.
For a complete analysis of your site's technical SEO, you can start with the technical SEO audit guide in 7 steps. If you want to understand how to automate crawling with AI tools, read how I integrated Screaming Frog with Claude Code for technical SEO audits.
To dive deeper into structured data that Google recommends placing high in the document, the practical guide to Schema.org and structured data covers all relevant markup types.
Have questions about your site's technical SEO? Write to me from the contact page.
Frequently Asked Questions
Does Googlebot really truncate HTML pages at 2 MB?
Yes. Googlebot stops the download at 2 MB for each single HTML URL, including HTTP headers. Bytes beyond that threshold are never downloaded, rendered, or indexed. External resources (CSS, JS) have separate counters.
Does the limit apply to compressed or uncompressed bytes?
Gary Illyes's wording refers to bytes downloaded from the network, but empirical tests from Spotibo and DebugBear suggest truncation occurs on decompressed content. The issue remains ambiguous in official documentation. For safety, keep uncompressed HTML under 2 MB.
How do I measure my page's HTML weight?
You can use curl to measure download weight, or Screaming Frog, which shows HTML size for each crawled URL. If you use Chrome DevTools, the Size column in the Network tab shows the HTML document weight.
Does HTML size affect crawl budget?
Indirectly, yes. Crawl budget is the number of URLs Googlebot visits on your site in a given period. Larger HTML pages require more generation and transmission time. If your server responds slowly, Google reduces crawl frequency.
Is the 2 MB limit new?
No. The 15 MB default limit has been documented since 2022. The clarification that Googlebot (Search) uses 2 MB came with the documentation update in February 2026, then was explained in the blog post of March 31, 2026. John Mueller confirmed the behavior existed before.
What does the Web Rendering Service do?
The WRS uses Chromium (an evergreen version, updated periodically since 2019) to execute JavaScript and CSS, process XHR requests, and rebuild the page's final content. It operates stateless: it clears cookies, local storage, and session data between requests. It doesn't download images or video.
About the author
Claudio Novaglio
SEO Specialist, AI Specialist, and Data Analyst with over 10 years of experience in digital marketing. I work with companies and professionals in Brescia and across Italy to increase organic visibility, optimize advertising campaigns, and build data-driven measurement systems. Specialized in technical SEO, local SEO, Google Analytics 4, and integrating artificial intelligence into marketing processes.
Want to improve your online results?
Let's talk about your project. The first consultation is free, no commitment.