Microsoft, IndexNow, and Internet Archive: What Changes for SEO in 2026

On April 17, 2026 Microsoft's Fabrice Canel announced that the Internet Archive had adopted IndexNow to make archiving fresh content easier. The news went largely unnoticed amid other April SEO updates, but it touches two of the web's largest crawling systems: Microsoft's search engine and the world's largest digital archive, which holds over one trillion pages.
In this article I analyze what IndexNow is, what problem it solves for Internet Archive after a very difficult 2024-2025, and what actually changes in practice for people publishing content on the web today. Spoiler: it's not revolutionary, but it's a useful signal about where active URL notification is going as an alternative to pull crawling.
IndexNow briefly: from pull crawling to push notification
IndexNow is an open protocol launched in October 2021 by Microsoft Bing and Yandex. The idea is simple: instead of waiting for a search engine to discover a new or modified page through crawling, the site sends a POST notification to a standard endpoint listing the URL. The engine decides whether and when to crawl it, but at least it knows the URL exists.
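The protocol also allows notifying a single URL with a plain GET request carrying `url` and `key` query parameters. A minimal sketch in Python of how that request URL is built (the domain and key here are placeholders, not real credentials):

```python
# Build the single-URL IndexNow ping as a GET request URL.
# The site then issues a plain GET to this URL; the engine validates
# the key against the verification file served from the site root.
from urllib.parse import urlencode

def indexnow_ping_url(url: str, key: str) -> str:
    """Return the GET request URL for a single-URL IndexNow notification."""
    query = urlencode({"url": url, "key": key})
    return "https://api.indexnow.org/indexnow?" + query

ping = indexnow_ping_url("https://example.com/new-page", "a1b2c3d4e5f6")
```

For bulk notifications the JSON POST form described later in this article is the more practical route; the GET form is mostly useful for quick manual tests.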
The protocol is supported today by Bing, Yandex, Naver (South Korea), Seznam (Czech Republic), and Yep. Google tested IndexNow in November 2021 but as of April 2026 hasn't adopted it and doesn't seem inclined to. Search Console submission and traditional sitemaps remain the routes for Google.
The most recent official Microsoft numbers are from the December 2024 Bing post "Look How Far We've Come". According to Microsoft itself, 3.5 billion URLs are notified daily via IndexNow and 18% of URLs clicked in Bing results come from content discovered through this protocol. The numbers are self-reported by the vendor, worth reading with caution. Cloudflare, which integrated IndexNow in November 2021 with Crawler Hints, has processed over 600 billion signals since launch and activated the feature on about 60,000 sites.
| Engine | IndexNow support | Adoption date | Notes |
|---|---|---|---|
| Bing | Yes (initiator) | October 2021 | Native integration with Bing Webmaster Tools |
| Yandex | Yes (initiator) | October 2021 | Co-founder of protocol |
| Naver | Yes | 2022 | South Korea market |
| Seznam | Yes | 2022 | Czech Republic market |
| Yep | Yes | 2023 | Privacy-focused engine |
| Google | No | Never adopted | Briefly tested in 2021, then abandoned |
For context on how Google crawling works and why crawl budget problems differ from Bing's, I analyzed Googlebot's technical limits and Web Rendering Service in the guide on how Googlebot crawling works.
Internet Archive in 2026: one trillion pages and an open crisis
Internet Archive is the nonprofit digital library founded in 1996 by Brewster Kahle. Its best-known service is the Wayback Machine, launched in 2001, which archives historical versions of web pages. On October 22, 2025 the organization announced reaching one trillion pages archived, a milestone the official blog describes as "civilization-scale".
Behind that milestone, the 2024-2026 biennium has been the most critical in the organization's history. In October 2024 a breach exposed data of 31 million users (root cause: GitLab token not rotated for about two years), followed by a DDoS attack from the BlackMeta group and a second Zendesk incident. In parallel, the Hachette v. Internet Archive lawsuit had closed in March 2023 with judgment unfavorable to the Controlled Digital Lending program, affirmed on appeal in September 2024.
But the problem weighing most on archiving work is another. According to a Nieman Lab report from January 2026, 241 news outlets in 9 countries have started blocking "archive.org_bot" in their robots.txt for fear their content will be used as a dataset for AI model training. Between May and October 2025 captures from news sites fell 87%. Guardian and New York Times are among publishers who've closed access.
What IndexNow solves for the Wayback Machine and why it matters for SEO
Internet Archive's archiving flow has always worked in pull mode: the organization's crawlers start from a site list, download pages, follow links, return periodically to detect changes. This approach struggles with high-volatility content like news, job postings, products, and e-commerce, where the difference between publication and capture can be days. On the SEO front the Wayback Machine counts less than Bing or Google, but it counts: archive.org links appear regularly in Google SERPs when the original returns 404, function as brand memory, and preserve historical signals for anyone doing backlink analysis.
IndexNow shifts the load from crawler to site. A publisher wanting to be archived fresh can notify Internet Archive when publishing or updating a URL, exactly like they already do with Bing. For the Wayback Machine this means three concrete things:
- Archive freshness: important pages are archived within minutes of notification, not after the next crawl cycle. For content with short lifespan the difference is substantial.
- Less server pressure on publishers: a pull crawler must periodically revisit URLs that haven't changed. With push notification, the crawler only shows up when something's new.
- Rational crawl budget use: Internet Archive has limited resources and indexes less than Google or Bing. Concentrating crawling on notified URLs improves total coverage at the same infrastructure level.
This isn't an entirely new relationship. The December 2024 Bing post already listed Internet Archive among IndexNow adopters, alongside GoDaddy and Condé Nast. What Canel made explicit in April 2026 is that Microsoft assisted Internet Archive in adopting the protocol, not that the integration was built from scratch today.
What changes for people publishing content on the web
For content publishers the impact of this adoption plays out in three concrete scenarios, plus one where nothing changes. The reference number is still the December 2024 Bing post: 3.5 billion URLs notified daily make IndexNow the largest active notification channel in existence, and Internet Archive is now part of it.
Scenario 1: you want better archiving
For news sites, editorial blogs, volatile e-commerce, and institutional or cultural project sites, the Wayback Machine has dual value. On one hand it's backup memory: if the site goes offline, a public copy remains consultable. On the other hand, archive.org links appear regularly in Google SERPs when the original returns 404, so they contribute to brand presence even after the original page is gone.
Implementing IndexNow doesn't increase archiving chances by itself (Internet Archive isn't obligated to honor notifications), but it aligns the site with the channel Microsoft is promoting as the standard for active notification. The cost is low: a verification file in the root and an HTTP call on each publication.
Scenario 2: you don't want to be archived
If you manage a site with sensitive content or simply prefer not to be archived, IndexNow changes nothing in substance. But a technical clarification helps: the notification protocol itself is agnostic about robots.txt; it's Internet Archive that declares it respects robots.txt in its crawling. So a well-written robots.txt rule remains the correct way to exclude the Wayback Machine. The directives most used by publishers in 2025-2026 include both historical user-agents:
```
User-agent: ia_archiver
User-agent: archive.org_bot
Disallow: /
```
This is the path chosen by hundreds of news outlets between 2025 and 2026. No IndexNow implementation circumvents a robots.txt rule on Internet Archive's side.
Scenario 3: you already use IndexNow for Bing
If you notify Bing via IndexNow, nothing changes technically on your end. The protocol is a standard format: your site sends the same POST to any endpoint adhering to IndexNow standard. The new benefit is indirect: your pages might appear more rapidly in the Wayback Machine too, so anyone searching for historical versions of your site on Google finds fresher snapshots.
Announcement limitations and what we don't know
Canel's announcement matters as a signal, but it leaves several open points worth flagging, because readers who only see the headlines risk overestimating the impact.
- The original post was published on X, not on an official Microsoft or Internet Archive blog. As of this writing, no dedicated post appears on the Bing Webmaster Blog or the archive.org blog. The most practical source remains Barry Schwartz's coverage on Search Engine Roundtable.
- It's unclear when Internet Archive actually started receiving IndexNow notifications. The December 2024 Bing post already listed it among adopters, so the collaboration may simply have been communicated publicly only now.
- IndexNow notifies, doesn't force. The protocol doesn't guarantee archiving, exactly like it doesn't guarantee indexing on Bing. Anyone implementing it expecting instant capture for every page will find results divergent from promises circulating on marketing blogs.
- Google stays out. About 90% of the search market in Italy goes through Google, which continues not to support the protocol. IndexNow, today, is a Bing-centric tool with collateral benefits on Yandex, Naver, and now Internet Archive. It's not a sitemap substitute.
- The announcement doesn't address Internet Archive's structural problems. Publisher robots.txt blocks, Hachette judgment consequences, and trust recovery after the 2024 breach are issues no URL notification protocol can tackle.
How to implement IndexNow on your site in three steps
IndexNow implementation requires three technical steps and, on most CMSs, less than an hour of work. The procedure is documented on indexnow.org, but these are the details that matter when integrating it in production.
- Generate a key and publish it. The key is a string of 8 to 128 hexadecimal characters. You can generate it with a tool on indexnow.org or use a random hash. Then create a text file with the key as name (e.g., a1b2c3d4e5f6.txt) and the key itself as content, accessible via HTTPS in the domain root. If the root isn't in your control (shared hosting, constrained paths, CDN), you can indicate an alternate path in the keyLocation field of the payload and serve the file from there.
- Send the notification via HTTP POST to api.indexnow.org/indexnow. The payload is JSON with host, key, and URL list fields. The limit per single request is 10,000 URLs and all must belong to the declared host. Relevant responses: 200 for success, 202 for pending, 400 for malformed payload, 403 for invalid key, 429 for rate limit exceeded. Minimal payload: { "host": "yoursite.com", "key": "a1b2c3d4e5f6", "urlList": [ "https://yoursite.com/new-page" ] }.
- Integrate the call into the publishing flow. WordPress has an official IndexNow plugin. Wix has native integration, with no setup required. On custom sites, add a hook to the CMS save event or a cron job that reads the sitemap and notifies URLs with a recent lastmod.
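The first two steps above can be sketched in Python. This builds the key, the verification file, and the POST request without actually sending anything over the network (host and URL are placeholders):

```python
# Sketch of IndexNow steps 1 and 2: key generation and request
# construction. No network call is made here; wiring urlopen(req)
# into the CMS publish hook is step 3.
import json
import secrets
from urllib.request import Request

# Step 1: generate a hex key and the verification file that must be
# served from the site root as https://<host>/<key>.txt
key = secrets.token_hex(16)          # 32 hex chars, within the 8-128 limit
key_file_name = f"{key}.txt"         # file name = the key
key_file_content = key               # file content = the key itself

# Step 2: build the JSON payload for the shared endpoint
payload = {
    "host": "yoursite.com",
    "key": key,
    "urlList": ["https://yoursite.com/new-page"],
}
req = Request(
    "https://api.indexnow.org/indexnow",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=utf-8"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; check for 200 (success)
# or 202 (key validation pending), and treat 400/403/429 as errors.
```

In production you would persist the key instead of regenerating it: the verification file in the root must keep matching the key sent in every payload.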
One operational detail: if you publish content in batches, group URLs into single POSTs and keep a prudential limit of a few dozen requests per minute, even though Microsoft doesn't publish an official rate limit. 429 responses are the only practical indication that you're going too fast.
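A hypothetical batching helper along these lines: it splits a URL backlog into chunks under the documented 10,000-URL cap and spaces requests out, backing off once on a 429 (the `post_fn` callable, which performs the actual POST and returns the HTTP status, is an assumed abstraction, not part of the protocol):

```python
# Chunk a URL backlog into POST-sized batches and throttle sending.
import time

MAX_URLS_PER_POST = 10_000  # documented per-request limit

def batch_urls(urls, batch_size=MAX_URLS_PER_POST):
    """Yield successive chunks of at most batch_size URLs."""
    for i in range(0, len(urls), batch_size):
        yield urls[i : i + batch_size]

def send_in_batches(urls, post_fn, pause_seconds=2.0):
    """Call post_fn on each chunk; back off and retry once on HTTP 429."""
    for chunk in batch_urls(urls):
        status = post_fn(chunk)
        if status == 429:                 # throttled: wait longer, retry once
            time.sleep(pause_seconds * 10)
            status = post_fn(chunk)
        time.sleep(pause_seconds)         # prudential spacing between requests
```

The pause values are arbitrary starting points, not documented thresholds; tune them against the 429 responses you actually observe.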
Once the flow is active, verify it has real effect. On Bing check the URL Submission report in Bing Webmaster Tools, listing received notifications and their status. On server access logs, filter for Bingbot right after each notification: if the time between POST and crawler passage shortens versus the past, IndexNow is working. One last point worth remembering: IndexNow speeds discovery, doesn't improve content quality. A weak page notified stays weak, just crawled sooner.
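The log check above can be automated. A toy sketch, assuming Apache-style combined log lines (the sample lines and path are invented for illustration):

```python
# Extract timestamps of Bingbot hits on a given path from access-log
# lines, to compare against the time of the IndexNow notification.
import re

LOG_LINES = [
    '1.2.3.4 - - [17/Apr/2026:10:02:11 +0000] "GET /new-page HTTP/1.1" 200 512 "-" "Mozilla/5.0 ... bingbot/2.0 ..."',
    '5.6.7.8 - - [17/Apr/2026:10:03:40 +0000] "GET /other HTTP/1.1" 200 512 "-" "Mozilla/5.0 (regular browser)"',
]

def bingbot_hits(lines, path):
    """Return the timestamps of Bingbot requests for a given path."""
    hits = []
    for line in lines:
        if "bingbot" in line.lower() and f'"GET {path} ' in line:
            match = re.search(r"\[([^\]]+)\]", line)
            if match:
                hits.append(match.group(1))
    return hits

print(bingbot_hits(LOG_LINES, "/new-page"))
```

Note that user-agent strings can be spoofed; for a rigorous check, verify the client IP against Bing's published Bingbot ranges as well.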
If you manage multiple sites or want to insert IndexNow into an automated publishing pipeline, the Google Search Console MCP server can help monitor the effect on indexing. I discussed it in the guide on Google Search Console MCP for Claude Code.
Internet Archive's IndexNow adoption is not revolutionary. It's a piece confirming a direction: web discovery infrastructure is moving toward a hybrid model where active notification supplements traditional crawling. For those doing technical SEO it's useful to know the standard exists, that Google stays out, and that Wayback Machine presence has value beyond Bing.
To better understand how Bing and non-Google engines are becoming more relevant thanks to ChatGPT Search and AI answer engines, read my analysis on ChatGPT Search, Bing, and brand visibility in AI search.
If you want to evaluate together whether IndexNow makes sense for your site or if you're making crawling and indexing choices deserving review, write me from the contact page.
Frequently Asked Questions
What is IndexNow?
IndexNow is an open protocol launched in 2021 by Microsoft Bing and Yandex. It lets a site notify search engines of new or modified URLs via a simple HTTP POST. The engine decides whether and when to crawl, but receives the signal in real time instead of having to discover the URL on its own.
Does Google support IndexNow?
No. Google tested IndexNow in 2021 but never adopted it. For Google Search, the traditional channels remain: XML sitemap and Search Console submission. IndexNow is currently supported by Bing, Yandex, Naver, Seznam, Yep, and as of April 2026, explicitly by Internet Archive.
How do I implement IndexNow on WordPress?
WordPress has an official IndexNow plugin managed by the Bing team. Install it, generate a key, publish it as a verification file in the root, and URLs are notified automatically on each publish or update. Wix has the feature already active, with no configuration required.
Is IndexNow a replacement for the XML sitemap?
They're complementary, not mutually exclusive. The sitemap is an inventory consulted periodically; IndexNow is an active notification reaching the engine in real time. For volatile content (news, e-commerce, job postings) IndexNow reduces latency between publication and discovery. For static sites the benefit is marginal.
Can I prevent the Wayback Machine from archiving my site?
Yes. In robots.txt add Disallow rules for Internet Archive's historical user-agents: ia_archiver and archive.org_bot. IndexNow itself is agnostic about robots.txt; it's Internet Archive that declares it respects it. Hundreds of news outlets chose this path in 2025-2026 over AI scraping fears.
What are IndexNow's rate limits?
The documented limit is 10,000 URLs per single POST. Microsoft doesn't publish an official global rate limit, but HTTP 429 responses indicate throttling. Batch prudently and spread requests over time instead of hammering the endpoint.
About the author
Claudio Novaglio
SEO Specialist, AI Specialist, and Data Analyst with over 10 years of experience in digital marketing. I work with companies and professionals in Brescia and across Italy to increase organic visibility, optimize advertising campaigns, and build data-driven measurement systems. Specialized in technical SEO, local SEO, Google Analytics 4, and integrating artificial intelligence into marketing processes.