Nieman Lab first reported that some publishers and news organizations have begun blocking the Internet Archive’s Wayback Machine from preserving and providing access to archived versions of their websites.
Since then, journalists, digital rights advocates, historians, librarians, and researchers have raised alarms about the long-term consequences of limiting web preservation.
What is the Wayback Machine?
The Internet Archive is a nonprofit research library with a mission of providing Universal Access to All Knowledge. The Wayback Machine is a service of the Internet Archive that allows people to visit archived versions of web pages. The Wayback Machine has been archiving the web since 1996, helping preserve the historical record of the internet.
Why are publishers blocking the Wayback Machine?
As reported by journalists Andrew Deck and Hanaa’ Tameez in Nieman Lab, publishers say they are concerned about AI scraping and unauthorized reuse of their content by generative AI companies.
Some organizations have responded by broadly blocking web archiving systems like the Wayback Machine out of concern that archived material could be accessed or reused by AI systems.
But many of those concerns are preemptive rather than based on evidence, as journalist Andrew Deck explained in an interview with Marketplace Tech:
“I think it’s important to say that in our conversations with news publishers, a lot of them were taking this action preemptively out of a fear of proxy scraping rather than direct evidence that it has happened to them already. None of the publishers were able to point to a particular AI company or other kinds of direct evidence that their content had already been scraped by the Wayback Machine.”
Michael Nelson, a computer scientist at Old Dominion University, has described the Wayback Machine and web archiving as “collateral damage” as content owners restrict access over AI scraping concerns.
That sentiment was echoed by Mark Graham, director of the Wayback Machine, on the Future Knowledge podcast episode, “Preserving the Web in the Age of AI”:
“The Wayback Machine is collateral damage caught up in the conflict between AI companies and publishers.”
Who uses the Wayback Machine?
The Wayback Machine is widely used by journalists, fact-checkers, researchers, lawyers, courts, librarians, academics, and even publishers themselves.
More than 100 news articles every month reference, cite, or rely on material preserved by the Wayback Machine to verify claims, recover deleted information, or provide historical context.
In a Future Knowledge podcast interview, Mark Graham recalled a conversation at The New York Times:
“A senior researcher came up to me and said, ‘Oh my God, Mark, thank you so much for the Wayback Machine. We use you all the time. There is material available that we’ve used from the Wayback Machine that we can’t even find in our own archives.’”
Journalist Rachel Maddow also publicly defended the archive:
“The Internet Archive is a national treasure. I use it daily, and have for many, many years. I cannot imagine doing the work I do without it.”
What are critics saying about the blocking?
A broad coalition of journalists, digital rights advocates, and internet historians has warned that blocking the Wayback Machine, and web archiving tools like it, could have serious long-term consequences.
In Techdirt, Mike Masnick argued that publishers may regret these blocking decisions because they undermine preservation of the public record.
Joe Mullin, senior policy analyst at the Electronic Frontier Foundation, similarly warned that blocking the Internet Archive “won’t stop AI” but could erase important historical records from the web.
Meanwhile, coverage by journalist Kate Knibbs in Wired brought broader public attention to the issue and the risks facing digital preservation infrastructure.
Have journalists spoken out in support of the Wayback Machine?
Yes.
More than 200 journalists signed a public statement applauding the Internet Archive’s role in preserving the public record.
In response, Mark Graham published a public thank-you letter:
“Your support for the Wayback Machine sends a clear message: preserving the record matters.”
Does the Internet Archive respect publisher concerns?
Yes.
The Internet Archive works with publishers and rights holders to balance preservation, access, and responsible stewardship of digital materials.
What about AI scraping?
As Mark Graham explained in Techdirt:
“The Wayback Machine is built for human readers. We use rate limiting, filtering, and monitoring to prevent abusive access, and we watch for and actively respond to new scraping patterns as they emerge.”
What’s at stake when publishers block the Wayback Machine?
Every day that preservation systems are blocked leaves holes in the public record of the web. If preservation systems are weakened or blocked at scale, future generations will lose access to major parts of our digital history.
The size of the problem is significant. A 2024 study by Pew Research Center found that 38% of webpages from 2013 were no longer accessible a decade later, with roughly a quarter of pages sampled across the decade disappearing entirely. But loss on the web is not inevitable. New analysis by Internet Archive data scientist Sawood Alam found that the Wayback Machine has preserved roughly 15% of those otherwise vanished pages, saving reporting, citations, and pieces of the historical record that would no longer exist online.
As Mike Masnick wrote in Techdirt: “Blocking the Internet Archive isn’t going to stop AI training. What it will do is ensure that significant chunks of our journalistic record and historical cultural context simply… disappear.”
And as Mark Graham wrote:
“Preserving the public record is not optional. It is essential infrastructure for a functioning democracy.”
UPDATED: May 19, 2026 CDF