News outlets are limiting the Internet Archive’s access to their journalism

TL;DR

Many major U.S. local news publishers are blocking the Internet Archive’s crawling bots, limiting access to their journalism. This move impacts researchers, journalists, and historians relying on web archives for primary sources. The full extent and motivations remain under discussion.

Over 340 local news websites across the United States have begun restricting the Internet Archive’s web crawling bots, according to a recent analysis. This development, driven by concerns over data scraping and intellectual property, threatens the long-term preservation of local journalism and affects the researchers, journalists, and historians who rely on web archives.

Since January 2026, the number of local news sites disallowing Internet Archive bots has increased from 241 to 382, with the majority owned by major publishers such as USA Today Co., McClatchy, Advance Local, MediaNews Group, and Tribune Publishing. Many of these sites are blocking specific bots associated with the Internet Archive, including Heritrix and related user agents.
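For readers who want to check a site themselves, here is a minimal sketch. It assumes the blocks are expressed as robots.txt rules, which the reporting above does not spell out. The user-agent tokens `ia_archiver` and `archive.org_bot`, the `blocked_agents` helper, and the sample robots.txt are illustrative assumptions, not details taken from the analysis.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for Internet Archive crawlers (not from the article).
ARCHIVE_AGENTS = ["ia_archiver", "archive.org_bot"]


def blocked_agents(robots_txt, agents=ARCHIVE_AGENTS, url="https://example.com/"):
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that singles out archive crawlers but allows everyone else.
SAMPLE = """\
User-agent: ia_archiver
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))  # ['ia_archiver', 'archive.org_bot']
```

To check a live site, the same parser can load a real file through `set_url()` and `read()`. A robots.txt rule is only a request to crawlers, so this check shows what a publisher has asked for, not what any crawler actually does.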

Researchers and journalists emphasize that web archives like the Internet Archive’s Wayback Machine are vital for preserving primary sources of local news, especially as many outlets face financial pressures and decline. Notably, local outlets owned by large corporations are among the most active in restricting access, raising concerns about the future availability of local news history.

The Internet Archive has responded by stating that it is in conversation with publishers and has implemented measures to prevent abuse, such as limiting bulk downloads and monitoring bot activity. Mark Graham, director of the Wayback Machine, confirmed the ongoing discussions and emphasized that the Archive’s terms of use restrict its collections to research and scholarship purposes.

Why It Matters

This restriction on web archiving poses a significant threat to the preservation of local journalism, which is a critical component of the historical record. Without access to these archives, future researchers, journalists, and citizens may find it difficult to verify past events, track media coverage, or understand local histories. The move also raises broader questions about intellectual property rights, data privacy, and the role of nonprofit archives in maintaining an open and accessible internet.

Background

Since early 2026, news outlets have expressed concerns over AI companies scraping their content for training purposes, prompting some publishers to restrict web crawling. The Internet Archive, a nonprofit organization, has historically preserved vast amounts of online news content, including local journalism, which is increasingly under threat as publishers tighten access controls. Previous debates have centered on copyright and fair use, but recent actions appear driven by fears of data extraction for AI training.

“Blocking the Internet Archive’s web crawlers threatens one of the most effective ways that we capture and store news content for the long term.”

— Edward McCain, journalism librarian at the University of Missouri

“We are in conversation with many publishers and appreciate the opportunity to address their concerns.”

— Mark Graham, director of the Wayback Machine

“This is the same fight that everybody has been having with the Internet Archive since its inception.”

— Meredith Broussard, data journalist and NYU professor

“Without the Internet Archive, my work would be incredibly difficult to do.”

— B.J. Mendelson, journalist and petition signer

What Remains Unclear

It remains unclear whether AI companies have already scraped content from the restricted sites or plan to do so in the future. The full scope of publishers’ motivations and the potential legal or technological responses by the Internet Archive are still evolving. Additionally, the long-term effectiveness of current measures to prevent abuse is uncertain.

What’s Next

The Internet Archive continues discussions with publishers to address concerns and explore technical solutions. Monitoring of site restrictions will persist, and advocacy efforts by journalists and researchers are likely to increase. Future developments may include legal or policy debates over fair use, copyright, and digital preservation rights.

Key Questions

Why are news outlets blocking the Internet Archive?

Many outlets cite concerns over data scraping, intellectual property, and the use of their content for AI training as reasons for restricting web crawling by the Internet Archive’s bots.

Could this affect the availability of local news history?

Yes, limiting access to web archives threatens the preservation of local journalism, which is vital for historical record-keeping and research.

Is the Internet Archive scraping content without permission?

The Internet Archive states it operates within legal and ethical boundaries, using bots designed to respect publisher restrictions and terms of use.

What can journalists and researchers do about this?

They can advocate for open access, participate in petitions, and support policies that balance copyright with the public interest in preserving digital history.

What are the next steps for the Internet Archive?

The organization is engaging in ongoing discussions with publishers, implementing technical safeguards, and monitoring the impact of restrictions to adapt its strategies.

Source: Hacker News
