The Denver Post sues OpenAI and Microsoft, alleging tech giants illegally harvested copyrighted articles

The Denver Post and seven other newspapers sued Microsoft and OpenAI on Tuesday, claiming the technology giants illegally harvested millions of copyrighted articles to create their cutting-edge “generative” artificial intelligence products including OpenAI’s ChatGPT and Microsoft’s Copilot.

While the newspapers’ publishers have spent billions of dollars to send “real people to real places to report on real events in the real world,” the two tech firms are “purloining” the papers’ reporting without compensation “to create products that provide news and information plagiarized and stolen,” according to the lawsuit in federal court.

“We can’t allow OpenAI and Microsoft to expand the Big Tech playbook of stealing our work to build their own businesses at our expense,” said Frank Pine, executive editor of MediaNews Group and Tribune Publishing, which own seven of the newspapers. “The misappropriation of news content by OpenAI and Microsoft undermines the business model for news. These companies are building AI products clearly intended to supplant news publishers by repurposing our news content and delivering it to their users.”

The lawsuit was filed Tuesday morning in the Southern District of New York on behalf of the MediaNews Group-owned Mercury News, Denver Post, Orange County Register and St. Paul Pioneer Press; Tribune Publishing’s Chicago Tribune, Orlando Sentinel and South Florida Sun Sentinel; and the New York Daily News.

Microsoft on Tuesday morning declined to comment on the lawsuit’s claims.

OpenAI said Tuesday morning it takes “great care” in its products and design process to support news companies. “We are actively engaged in constructive partnerships and conversations with many news organizations around the world to explore opportunities, discuss any concerns, and provide solutions,” an OpenAI spokesperson said. “We see immense potential for AI tools like ChatGPT to deepen publishers’ relationships with readers and enhance the news experience.”

Microsoft’s deployment of its Copilot chatbot has helped the Redmond, Wash., company boost its value in the stock market by $1 trillion in the past year, and San Francisco’s OpenAI has soared to a value of more than $90 billion, according to the lawsuit.

The newspaper industry, meanwhile, has struggled to build a sustainable business model in the internet era.

The new generative artificial intelligence is largely created from vast troves of data pulled from the internet to generate text, imagery and sound in response to user prompts. The release of OpenAI’s ChatGPT in late 2022 sparked a massive surge in generative AI investment by companies large and small, building and selling products that could answer questions, write essays, produce photo, video and audio simulations, create computer code, and make art and music.

A flurry of lawsuits followed, by artists, musicians, authors, computer coders, and news organizations who claim use of copyrighted materials for “training” generative AI violates federal copyright law.

Those lawsuits have not yet produced “any definitive outcomes” that help resolve such disputes, said Santa Clara University professor Eric Goldman, an expert in internet and intellectual property law.

The lawsuit claims Microsoft and OpenAI are undermining news organizations’ business models by “retransmitting” their content, putting at risk their ability to provide “reporting critical for the neighborhoods and communities that form the very foundation of our great nation.”

Among the issues raised in the lawsuit is an incident in which ChatGPT responded to a prompt about smoking and asthma. In its response, the AI chatbot incorrectly said The Denver Post had published research saying smoking could be a cure for asthma.

“These ‘hallucinations’ mislead users as to the source of the information they are obtaining, leading them to incorrectly believe that the information provided has been vetted and published by the Publishers,” the suit says.

Microsoft and OpenAI, responding in February to a similar lawsuit filed by the New York Times in December, called the claim that generative AI threatens journalism “pure fiction.” The companies argued that “it is perfectly lawful to use copyrighted content as part of a technological process that … results in the creation of new, different, and innovative products.”

Pine said Microsoft and OpenAI are stealing content from news publishers to build their products.

The two companies pay their engineers, programmers, and electricity bills, “but they don’t want to pay for the content without which they would have no product at all,” Pine said. “That’s not fair use, and it’s not fair. It needs to stop.”

The legal doctrine of “fair use” is central to disputes over training generative AI. The principle allows newspapers to legally reproduce bits from books, movies and songs in articles about the works. Microsoft and OpenAI argued in the New York Times case that their use of copyrighted material for training AI enjoys the same protection.

Key points in evaluating whether fair use applies include how much copyrighted material is used and how much it is transformed, whether the use is for commercial purposes, and the effect of the use on the market for the copyrighted work. Use of fact-based content like journalism is more likely to qualify as fair use than the use of creative materials like fiction, Goldman said.

Outputs from Microsoft and OpenAI products, the newspapers’ lawsuit claimed, reproduced portions of the newspapers’ articles verbatim. Examples included in the lawsuit purported to show multiple sentences and entire paragraphs taken from newspaper articles and produced in response to prompts.

Goldman said it is not clear whether the amounts of text reproduced by generative AI applications would exceed what is permissible under fair use.

Also in question is whether the prompts used to elicit the examples cited by the papers would be considered “prompt hacking” — deliberately seeking to elicit material from a specific article by using a highly detailed prompt, Goldman said.

Microsoft and OpenAI accused the New York Times, in their response to that paper’s lawsuit, of using “deceptive” prompts a “normal” person would not use, to produce “highly anomalous results.”

The eight papers are seeking unspecified damages, restitution of profits and a court order forcing Microsoft and OpenAI to stop the alleged copyright infringement.
