HarperCollins reportedly pens deal with Microsoft to train AI on its books

The publishing giant said there will be clear guardrails to respect authors’ rights.

Publishing giant HarperCollins has agreed to allow a technology company to use “select nonfiction” books to train its artificial intelligence (AI) models.

The company told 404 Media (18 November) that it made a deal with an unnamed “technology company” and that it will allow authors to opt in for the new venture.

Bloomberg reported yesterday (19 November) that Microsoft is the tech company that will team up with HarperCollins and use its nonfiction books to train a new AI model. Exact details about this AI model are currently unknown.

“HarperCollins has a long history of innovation and experimentation with new business models,” the company said in a statement.

“Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams. This agreement, with its limited scope and clear guardrails around model output that respects authors’ rights, does that.”

Last week, writer Daniel Kibblesmith shared an email he received asking if he’d consent to include his novel Santa’s Husband in the training bundle. According to screenshots posted by Kibblesmith, the deal was worth $2,500 for each title for a three-year licensing agreement, and would include “certain protections concerning credit and limits of verbatim usage per AI response”. Kibblesmith refused the deal, calling it “abominable”.

In a response to his original post, he said: “Direct any outrage toward the incredibly doable action of purchasing physical books by living authors from local bookstores.”

In May of this year, News Corp, the parent company of HarperCollins, struck a deal with OpenAI to allow the ChatGPT creator to train its AI models on the company’s news content. The deal also allows OpenAI to display news content from several publications owned by News Corp, including The Wall Street Journal and The Sunday Times, in response to questions asked by users of its AI models.

While other news organisations, including The Atlantic and Vox Media, have also struck deals with OpenAI, some news organisations and publishers have not been so welcoming of AI disruption. The New York Times is suing the AI giant for allegedly copying and using millions of copyrighted news articles, in-depth investigations and other journalistic work “without permission or payment”.

In October, The Guardian reported that UK ministers are facing a backlash over plans to allow AI companies to train their models on content from publishers and artists by default unless they opt out. Earlier that month, thousands of creatives around the world signed a statement warning AI companies that the unlicensed use of their work to train generative AI models is a “major, unjust threat” to their livelihoods.

SiliconRepublic.com has reached out to HarperCollins for comment.
