News outlets lose copyright lawsuit against OpenAI

8 Nov 2024


The judge ruled in favour of OpenAI, stating that the outlets were unable to prove ‘concrete injury’.

In a win for the ChatGPT maker, a federal judge in the Southern District of New York has dismissed a lawsuit brought by two news outlets that accused OpenAI of violating copyright law by scraping news articles to train its AI models.

News outlets Raw Story Media and AlterNet Media filed the complaint earlier this year, accusing OpenAI of violating the Digital Millennium Copyright Act (DMCA) by scraping copyrighted journalistic work to train its AI models.

The plaintiffs said that OpenAI “intentionally” removed copyright management information – data that includes the work’s title, the author’s name and the terms and conditions for using the copyrighted work.

The DMCA prohibits removing or altering copyright management information when the person doing so knows that it would facilitate copyright infringement.

OpenAI does not reveal the exact data it uses to train its models. However, the plaintiffs alleged that, in an “extensive review” of publicly available information, they found “thousands” of their copyrighted works in OpenAI’s data sets stripped of the author, title and copyright information that the plaintiffs had published alongside them.

The two plaintiffs, in their lawsuit, sought $2,500 in damages per violation.

However, yesterday (7 November) Judge Colleen McMahon dismissed the lawsuit, stating that the plaintiffs had failed to show any “concrete injury”.

“When a user inputs a question into ChatGPT, ChatGPT synthesises the relevant information in its repository into an answer,” the judge wrote. She added that the likelihood of ChatGPT, an AI model trained on vast swaths of data, outputting plagiarised content from one of the plaintiffs’ articles is “remote”.

In a similar case filed last year, The New York Times sued OpenAI for alleged copyright infringement, claiming that ChatGPT was trained on millions of articles published by the outlet. Business Insider reported last month that the Times’ lawyers were poring over ChatGPT’s source code in a secure room to try to figure out how the AI is trained on creative work.


Suhasini Srinivasaragavan is a sci-tech reporter for Silicon Republic

editorial@siliconrepublic.com