OpenAI sued by another group of writers, including Pulitzer Prize winner, for copyright infringement

Just days after OpenAI denied accusations of copyright infringement in response to two class-action lawsuits by book authors, the company faces yet another lawsuit from a different group of writers. Like the previous complainants, this new group, which includes Pulitzer Prize winner Michael Chabon, David Henry Hwang, Rachel Louise Snyder, and Ayelet Waldman, alleges that the AI lab misappropriated their writing to train its large language models. The lawsuit (PDF) was filed on Friday in federal court in San Francisco and was first reported by Reuters.

The plaintiffs are seeking class-action status for the lawsuit and allege that OpenAI used their copyrighted works without their consent to train its models. “OpenAI builds the dataset it uses to train its GPT models by scraping the internet for text data. While casting a wide net across the internet to capture the most comprehensive set of content available allows OpenAI to better train its GPT models, this practice necessarily leads OpenAI to capture, download, and copy copyrighted written works, plays and articles. Among the content OpenAI has scraped from the internet to construct its training datasets are Plaintiffs’ copyrighted works,” the suit states.

It also argues that the company’s acts of copyright infringement have been “intentional, willful, and in callous disregard of Plaintiffs’ and Class members’ rights. OpenAI knew at all relevant times that the datasets it used to train its GPT models contained copyrighted materials, and that its acts were in violation of the terms of use of the materials.”

The lawsuit further alleges that the AI firm engaged in the infringing acts for its own commercial benefit.

Michael Chabon, who won the Pulitzer Prize in 2001 for his book “The Amazing Adventures of Kavalier & Clay,” also signed an open letter earlier this year along with 10,000 other authors. The letter called on industry leaders in the AI sector, including CEOs of top technology companies like OpenAI, Alphabet, Meta, Stability AI, IBM, and Microsoft, to address the inherent injustice of developing profitable generative AI technologies using copyrighted works. It also demanded that AI developers seek consent from, give credit to, and fairly compensate authors.

In its previous response to a pair of class-action lawsuits, OpenAI argued that the authors had not taken into account the limitations and exceptions to copyright, such as fair use, which enable innovations like large language models. Whether this argument garners a favorable response from the legal system is yet to be determined.

Several prominent news publishers are contemplating legal action against the AI company, which could potentially lead to demands for billions of dollars in royalties.

The New York Times, although not part of this group of publishers considering legal action, is also deliberating its own legal steps. Such a case could potentially result in a court ordering the destruction of the dataset containing infringing content and mandating its recreation using only authorized content. Additionally, the court may impose fines of up to $150,000 for each willfully committed infringement by OpenAI.