OpenAI rejects book authors’ copyright accusations

In a significant legal development, OpenAI has issued a response to a pair of class-action lawsuits filed by prominent book authors, including Sarah Silverman, Paul Tremblay, Mona Awad, Chris Golden, and Richard Kadrey. These authors alleged that OpenAI’s ChatGPT was unlawfully trained using pirated copies of their literary works. In their own words, they filed the lawsuits to challenge “ChatGPT and Meta’s LLaMA, industrial-strength plagiarists that violate the rights of book authors. Because AI needs to be fair & ethical for everyone.”

OpenAI’s response, filed in both lawsuits, seeks to dismiss most of the authors’ claims, leaving just one claim of direct copyright infringement for future consideration by the court.

One of OpenAI’s central arguments is that the authors misunderstand the scope of copyright. The company contends that the authors failed to take into account limitations and exceptions to copyright, such as fair use, which allow for innovations like large language models: “The use of copyrighted materials by innovators in transformative ways does not violate copyright.”

Moreover, OpenAI asserts that copyright law aims to safeguard the expression of ideas rather than the underlying ideas themselves, facts, or foundational elements of creativity. These elements, OpenAI argues, are crucial for training AI models like ChatGPT.

OpenAI also references the well-known Google Books case as precedent, emphasizing that creating preliminary copies of works for developing new, non-infringing products, even if they compete with the original, does not constitute copyright infringement.

In challenging the authors’ vicarious copyright infringement claim, OpenAI argues that not all ChatGPT outputs should be considered derivative works, disputing what it calls the authors’ “legally infirm” theory.

Furthermore, OpenAI contests the assertion that the company has a direct financial interest in infringing the copyrights of the authors’ works. It emphasizes that mere usage of OpenAI’s tools by users does not automatically imply such a financial interest.

Regarding allegations under the Digital Millennium Copyright Act (DMCA), OpenAI disputes the authors’ claim that the training of ChatGPT’s models violates the DMCA by intentionally removing copyright-management information (CMI). OpenAI maintains that there is no evidence to support this theory and suggests that any removal of CMI may have been an unintended side effect of the technological process.

OpenAI’s response also challenges claims made under California state laws, arguing that these claims should be struck down as they are preempted by federal copyright law.

From the authors’ perspective, generative AI like ChatGPT represents a “grift” that merely repackages and detaches human intelligence from its creators, rather than advancing human intelligence.

Book authors are not alone: a group of leading news publishers is also contemplating legal action against OpenAI, with hopes of securing billions in royalties. The New York Times is likewise considering legal action that could lead to the destruction of the dataset containing allegedly infringing content.

The outcome of these lawsuits and OpenAI’s other legal battles will have far-reaching implications for the use of copyrighted material in training AI models and may define the boundaries of copyright law in the AI era. As the cases unfold, it remains to be seen how the courts will adjudicate these complex and pivotal issues.