In a move to protect its reporting from OpenAI, The New York Times is considering legal action against the AI lab, which has been scraping information from the internet for years to train its large language models. NPR reported the development last week, citing two unnamed sources familiar with the matter.

Earlier this week, the paper also blocked OpenAI’s recently launched web crawler, GPTBot, to keep the AI company from using NYT content to train its models. That step came after the media company updated its terms of service earlier this month to prohibit the use of any of its content to train AI or machine learning systems.

Several leading media companies, led by IAC chairman Barry Diller, have been trying to form a coalition that would negotiate jointly with OpenAI, Google, and Microsoft over the use of their content in AI systems. The New York Times considered joining the group but decided against it. IAC CEO Joey Levin believes the royalties that AI companies pay news publishers for using their content should run into the billions of dollars.

According to NPR’s report, The New York Times and OpenAI have been negotiating a licensing deal for weeks. The talks have not produced an agreement, however, which has led the paper to consider suing the AI firm. A major concern for The New York Times is that OpenAI is, in effect, competing with it directly by offering ChatGPT users information drawn from the paper’s own reporting.

If the lawsuit goes ahead, it could become the most significant legal battle over intellectual property protection in the era of generative AI. It would not be the first, though. Earlier this year, Getty Images, a leading visual media company and stock image supplier, sued Stability AI, the company behind the AI image generator Stable Diffusion, alleging that it unlawfully copied and processed millions of copyrighted images from Getty’s website to train its software.

If The New York Times takes the fight to court and OpenAI is found to have infringed the paper’s copyrights by using its content to train its systems, the court could order the destruction of the dataset containing the infringing content and require OpenAI to rebuild it using only content it is authorized to use. The court could also order OpenAI to pay statutory damages of up to $150,000 for each work willfully infringed.