OpenAI, the pioneering artificial intelligence developer, has underscored the essential role of access to copyrighted material in building tools like its acclaimed chatbot, ChatGPT.
The statement comes as scrutiny intensifies over artificial intelligence companies and the content they use to train their products.
ChatGPT, along with other AI systems such as the image generator Stable Diffusion, is trained on a vast reservoir of data harvested from the internet. A significant portion of that data is protected by copyright, the legal safeguard against unauthorized use of a person’s work.
Those practices have brought OpenAI and its prominent investor, Microsoft, under legal scrutiny. The New York Times has sued both companies, alleging “unlawful use” of its intellectual property in the development of their products.
OpenAI countered by accusing the New York Times of manipulating prompts to get its models to reproduce copyrighted material.
Amid these legal challenges and inquiries, OpenAI laid out its position in a submission to the House of Lords communications and digital select committee, asserting that training large language models such as GPT-4, the core technology powering ChatGPT, depends heavily on access to copyrighted materials.
Here is what OpenAI said in its submission: “Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials.”
The company added that training AI models on non-copyrighted content would be “an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”
OpenAI also responded to the New York Times lawsuit on its official X account, explaining its position. The company maintains that training is fair use and that it offers an opt-out to those who do not want their content used. It also claimed that the New York Times’ complaint does not tell the full story and that “regurgitation” is a rare bug it is working to fix.