Meta's AI memorised books verbatim – that could cost it billions

Jun-10-2025, 18:00:39 GMT – New Scientist

Authors and publishers have filed multiple lawsuits over the use of their books to train AI models, and in a new twist, researchers have shown that at least one AI model has not only used popular books in its training data, but also memorised their contents verbatim. The researchers tested multiple models to see how much of that training data they can spit back out verbatim. They found that many models do not retain the exact text of the books in their training data, but one of Meta's models has memorised almost the entirety of certain books. If judges rule against the company, the researchers estimate that this could make Meta liable for at least $1 billion in damages. "That means, on the one hand, that AI models are not just 'plagiarism machines', as some have alleged, but it also means that they do more than just learn general relationships between words," says Mark Lemley at Stanford University in California.

large language model, machine learning, natural language

News Web Page

Country:
- North America > United States
  - California (0.26)
  - Oklahoma (0.05)
- Europe > United Kingdom
  - England > Greater London > London (0.05)

Genre:
- Research Report > New Finding (0.35)

Industry:
- Law > Litigation (0.71)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (0.99)
    - Chatbot (0.98)
  - Machine Learning > Neural Networks
    - Deep Learning (0.73)
