Zuckerberg approved Meta's use of 'pirated' books to train AI models, authors claim

Jan-10-2025, 13:09:02 GMT–The Guardian

Citing internal Meta communications, the filing claims that the social network company's chief executive backed the use of the LibGen dataset, a vast online archive of books, despite warnings within the company's AI executive team that it is a dataset "we know to be pirated". According to the filing, the internal message warns that using a database containing pirated material could weaken the Facebook and Instagram owner's negotiations with regulators: "Media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, may undermine our negotiating position with regulators." The authors sued Meta in 2023, arguing that the social media company misused their books to train Llama, the large language model that powers its chatbots. Library Genesis, or LibGen, is a "shadow library" that originated in Russia and claims to contain millions of novels, nonfiction books and science magazine articles.

ai model, meta, zuckerberg

News Web Page

Country:
- Europe > Russia (0.26)
- Asia > Russia (0.26)
- North America > United States
  - New York (0.06)
  - California (0.06)

Industry:
- Law (1.00)
- Information Technology > Services (1.00)
- Media (0.95)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence > Natural Language
    - Chatbot (0.39)
    - Large Language Model (0.37)