A new tool for copyright holders can show if their work is in AI training data

Jul-25-2024, 17:09:21 GMT – MIT Technology Review

A number of publishers and writers are in the middle of litigation against tech companies, claiming their intellectual property has been scraped into AI training data sets without their permission. The New York Times' ongoing case against OpenAI is probably the most high-profile of these. "There is a complete lack of transparency in terms of which content is used to train models, and we think this is preventing finding the right balance [between AI companies and content creators]," says Yves-Alexandre de Montjoye, an associate professor of applied mathematics and computer science at Imperial College London, who led the research. The research was presented at the International Conference on Machine Learning, a top AI conference being held in Vienna this week. To create the traps, the team used a word generator to create thousands of synthetic sentences.

ai training data, membership inference attack, training data

Country:
- Europe > Austria > Vienna (0.26)

Genre:
- Research Report (0.38)

Industry:
- Law (0.58)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.38)