A new tool for copyright holders can show if their work is in AI training data
A number of publishers and writers are suing tech companies, claiming their intellectual property was scraped into AI training data sets without their permission. The New York Times' ongoing case against OpenAI is probably the highest-profile of these.

"There is a complete lack of transparency in terms of which content is used to train models, and we think this is preventing finding the right balance [between AI companies and content creators]," says Yves-Alexandre de Montjoye, an associate professor of applied mathematics and computer science at Imperial College London, who led the research. The research was presented at the International Conference on Machine Learning, a top AI conference being held in Vienna this week.

To create the traps, the team used a word generator to produce thousands of synthetic sentences.
Jul-25-2024, 17:09:21 GMT