What I Found in a Database Meta Uses to Train Generative AI
Editor's note: This article is part of The Atlantic's series on Books3. You can search the database for yourself here, and read about its origins here.

This summer, I reported on a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. "Books3," as it's called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by Stephen King and Margaret Atwood, and a lot more. Books play a crucial role in the training of generative-AI systems.
Sep-25-2023, 17:27:40 GMT