Here's Proof You Can Train an AI Model Without Slurping Copyrighted Content

WIRED 

A group of researchers backed by the French government has released what is thought to be the largest AI training dataset composed entirely of text that is in the public domain. "There's no fundamental reason why someone couldn't train an LLM fairly," says Ed Newton-Rex, CEO of Fairly Trained. He founded the nonprofit in January 2024 after quitting his executive role at image generation startup Stability AI because he disagreed with its policy of scraping content without permission. Fairly Trained offers a certification to companies willing to prove that they've trained their AI models on data that they own, have licensed, or that is in the public domain. When the nonprofit launched, some critics pointed out that it hadn't yet identified a large language model that met those requirements.
