Here's Proof You Can Train an AI Model Without Slurping Copyrighted Content

Mar-20-2024, 16:00:00 GMT–WIRED

A group of researchers backed by the French government has released what is thought to be the largest AI training dataset composed entirely of text that is in the public domain. "There's no fundamental reason why someone couldn't train an LLM fairly," says Ed Newton-Rex, CEO of Fairly Trained. He founded the nonprofit in January 2024 after quitting his executive role at image generation startup Stability AI because he disagreed with its policy of scraping content without permission. Fairly Trained offers a certification to companies willing to prove that they've trained their AI models on data that they own, have licensed, or that is in the public domain. When the nonprofit launched, some critics pointed out that it hadn't yet identified a large language model that met those requirements.

dataset, language model, slurping copyrighted content

News Web Page

Country:
- Europe > France (0.36)
- North America > United States
  - Illinois > Cook County > Chicago (0.05)

Industry:
- Government > Regional Government (0.36)
- Law > Intellectual Property & Technology Law (0.36)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.58)
