We finally have a definition for open-source AI

Aug-22-2024, 13:00:00 GMT–MIT Technology Review

Ayah Bdeir, a senior advisor to Mozilla and a participant in the Open Source Initiative's (OSI) process, says certain parts of the open-source definition were relatively easy to agree upon, including the need to reveal model weights (the parameters that help determine how an AI model generates an output). Other parts of the deliberations were more contentious, particularly the question of how public training data should be. The lack of transparency about where training data comes from has led to innumerable lawsuits against big AI companies, from makers of large language models like OpenAI to music generators like Suno, which do not disclose much about their training sets beyond saying they contain "publicly accessible information."

Ultimately, the new definition requires that open-source models provide information about the training data to the extent that "a skilled person can recreate a substantially equivalent system using the same or similar data." It's not a blanket requirement to share all training data sets, but it also goes further than what many proprietary models or even ostensibly open-source models do today. "Insisting on an ideologically pristine kind of gold standard that actually will not effectively be met by anybody ends up backfiring," Bdeir says.



