Researchers Propose a Better Way to Report Dangerous AI Flaws
In late 2023, a team of third-party researchers discovered a troubling glitch in OpenAI's widely used artificial intelligence model GPT-3.5. When asked to repeat certain words a thousand times, the model began repeating the word over and over, then suddenly switched to spitting out incoherent text and snippets of personal information drawn from its training data, including parts of names, phone numbers, and email addresses. The team that discovered the problem worked with OpenAI to ensure the flaw was fixed before revealing it publicly. It is just one of scores of problems found in major AI models in recent years. In a proposal released today, more than 30 prominent AI researchers, including some who found the GPT-3.5 flaw, say that many other vulnerabilities affecting popular models are reported in problematic ways.
This is where the data to build AI comes from
Their findings, shared exclusively with MIT Technology Review, show a worrying trend: AI's data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies. In the early 2010s, data sets came from a variety of sources, says Shayne Longpre, a researcher at MIT who is part of the project. They came not just from encyclopedias and the web, but also from sources such as parliamentary transcripts, earnings calls, and weather reports. Back then, AI data sets were specifically curated and collected from different sources to suit individual tasks, Longpre says. Then transformers, the architecture underpinning language models, were invented in 2017, and the AI sector started seeing performance get better the bigger the models and data sets were.