This is where the data to build AI comes from
Their findings, shared exclusively with MIT Technology Review, show a worrying trend: AI's data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies. In the early 2010s, data sets came from a variety of sources, says Shayne Longpre, a researcher at MIT who is part of the project. They came not just from encyclopedias and the web, but also from sources such as parliamentary transcripts, earnings calls, and weather reports. Back then, AI data sets were specifically curated and collected from different sources to suit individual tasks, Longpre says. Then transformers, the architecture underpinning language models, were invented in 2017, and the AI sector started seeing performance improve as models and data sets grew larger.
Dec-18-2024, 10:50:10 GMT