The Importance of Open-Source ML Datasets
'Data is the new oil' is an over-marketed quote, but it certainly holds true for machine learning (ML). In an ML world dominated by supervised learning techniques, access to high-quality labeled datasets is essential to advancing both ML research and practical implementations. However, labeled datasets are expensive to produce and remain a privilege of large companies, which widens the gap between the "haves" and the "have nots" in the ML space. Beyond its impact on the economics of the ML market, access to high-quality datasets is fundamental to advancing research across different ML fields. Datasets such as ImageNet were something of a Sputnik moment (as in the first artificial satellite) for ML, sparking remarkable breakthroughs in computer vision.
Jun-27-2021, 14:30:38 GMT