DescribeML: A tool for describing machine learning datasets
With the rise of machine learning technologies, the need for more and better datasets is becoming one of the main challenges in the industry. Prominent scientists and practitioners, such as Andrew Ng, have called for a data-centric cultural shift in the machine learning field, one in which data issues receive the attention they deserve. The idea behind this proposal is simple: better data leads to better machine learning applications. But data, beyond powering ML applications, can also be the source of ethical and social issues. For instance, recent studies, such as Khalil et al., show that facial analysis datasets containing fewer darker-skinned faces can reduce the accuracy of face analysis models for that group, causing social harm.
Sep-22-2022, 08:50:40 GMT