DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems
Seedat, Nabeel, Imrie, Fergus, van der Schaar, Mihaela
arXiv.org Artificial Intelligence
While there have been a number of remarkable breakthroughs in machine learning (ML), much of the focus has been placed on model development. However, to truly realize the potential of machine learning in real-world settings, additional aspects must be considered across the ML pipeline. Data-centric AI is emerging as a unifying paradigm that could enable such reliable end-to-end pipelines. However, this remains a nascent area with no standardized framework to guide practitioners to the necessary data-centric considerations or to communicate the design of data-centric driven ML systems. To address this gap, we propose DC-Check, an actionable checklist-style framework to elicit data-centric considerations at different stages of the ML pipeline: Data, Training, Testing, and Deployment. This data-centric lens on development aims to promote thoughtfulness and transparency prior to system development. Additionally, we highlight specific data-centric AI challenges and research opportunities. DC-Check is aimed at both practitioners and researchers to guide day-to-day development. As such, to easily engage with and use DC-Check and associated resources, we provide a DC-Check companion website (https://www.vanderschaar-lab.com/dc-check/). The website will also serve as an updated resource as methods and tooling evolve over time.
Nov-9-2022
- Country:
- Africa (0.04)
- Asia
- Europe
- France (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- North America > United States (0.28)
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Government (0.67)
- Health & Medicine
- Diagnostic Medicine > Imaging (0.92)
- Health Care Technology (0.67)
- Therapeutic Area > Ophthalmology/Optometry (0.68)
- Information Technology > Security & Privacy (0.67)
- Technology: