DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems

Seedat, Nabeel, Imrie, Fergus, van der Schaar, Mihaela

Nov-9-2022–arXiv.org Artificial Intelligence

While there have been a number of remarkable breakthroughs in machine learning (ML), much of the focus has been placed on model development. However, to truly realize the potential of machine learning in real-world settings, additional aspects must be considered across the ML pipeline. Data-centric AI is emerging as a unifying paradigm that could enable such reliable end-to-end pipelines. However, this remains a nascent area with no standardized framework to guide practitioners to the necessary data-centric considerations or to communicate the design of data-centric ML systems. To address this gap, we propose DC-Check, an actionable checklist-style framework to elicit data-centric considerations at different stages of the ML pipeline: Data, Training, Testing, and Deployment. This data-centric lens on development aims to promote thoughtfulness and transparency prior to system development. Additionally, we highlight specific data-centric AI challenges and research opportunities. DC-Check is aimed at both practitioners and researchers to guide day-to-day development. To make DC-Check and its associated resources easy to engage with and use, we provide a DC-Check companion website (https://www.vanderschaar-lab.com/dc-check/). The website will also serve as an updated resource as methods and tooling evolve over time.

artificial intelligence, data quality, machine learning

Country:
- North America > United States (0.28)
- Africa (0.04)
- Europe
  - France (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- Asia
  - Thailand (0.04)
  - India (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Information Technology > Security & Privacy (1.00)
- Government (0.67)
- Health & Medicine
  - Diagnostic Medicine > Imaging (0.92)
  - Therapeutic Area > Ophthalmology/Optometry (0.68)

Technology:
- Information Technology
  - Data Science > Data Quality
    - Data Cleaning (0.69)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning (0.67)
    - Neural Networks > Deep Learning (0.47)
