Data Readiness for AI: A 360-Degree Survey

Hiniduma, Kaveen, Byna, Suren, Bez, Jean Luca

Apr-8-2024–arXiv.org Artificial Intelligence

Data are the critical fuel for Artificial Intelligence (AI) models. Poor quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Checking for data readiness is a crucial step in improving data quality. Numerous R&D efforts have been spent on improving data quality. However, standardized metrics for evaluating data readiness for use in AI training are still evolving. In this study, we perform a comprehensive survey of metrics used for verifying AI's data readiness. This survey examines more than 120 papers that are published by ACM Digital Library, IEEE Xplore, other reputable journals, and articles published on the web by prominent AI experts. This survey aims to propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy can lead to new standards for DRAI metrics that would be used for enhancing the quality and accuracy of AI training and inference.

application, data readiness, dataset, (12 more...)

arXiv.org Artificial Intelligence

Apr-8-2024

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - Queensland (0.04)
  - New South Wales > Sydney (0.04)
- North America > United States
  - District of Columbia > Washington (0.04)
  - Ohio > Franklin County
    - Columbus (0.04)
  - New York > New York County
    - New York City (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.04)
  - California > Alameda County
    - Berkeley (0.04)
- Europe
  - United Kingdom > Scotland
    - City of Edinburgh > Edinburgh (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
  - India > Karnataka
    - Bengaluru (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Overview (1.00)
- Research Report > New Finding (0.34)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Education (1.00)
- Telecommunications (0.67)
- Law (0.67)

Technology:
- Information Technology
  - Data Science
    - Data Quality (1.00)
    - Data Mining > Big Data (0.92)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language > Text Processing (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found