Data Quality for Big Data and Machine Learning

Machine learning (ML) has drawn great attention from academia and industry over the past decades and continues to achieve impressive human-level performance on nontrivial tasks such as image classification, voice recognition, natural language processing, and autopiloting. Both data and algorithms are critical to the performance, fairness, robustness, reliability, and scalability of ML systems. However, artificial intelligence (AI) researchers and practitioners overwhelmingly concentrate on algorithms while undervaluing the impact of data quality. A recent report estimated that data quality problems cost more than 600 billion US dollars per year in the US market alone, and a 2019 survey by Lourentzou indicates that 96% of companies have run into problems with data quality and with the data labeling required to train ML models. Because algorithmic solutions alone have proven limited in delivering AI success, scholars have proposed data-centric AI, an initiative to carefully design datasets and to evaluate and improve data quality in order to enhance ML systems.

This Research Topic focuses on data quality in ML, particularly on how to use state-of-the-art technology for the assessment, assurance, and improvement of big data when building high-quality ML systems. Although some effort has been devoted to improving data quality for ML, uncovering data quality problems, and developing strategies to assess data quality, data quality is rarely rigorously and systematically ev...