AI Needs Better Data, Not Just More Data


AI has a data quality problem. In a survey of 179 data scientists, over half identified data quality issues as the biggest bottleneck in successful AI projects. Big data is so often improperly formatted, lacking metadata, or "dirty" (incomplete, incorrect, or inconsistent) that data scientists typically spend 80 percent of their time cleaning and preparing data to make it usable, leaving just 20 percent for actual analysis. As a result, organizations developing and using AI must devote huge amounts of resources to securing enough high-quality data for their AI tools to be useful. As policymakers pursue national strategies to increase their countries' competitiveness in AI, they should recognize that any country that wants to lead in AI must also lead in data quality.