A primer on the sources of biases in data-mining for machine learning

#artificialintelligence

Despite rising levels of automation driven by big data, much of the data-mining and machine learning process still relies on human intervention, which introduces a variety of biases. The amount of structured and unstructured data generated has grown exponentially over the last few decades and will continue to do so for years to come. 'Big data' analytics could potentially overcome numerous challenges that corporations and governments have faced for centuries when making decisions, such as the lack of adequate data for formulating policies (e.g., targeting policies for a particular social group) or for examining market or consumer expectations (e.g., recommendation systems). The descriptive and predictive modeling driven by the big data paradigm can help decision-makers derive valuable insights for personal, commercial, or collective gain. However, modern data collection processes and algorithms remain susceptible to data-mining biases. Without appropriate countermeasures, big data can amplify the negative effects of existing social issues (e.g., racial discrimination) and render the findings worthless or even counterproductive [1], [2].