Big Data will be biased, if we let it


If I had a penny for every time I've heard "data doesn't lie"… For those of us who have the ever-exciting and growing task of working with Big Data to help solve some of an organization's biggest inefficiencies, questions, or problems, perpetuating bias is far too easy a mistake to make, and we should all be familiar with it by now. My first encounter with the concept of data-driven bias blew my mind and made me wonder how I hadn't seen it before. It was ProPublica's piece titled "Machine Bias". The tl;dr here is that several states in the US implemented an algorithm to predict the risk of defendants reoffending, and used this score as a factor during sentencing. Interestingly enough, race and ethnicity were reportedly not variables in this algorithm, yet it somehow fails Black defendants the most.