Who Trained Your A.I.?
When a company depends on data it didn't collect itself, it has less incentive to open its A.I. systems to scrutiny, explains Levendowski: if companies are making their A.I. systems smarter with unlicensed data, exposing that fact could leave them liable. The problem with turning to public-domain data, though, is that it is generally old, which means it may reflect the mores and biases of its time. From which books get published to which subjects doctors choose for medical studies, the history of racism and sexism in America is, in a sense, mirrored in old published data that is now available for free. And when it comes to data sets that were leaked or released during a criminal investigation, the problem is that such data is often publicly available precisely because it is so controversial and problematic.