About a week ago, Stanford University researchers posted online a study on the latest dystopian AI: a machine learning algorithm that essentially works as gaydar. After training it on tens of thousands of photographs from a dating site, the algorithm could, for example, guess with 81 percent accuracy whether a white man in a photograph was gay. Their stated aim was to protect gay people. "[Our] findings expose a threat to the privacy and safety of gay men and women," wrote Michal Kosinski and Yilun Wang in the paper. They built the bomb, in other words, so they could alert the public to its dangers.
Stanford's review board approved Kosinski and Wang's study. "The vast, vast, vast majority of what we call 'big data' research does not fall under the purview of federal regulations," says Metcalf. Take a recent example: Last month, researchers affiliated with Stony Brook University and several major internet companies released a free app, a machine learning algorithm that guesses ethnicity and nationality from a name with about 80 percent accuracy. The group also went through an ethics review at the company that provided the training list of names, although Metcalf says that an evaluation at a private company is the "weakest level of review that they could do."
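To make the mechanics concrete, here is a minimal sketch of how a name-to-nationality guesser of this general kind might work, built on character bigram counts. It is emphatically not the Stony Brook group's system: the training names, the labels, and the classifier itself are all invented for illustration.

```python
from collections import Counter, defaultdict

def char_ngrams(name, n=2):
    """Split a lowercased name, with boundary markers, into character bigrams."""
    s = f"^{name.lower()}$"
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def train(labeled_names):
    """Count bigram frequencies separately for each label."""
    counts = defaultdict(Counter)
    for name, label in labeled_names:
        counts[label].update(char_ngrams(name))
    return counts

def classify(name, counts):
    """Score each label by a naive-Bayes-style product with add-one smoothing."""
    def score(label):
        c = counts[label]
        total = sum(c.values()) + len(c) + 1
        p = 1.0
        for gram in char_ngrams(name):
            p *= (c[gram] + 1) / total
        return p
    return max(counts, key=score)

# A tiny, entirely invented training list; a real system of this kind
# would be trained on millions of labeled names.
train_data = [
    ("Giovanni Rossi", "Italian"), ("Luca Bianchi", "Italian"),
    ("Hiroshi Tanaka", "Japanese"), ("Yuki Sato", "Japanese"),
]
model = train(train_data)
print(classify("Rossini", model))  # shares many bigrams with the Italian names
```

With orders of magnitude more training names, the same basic counting approach can become unsettlingly accurate, which is exactly what makes such tools an ethics question rather than a parlor trick.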
"You are worse than a fool; you have no care for your species. For thousands of years men dreamed of pacts with demons. Only now are such things possible." When William Gibson wrote those words in his groundbreaking 1984 novel Neuromancer, artificial intelligence remained almost entirely within the realm of science fiction. Today, however, the convergence of complex algorithms, big data, and exponential increases in computational power has produced a world in which AI raises significant ethical and human rights dilemmas, ranging from the right to privacy to due process.
What's worse is the way that machine learning magnifies these problems. If an employer has only ever hired young applicants, a machine learning algorithm trained on that history will learn to screen out all older applicants without anyone having to tell it to do so. I recently attended a meeting about some preliminary research on "predictive policing," which uses these machine learning algorithms to allocate police resources to likely crime hotspots. With more engineers participating in policy debates and more policymakers who understand algorithms and big data, both government and civil society organizations will be stronger.
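The dynamic is easy to reproduce in a few lines. The sketch below trains the simplest possible model, a single threshold rule, on synthetic hiring data whose historical labels were decided purely by age. Everything here (the data, the features, the learner) is invented for illustration; the point is that the code never mentions discrimination, yet the rule it learns is an age cutoff.

```python
import random

random.seed(0)

# Synthetic stand-in for an employer's historical hiring data.
# Each row: (age, skill score). The hire/no-hire labels encode past
# bias -- only applicants under 35 were ever hired, regardless of skill.
rows = [(random.randint(22, 60), round(random.random(), 2)) for _ in range(500)]
labels = [age < 35 for age, _ in rows]

def fit_stump(rows, labels):
    """Search every 'feature < threshold' rule and keep the most accurate.

    Nothing here mentions age or discrimination; the learner simply
    finds whatever rule best reproduces past decisions."""
    best = (-1.0, 0, 0)  # (accuracy, feature index, threshold)
    for feat in (0, 1):  # feature 0 = age, feature 1 = skill
        for thresh in sorted({row[feat] for row in rows}):
            preds = [row[feat] < thresh for row in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            best = max(best, (acc, feat, thresh))
    return best

acc, feat, thresh = fit_stump(rows, labels)
# The winning rule is a cutoff on feature 0 (age), fitting the bias perfectly.
print(f"learned rule: feature {feat} < {thresh} (accuracy {acc:.0%})")
```

Swap the threshold learner for a deep network and age for a subtler proxy (zip code, graduation year) and the mechanism is the same, only harder to spot.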
Science fiction novels have long delighted readers by grappling with futuristic challenges like the possibility of artificial intelligence so difficult to distinguish from human beings that people naturally ask, "Should these sophisticated computer programs be considered human?" Tech industry luminaries such as Tesla CEO Elon Musk have recently endorsed concepts like a guaranteed minimum income or universal basic income. Bill Gates recently made headlines with a proposal to impose a "robot tax": essentially, a tax on automated solutions to account for the social costs of job displacement. Technology challenges our conception of human rights in other ways as well.
As data scientists, we are aware that bias exists in the world. We read about how cognitive biases can affect decision-making. We know that, for instance, a resume with a white-sounding name will receive a different response than the same resume with a black-sounding name, and that writers of performance reviews use different language to describe the contributions of women and men in the workplace. We read news stories about ageism in healthcare and racism in mortgage lending. Data scientists are problem solvers at heart, and we love our data and our algorithms, which sometimes seem to work like magic. So we may be inclined to try to solve these problems of human bias by turning the decisions over to machines.
R users can now use the popular dplyr package to tap into big data stored in Apache Spark. The new sparklyr package is a native dplyr interface to Spark, according to RStudio. After installing the package, users can "interactively manipulate Spark data using both dplyr and SQL (via DBI)," according to an RStudio blog post, as well as "filter and aggregate Spark data sets then bring them into R for analysis and visualization."
Companies that use machine learning and big data in their hiring process rely on "training data," typically drawn from prior and current employees. A statistical process then automatically discovers the traits that correlate with high performance among the training data and looks for those traits in the applicant pool. If you have zero employees who are women, people of color, or people with disabilities, it's impossible to evaluate their potential performance through machine learning, Barocas noted. He also pointed out that just several weeks ago, Bloomberg blasted Amazon for not offering its premium Amazon Prime same-day delivery service in certain, predominantly minority, parts of various cities.
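Barocas's zero-representation point can be sketched as a toy scoring pipeline. The "school" trait, the ratings, and the cutoff below are all hypothetical; the mechanism is what matters: a group absent from the training data can only ever receive the default score.

```python
from statistics import mean

# Hypothetical training data: performance ratings of prior employees,
# keyed by a trait the model will reuse (an invented "school" field).
employees = [
    ("State Tech", 4.1), ("State Tech", 3.8),
    ("City College", 3.2), ("City College", 3.5),
]

def trait_scores(records):
    """Average historical performance for each trait value seen in training."""
    by_trait = {}
    for trait, rating in records:
        by_trait.setdefault(trait, []).append(rating)
    return {trait: mean(ratings) for trait, ratings in by_trait.items()}

scores = trait_scores(employees)

def screen(school, scores, cutoff=3.4):
    # An applicant whose school never appears among past hires has no
    # evidence either way; the model falls back to a default of 0.0,
    # which silently screens out the entire unseen group.
    return scores.get(school, 0.0) >= cutoff

print(screen("State Tech", scores))      # seen in training -> passes
print(screen("Liberal Arts U", scores))  # absent from training -> rejected
```

The failure is not a bug in any one line; it is the absence of evidence for a group the employer never hired, which the pipeline converts into a rejection by default.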
Artificial intelligence, long confined to science fiction and dystopian visions of the future, pushes further into our reality every day. The algorithms that Google, Facebook, and many other tech companies run move ever closer along their asymptotic paths toward approximating the neural firings and pathways that make the human brain so powerful. In doing so, though, they consume astounding amounts of data, raising concerns from governments and private citizens alike about the extent to which privacy rights are compromised. As technology advances exponentially, the world is starting to grapple with the logistical and ethical considerations AI has begun to raise. AI is a technological marvel that serves both as a response to and a way of capitalizing on the explosion of big data over the last twenty years.