The articles in this special issue of AI Magazine include those that propose specific tests and those that look at the challenges inherent in building robust, valid, and reliable tests for advancing the state of the art in AI. To people outside the field, the test, which hinges on the ability of machines to fool people into thinking that they (the machines) are people, is practically synonymous with the quest to create machine intelligence. Within the field, the test is widely recognized as a pioneering landmark, but it is also now seen as a distraction: designed over half a century ago, and too crude to really measure intelligence. Intelligence is, after all, a multidimensional variable, and no single test could ever definitively measure it. Moreover, the original test, at least in its standard implementations, has turned out to be highly gameable, arguably an exercise in deception rather than a true measure of anything especially correlated with intelligence.
We propose an alternative to the Turing test that removes the inherent asymmetry between humans and machines in Turing's original imitation game. In this new test, both humans and machines judge each other. We argue that this makes the test more robust against simple deceptions. We also propose a small number of refinements to further improve the test. These refinements could also be applied to Turing's original imitation game.
Human readers comprehend vastly more, and in vastly different ways, than any existing comprehension test would suggest. While the AI scientific community had hoped that by 2015 machines would be able to read and comprehend language, current models are typically superficial, capable of understanding sentences in limited domains (such as extracting movie times and restaurant locations from text) but without the sort of wide-coverage comprehension that we expect of any teenager. Comprehension itself extends beyond the written word; most adults and children can comprehend a variety of narratives, both fiction and nonfiction, presented in a wide variety of formats, such as movies, television and radio programs, written stories, YouTube videos, still images, and cartoons. An ideal comprehension test for a story should therefore cover the full range of questions and answers that humans would expect other humans to reasonably learn or infer from a given story. As a step toward these goals we propose a novel test, the Crowdsourced Comprehension Challenge (C3), which is constructed by repeated runs of a three-person game, the Iterative Crowdsourced Comprehension Game (ICCG). ICCG uses structured crowdsourcing to comprehensively generate relevant questions and supported answers for arbitrary stories, whether fiction or nonfiction, presented across a variety of media such as videos, podcasts, and still images.
If the artificial intelligence research community is to have a challenge problem as an incentive for research, as many have called for, it behooves us to learn the principles of past successful inducement prize competitions. Those principles argue against the Turing test proper as an appropriate task, despite its appropriateness as a criterion (perhaps the only one) for attributing intelligence to a machine. Gary Marcus, writing in The New Yorker, asks "What Comes After the Turing Test?" and wants "to update a sixty-four-year-old test for the modern era" (Marcus 2014). Moshe Vardi, in his Communications of the ACM article "Would Turing Have Passed the Turing Test?", opines that "It's time to consider the Imitation Game as just a game" (Vardi 2014). The popular media recommends that we "Forget the Turing Test" and replace it with a "better way to measure intelligence" (Locke 2014).