In this paper, we present an alternative to the Turing Test that has some conceptual and practical advantages. A Winograd schema is a pair of sentences that differ only in one or two words and that contain a referential ambiguity that is resolved in opposite directions in the two sentences. We have compiled a collection of Winograd schemas, designed so that the correct answer is obvious to the human reader, but cannot easily be found using selectional restrictions or statistical techniques over text corpora. A contestant in the Winograd Schema Challenge is presented with a collection of one sentence from each pair, and required to achieve human-level accuracy in choosing the correct disambiguation.
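As a minimal sketch of the definition above, a schema can be represented as a sentence template with one slot for the special word, plus a mapping from each special word to the referent it selects. The class name `WinogradSchema`, the function names `accuracy` and `first_candidate_baseline`, and the scoring setup are illustrative assumptions, not part of the challenge as described; the trophy/suitcase sentence is the widely cited example of such a schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class WinogradSchema:
    """Hypothetical container for one schema (a pair of sentences)."""
    template: str                 # sentence with a {word} slot for the special word
    pronoun: str                  # the ambiguous expression to resolve
    candidates: Tuple[str, str]   # the two possible referents
    answers: Dict[str, str]       # special word -> correct referent

    def sentence(self, word: str) -> str:
        """Return the sentence of the pair that uses the given special word."""
        return self.template.format(word=word)


# A resolver takes (sentence, pronoun, candidates) and returns one candidate.
Resolver = Callable[[str, str, Tuple[str, str]], str]


def accuracy(resolver: Resolver, items: List[Tuple[WinogradSchema, str]]) -> float:
    """Fraction of (schema, special word) items the resolver answers correctly."""
    correct = 0
    for schema, word in items:
        guess = resolver(schema.sentence(word), schema.pronoun, schema.candidates)
        if guess == schema.answers[word]:
            correct += 1
    return correct / len(items)


def first_candidate_baseline(sentence: str, pronoun: str,
                             candidates: Tuple[str, str]) -> str:
    """Trivial baseline (an assumption for illustration): always pick the first referent."""
    return candidates[0]


TROPHY = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it's too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answers={"big": "the trophy", "small": "the suitcase"},
)
```

Because the two sentences of a schema resolve in opposite directions, a resolver that ignores the special word cannot answer both correctly: scored on both sentences of `TROPHY`, `first_candidate_baseline` gets exactly one of the two right.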
This paper describes the Winograd Schema Challenge (WSC), which has been suggested as an alternative to the Turing Test and as a means of measuring progress in commonsense reasoning. A competition based on the WSC has been organized and announced to the AI research community. The WSC is of special interest to the AI applications community and we encourage its members to participate.
Hector Levesque has a strong critical position regarding the place of the Turing Test in Artificial Intelligence. A key argument concerns the test’s use of, or even reliance on, deception for subjectively demonstrating intelligence; he counters with a test of his own, based on Winograd Schemas, that he suggests is more objective. We argue that the subjectivity of the test is a strength, and that evaluating the outcome of Levesque’s objective test introduces other problems.
We propose an alternative to the Turing Test that removes the inherent asymmetry between humans and machines in Turing’s original imitation game. In this new test, both humans and machines judge each other. We argue that this makes the test more robust against simple deceptions. We also propose a small number of refinements to further improve the test. These refinements could also be applied to Turing’s original imitation game.
The Winograd Schema (WS) challenge has been proposed as an alternative to the Turing Test as a test for machine intelligence. In this short paper we "situate" the WS challenge in the data-information-knowledge continuum, suggesting in the process what a good WS is. Furthermore, we suggest that the WS is a special case of a more general phenomenon in language understanding, namely the phenomenon of the "missing text". In particular, we will argue that what we usually call thinking in the process of language understanding almost always involves discovering the missing text: text that is rarely stated explicitly but is implicitly assumed as shared background knowledge. We therefore suggest extending the WS challenge to include tests beyond those involving reference resolution, including examples that require discovering the missing text in situations that are usually treated in computational linguistics under different labels, such as metonymy, quantifier scope ambiguity, lexical disambiguation, and co-predication, to name a few.