Davis, Ernest



Planning, Executing, and Evaluating the Winograd Schema Challenge

AI Magazine

The Winograd Schema Challenge (WSC) was proposed by Hector Levesque in 2011 as an alternative to the Turing Test. Chief among its features is a simple question format that can span many commonsense knowledge domains. Questions are chosen so that they do not require specialized knowledge or training, and are easy for humans to answer. This article details our plans to run the WSC and evaluate results.


How to Write Science Questions that Are Easy for People and Hard for Computers

AI Magazine

As a challenge problem for AI systems, I propose the use of hand-constructed multiple-choice tests, with problems that are easy for people but hard for computers. Specifically, I discuss techniques for constructing such problems at the level of a fourth-grade child and at the level of a high-school student. For the fourth-grade-level questions, I argue that questions that require an understanding of time, of impossible or pointless scenarios, of causality, of the human body, or of sets of objects, and questions that require combining facts or following simple inductive arguments of indeterminate length, can be chosen to be easy for people and are likely to be hard for AI programs in the current state of the art. For the high-school level, I argue that questions that relate the formal science to the realia of laboratory experiments or of real-world observations are likely to be easy for people and hard for AI programs. I argue that these are more useful benchmarks than existing standardized tests, such as the SATs or Regents tests: because the questions in standardized tests are designed to be hard for people, they leave untested many abilities that are easy for people but hard for computers.


The Limitations of Standardized Science Tests as Benchmarks for Artificial Intelligence Research: Position Paper

arXiv.org Artificial Intelligence

In this position paper, I argue that standardized tests for elementary science, such as the SAT or the Regents tests, are not very good benchmarks for measuring the progress of artificial intelligence systems in understanding basic science. The primary problem is that these tests are designed to test aspects of knowledge and ability that are challenging for people; the aspects that are challenging for AI systems are very different. In particular, standardized tests do not test knowledge that is obvious for people; yet none of this knowledge can be assumed in AI systems. Individual standardized tests also have specific features that are not necessarily appropriate for an AI benchmark. I analyze the SAT Physics subject test in some detail and the New York State Regents science test more briefly. I also argue that the apparent advantages offered by using standardized tests are mostly either minor or illusory. The one major real advantage is that their significance is easily explained to the public, but I argue that even this is a somewhat mixed blessing. I conclude by arguing, first, that more appropriate collections of exam-style problems could be assembled, and second, that there are better kinds of benchmarks than exam-style problems. In an appendix, I present a collection of sample exam-style problems that test kinds of knowledge missing from the standardized tests.


The Winograd Schema Challenge

AAAI Conferences

In this paper, we present an alternative to the Turing Test that has some conceptual and practical advantages. A Winograd schema is a pair of sentences that differ only in one or two words and that contain a referential ambiguity that is resolved in opposite directions in the two sentences. We have compiled a collection of Winograd schemas, designed so that the correct answer is obvious to the human reader but cannot easily be found using selectional restrictions or statistical techniques over text corpora. A contestant in the Winograd Schema Challenge is presented with one sentence from each pair and is required to achieve human-level accuracy in choosing the correct disambiguation.
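
For readers unfamiliar with the format, the short Python sketch below shows how a single schema pair might be represented and queried. It is an illustration only: the field names and layout are mine, not the challenge's published data format, and the sentence is Winograd's original "city councilmen" example, from which the schemas take their name.

    # Illustrative sketch only: the WinogradSchema layout below is hypothetical,
    # not the official challenge format. The sentence is Winograd's classic
    # "city councilmen" example.
    from dataclasses import dataclass

    @dataclass
    class WinogradSchema:
        template: str        # sentence with a {word} slot and an ambiguous pronoun
        word: str            # the special word in the original sentence
        alternate: str       # swapping in this word flips the correct referent
        candidates: tuple    # the two possible referents of the pronoun
        answers: dict        # correct referent for each choice of special word

        def question(self, use_alternate: bool = False):
            """Render one member of the pair, as a contestant would see it."""
            w = self.alternate if use_alternate else self.word
            return self.template.format(word=w), self.candidates

    schema = WinogradSchema(
        template=("The city councilmen refused the demonstrators a permit "
                  "because they {word} violence."),
        word="feared",
        alternate="advocated",
        candidates=("the city councilmen", "the demonstrators"),
        answers={"feared": "the city councilmen",
                 "advocated": "the demonstrators"},
    )

    sentence, options = schema.question()
    print(sentence)                    # "... because they feared violence."
    print(options)                     # contestant chooses the referent of "they"
    print(schema.answers["feared"])    # correct answer: "the city councilmen"

Swapping "feared" for "advocated" flips the correct referent, which is what makes the pair a schema rather than an ordinary coreference question.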


Reports of the AAAI 2011 Spring Symposia

AI Magazine

The Association for the Advancement of Artificial Intelligence, in cooperation with Stanford University’s Department of Computer Science, presented the 2011 Spring Symposium Series Monday through Wednesday, March 21–23, 2011 at Stanford University. The titles of the eight symposia were AI and Health Communication, Artificial Intelligence and Sustainable Design, AI for Business Agility, Computational Physiology, Help Me Help You: Bridging the Gaps in Human-Agent Collaboration, Logical Formalizations of Commonsense Reasoning, Multirobot Systems and Physical Data Structures, and Modeling Complex Adaptive Systems As If They Were Voting Processes. This report summarizes the eight symposia.