 Education


AAAI News

AI Magazine

The 2016 winners were as follows: ... Tom Dietterich, AAAI President ... Manuela Veloso, AAAI Past President ... For AAAI 2017 Awards, please contact Carol Hamilton at hamilton@aaai.org.


WWTS (What Would Turing Say?)

AI Magazine

Turing's Imitation Game was a brilliant ... Turing was heavily influenced by the World War II "game" ... If Turing were alive today, what sort of test might he propose? If a machine could fool interrogators as often as a typical man, then one would have to conclude that that machine, as programmed, was as intelligent as a person (well, as intelligent as men). As Judy Genova (1994) puts it, Turing's originally proposed game involves not a question of species, but one of gender. The current version, where the interrogator is told he or she needs to distinguish a person from a machine, has two problems: (1) it is much more difficult to get a program to pass, and (2) almost all the added difficulties are largely irrelevant to intelligence! The waters are muddied further when programs appear to do well through various tricks, such as having the interviewee program claim to be a 13-year-old Ukrainian who doesn't speak English well (University of Reading 2014), so that all its wrong or bizarre responses are excused as cultural, age, or language issues.


The Social-Emotional Turing Challenge

AI Magazine

Social-emotional intelligence is an essential part of being a competent human and is thus required for human-level AI. It is therefore an important capacity to test when considering alternatives to the Turing Test. We characterize this capacity as affective theory of mind and describe some unique challenges associated with its interpretive or generative nature. Mindful of these challenges, we describe a five-step method along with preliminary investigations into its application. We also describe certain characteristics of the approach, such as its incremental nature and countermeasures that make it difficult to game or cheat.


Software Social Organisms: Implications for Measuring AI Progress

AI Magazine

In this article I argue that achieving human-level AI is equivalent to learning how to create sufficiently smart software social organisms. This implies that no single test will be sufficient to measure progress. Instead, evaluations should be organized around showing increasing abilities to participate in our culture, as apprentices. This provides multiple dimensions within which progress can be measured, including how well different interaction modalities can be used, what range of domains can be tackled, what human-normed levels of knowledge they are able to acquire, as well as others. I begin by motivating the idea of software social organisms, drawing on ideas from other areas of cognitive science, and provide an analysis of the substrate capabilities that are needed in social organisms in terms closer to what is needed for computational modeling. Finally, the implications for evaluation are discussed.


Measuring Machine Intelligence Through Visual Question Answering

AI Magazine

As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks at which humans excel but which machines find difficult. However, an ideal task should also be easy to evaluate and not be easily gameable. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering, which tests a machine's ability to reason about language and vision. We describe a dataset unprecedented in size created for the task that contains over 760,000 human-generated questions about images. Using around 10 million human-generated answers, machines may be easily evaluated.


How to Write Science Questions that Are Easy for People and Hard for Computers

AI Magazine

As a challenge problem for AI systems, I propose the use of hand-constructed multiple-choice tests, with problems that are easy for people but hard for computers. Specifically, I discuss techniques for constructing such problems at the level of a fourth-grade child and at the level of a high-school student. For the fourth-grade level, I argue that questions that require the understanding of time, of impossible or pointless scenarios, of causality, of the human body, or of sets of objects, and questions that require combining facts or require simple inductive arguments of indeterminate length, can be chosen to be easy for people and are likely to be hard for AI programs in the current state of the art. For the high-school level, I argue that questions that relate the formal science to the realia of laboratory experiments or of real-world observations are likely to be easy for people and hard for AI programs. I argue that these are more useful benchmarks than existing standardized tests such as the SATs or Regents tests. Since the questions in standardized tests are designed to be hard for people, they often leave many aspects of what is hard for computers but easy for people untested.


Toward a Comprehension Challenge, Using Crowdsourcing as a Tool

AI Magazine

Human readers comprehend vastly more, and in vastly different ways, than any existing comprehension test would suggest. An ideal comprehension test for a story should cover the full range of questions and answers that humans would expect other humans to reasonably learn or infer from a given story. As a step toward these goals we propose a novel test, the Crowdsourced Comprehension Challenge (C3), which is constructed by repeated runs of a three-person game, the Iterative Crowdsourced Comprehension Game (ICCG). ICCG uses structured crowdsourcing to comprehensively generate relevant questions and supported answers for arbitrary stories, whether fiction or nonfiction, presented across a variety of media such as videos, podcasts, and still images.


My Computer Is an Honor Student -- but How Intelligent Is It? Standardized Tests as a Measure of AI

AI Magazine

Given the well-known limitations of the Turing Test, there is a need for objective tests to both focus attention on, and measure progress towards, the goals of AI. In this paper we argue that machine performance on standardized tests should be a key component of any new measure of AI, because attaining a high level of performance requires solving significant AI problems involving language understanding and world modeling - critical skills for any machine that lays claim to intelligence. In addition, standardized tests have all the basic requirements of a practical test: they are accessible, easily comprehensible, clearly measurable, and offer a graduated progression from simple tasks to those requiring deep understanding of the world. Here we propose this task as a challenge problem for the community, summarize our state-of-the-art results on math and science tests, and provide supporting datasets.