Given the well-known limitations of the Turing Test, there is a need for objective tests to both focus attention on, and measure progress towards, the goals of AI. In this paper we argue that machine performance on standardized tests should be a key component of any new measure of AI, because attaining a high level of performance requires solving significant AI problems involving language understanding and world modeling - critical skills for any machine that lays claim to intelligence. In addition, standardized tests have all the basic requirements of a practical test: they are accessible, easily comprehensible, clearly measurable, and offer a graduated progression from simple tasks to those requiring deep understanding of the world.
Chaudhri, Vinay K. (SRI International) | Cheng, Britte (SRI International) | Overholtzer, Adam (SRI International) | Roschelle, Jeremy (SRI International) | Spaulding, Aaron (SRI International) | Clark, Peter (Vulcan Inc.) | Greaves, Mark (Pacific Northwest National Laboratory) | Gunning, Dave (Palo Alto Research Center)
Inquire Biology is a prototype of a new kind of intelligent textbook -- one that answers students' questions, engages their interest, and improves their understanding. Inquire Biology provides unique capabilities via a knowledge representation that captures conceptual knowledge from the textbook and uses inference procedures to answer students' questions. In an initial controlled experiment, community college students using the Inquire Biology prototype outperformed students using either a hardcopy or conventional E-book version of the same biology textbook. While additional research is needed to fully develop Inquire Biology, the initial prototype clearly demonstrates the promise of applying knowledge representation and question-answering technology to electronic textbooks.
Friedland, Noah S., Allen, Paul G., Matthews, Gavin, Witbrock, Michael, Baxter, David, Curtis, Jon, Shepard, Blake, Miraglia, Pierluigi, Angele, Jurgen, Staab, Steffen, Moench, Eddie, Oppermann, Henrik, Wenke, Dirk, Israel, David, Chaudhri, Vinay, Porter, Bruce, Barker, Ken, Fan, James, Chaw, Shaw Yi, Yeh, Peter, Tecuci, Dan, Clark, Peter
Vulcan selected three teams, each of which was to formally represent 70 pages from the advanced placement (AP) chemistry syllabus and deliver knowledge-based systems capable of answering questions on that syllabus. The evaluation quantified each system's coverage of the syllabus in terms of its ability to answer novel, previously unseen questions and to provide human-readable answer justifications. These justifications will play a critical role in building user trust in the question-answering capabilities of Digital Aristotle. This article presents the motivation and long-term goals of Project Halo and describes in detail the six-month first phase of the project -- the Halo Pilot -- including its KR&R challenge, empirical evaluation, results, and failure analysis.