Goto

Collaborating Authors

Question Answering as Global Reasoning over Semantic Abstractions

arXiv.org Artificial Intelligence

We propose a novel method for exploiting the semantic structure of text to answer multiple-choice questions. The approach is especially suitable for domains that require reasoning over a diverse set of linguistic constructs but have limited training data. To address these challenges, we present the first system, to the best of our knowledge, that reasons over a wide range of semantic abstractions of the text, which are derived using off-the-shelf, general-purpose, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers. Representing multiple abstractions as a family of graphs, we translate question answering (QA) into a search for an optimal subgraph that satisfies certain global and local properties. This formulation generalizes several prior structured QA systems. Our system, SEMANTICILP, demonstrates strong performance on two domains simultaneously. In particular, on a collection of challenging science QA datasets, it outperforms various state-of-the-art approaches, including neural models, broad coverage information retrieval, and specialized techniques using structured knowledge bases, by 2%-6%.


Question Answering as Global Reasoning Over Semantic Abstractions

AAAI Conferences

We propose a novel method for exploiting the semantic structure of text to answer multiple-choice questions. The approach is especially suitable for domains that require reasoning over a diverse set of linguistic constructs but have limited training data. To address these challenges, we present the first system, to the best of our knowledge, that reasons over a wide range of semantic abstractions of the text, which are derived using off-the-shelf, general-purpose, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers. Representing multiple abstractions as a family of graphs, we translate question answering (QA) into a search for an optimal subgraph that satisfies certain global and local properties. This formulation generalizes several prior structured QA systems. Our system, SEMANTICILP, demonstrates strong performance on two domains simultaneously. In particular, on a collection of challenging science QA datasets, it outperforms various state-of-the-art approaches, including neural models, broad coverage information retrieval, and specialized techniques using structured knowledge bases, by 2%-6%.


From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

arXiv.org Artificial Intelligence

AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge. This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions. In addition, our Aristo system, building upon the success of recent language models, exceeded 83% on the corresponding Grade 12 Science Exam NDMC questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern NLP methods can result in mastery on this task. While not a full solution to general question-answering (the questions are multiple choice, and the domain is restricted to 8th Grade science), it represents a significant milestone for the field.


ExplanationLP: Abductive Reasoning for Explainable Science Question Answering

arXiv.org Artificial Intelligence

We propose a novel approach for answering and explaining multiple-choice science questions by reasoning on grounding and abstract inference chains. This paper frames question answering as an abductive reasoning problem, constructing plausible explanations for each choice and then selecting the candidate with the best explanation as the final answer. Our system, ExplanationLP, elicits explanations by constructing a weighted graph of relevant facts for each candidate answer and extracting the facts that satisfy certain structural and semantic constraints. To extract the explanations, we employ a linear programming formalism designed to select the optimal subgraph. The graphs' weighting function is composed of a set of parameters, which we fine-tune to optimize answer selection performance. We carry out our experiments on the WorldTree and ARC-Challenge corpus to empirically demonstrate the following conclusions: (1) Grounding-Abstract inference chains provides the semantic control to perform explainable abductive reasoning (2) Efficiency and robustness in learning with a fewer number of parameters by outperforming contemporary explainable and transformer-based approaches in a similar setting (3) Generalisability by outperforming SOTA explainable approaches on general science question sets.


A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

arXiv.org Artificial Intelligence

The recent work of Clark et al. introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the Challenge Set and statistics related to them. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the (ARC) corpus. Evaluating with human-selected relevant sentences improves the performance of a neural machine comprehension model by 42 points.