Goto

Collaborating Authors

 Question Answering


From Classical to Consistent Query Answering under Existential Rules

AAAI Conferences

Querying inconsistent ontologies is an intriguing new problem that gave rise to a flourishing research activity in the description logic (DL) community. The computational complexity of consistent query answering under the main DLs is rather well understood; however, little is known about existential rules. The goal of the current work is to perform an in-depth analysis of the complexity of consistent query answering under the main decidable classes of existential rules enriched with negative constraints. Our investigation focuses on one of the most prominent inconsistency-tolerant semantics, namely, the AR semantics. We establish a generic complexity result, which demonstrates the tight connection between classical and consistent query answering. This result allows us to obtain in a uniform way a relatively complete picture of the complexity of our problem.


Explaining Watson: Polymath Style

AAAI Conferences

Our paper is actually two contributions in one. First, we argue that IBM's Jeopardy! playing machine needs a formal semantics. We present several arguments as we discuss the system. We also situate the work in the broader context of contemporary AI. Our second point is that the work in this area might well be done as a broad collaborative project. Hence our "Blue Sky'' contribution is a proposal to organize a polymath-style effort aimed at developing formal tools for the study of state of the art question-answer systems, and other large scale NLP efforts whose architectures and algorithms lack a theoretical foundation.


Mining Query Subtopics from Questions in Community Question Answering

AAAI Conferences

This paper proposes mining query subtopics from questions in community question answering (CQA). The subtopics are represented as a number of clusters of questions with keywords summarizing the clusters. The task is unique in that the subtopics from questions can not only facilitate user browsing in CQA search, but also describe aspects of queries from a question-answering perspective. The challenges of the task include how to group semantically similar questions and how to find keywords capable of summarizing the clusters. We formulate the subtopic mining task as a non-negative matrix factorization (NMF) problem and further extend the model of NMF to incorporate question similarity estimated from metadata of CQA into learning. Compared with existing methods, our method can jointly optimize question clustering and keyword extraction and encourage the former task to enhance the latter. Experimental results on large scale real world CQA datasets show that the proposed method significantly outperforms the existing methods in terms of keyword extraction, while achieving a comparable performance to the state-of-the-art methods for question clustering.


A Tri-Role Topic Model for Domain-Specific Question Answering

AAAI Conferences

Stack Overflow and MedHelp are examples of domain-specific community-based question answering (CQA) systems. Different from CQA systems for general topics (e.g., Yahoo! Answers, Baidu Knows), questions and answers in domain-specific CQA systems are mostly in the same topical domain, enabling more comprehensive interaction between users on fine-grained topics. In such systems, users are more likely to ask questions on unfamiliar topics and to answer questions matching their expertise. Users can also vote answers based on their judgements. In this paper, we propose a Tri-Role Topic Model (TRTM) to model the tri-roles of users (i.e., as askers, answerers, and voters, respectively) and the activities of each role including composing question, selecting question to answer, contributing and voting answers. The proposed model can be used to enhance CQA systems from many perspectives. As a case study, we conducted experiments on ranking answers for questions on Stack Overflow, a CQA system for professional and enthusiast programmers. Experimental results show that TRTM is effective in facilitating users getting ideal rankings of answers, particularly for new and less popular questions. Evaluated on nDCG, TRTM outperforms state-of-the-art methods.


Learning Rank Functionals: An Empirical Study

arXiv.org Machine Learning

Ranking is a key aspect of many applications, such as information retrieval, question answering, ad placement and recommender systems. Learning to rank has the goal of estimating a ranking model automatically from training data. In practical settings, the task often reduces to estimating a rank functional of an object with respect to a query. In this paper, we investigate key issues in designing an effective learning to rank algorithm. These include data representation, the choice of rank functionals, the design of the loss function so that it is correlated with the rank metrics used in evaluation. For the loss function, we study three techniques: approximating the rank metric by a smooth function, decomposition of the loss into a weighted sum of element-wise losses and into a weighted sum of pairwise losses. We then present derivations of piecewise losses using the theory of high-order Markov chains and Markov random fields. In experiments, we evaluate these design aspects on two tasks: answer ranking in a Social Question Answering site, and Web Information Retrieval.


ON CLOSED WORLD DATA BASES / 119

AI Classics

ABSTRACT Deductive question-answering systems generally evaluate queries under one of two possible assumptions which we in this paper refer to as the open and closed world assumptions. The open world assumption corresponds to the usual first order approach to query evaluation: Given a data base DB and a query Q, the only answers to Q are those which obtain from proofs of Q given DB as hypotheses. Under the closed world assumption, certain answers are admitted as a result of failure to find a proof. More specifically, if no proof of a positive ground literal exists, then the negation of that literal is assumed true. In this paper, we show that closed world evaluation of an arbitrary query may be reduced to open world evaluation of socalled atomic queries. We then show that the closed world assumption can lead to inconsistencies, but for Horn data bases no such inconsistencies can arise. Finally, we show how for Horn data bases under the closed world assumption purely negative clauses are irrelevant for deductive retrieval and function instead as integrity constraints. INTRODUCTION Deductive question-answering systems generally evaluate queries under one of two possible assumptions which we in this paper refer to as the open and closed world assumptions.


11 Theorem-Proving by Resolution as a Basis for Question-Answering Systems Cordell Green

AI Classics

ABSTRACT This paper shows how a question-answering system can be constructed using first-order logic as its language and a resolution-type theorem-prover as its deductive mechanism. A working computer-program, Q A3, based on these ideas is described. The performance of the program compares favorably with several other general question-answering systems. The type of questionanswering system considered in this paper is ideally one having the following features: 1. A language general enough to describe any reasonable questionanswering subjects and express desired questions and answers. This paper argues the case for formal methods to achieve such a system and presents one particular approach in detail. The name'question-answering system' requires clarification. The system described above might be named an'advice taker' or a'multi-purpose problem-solving system' or'general problem-solving system'. McCarthy (1958) proposed using formal languages and deduction to construct such a system, and suggested allowing the user to give hints or advice on how to answer a question; he referred to the proposed system as an'advice taker'. Research on'multi-purpose' or'general problem-solving' tends to differ from questionanswering as described above by placing more emphasis on solving deeper, more difficult problems and less emphasis on user interaction, formality, and efficient retrieval of relevant facts from a large data base. The situation is further confused by the use of'question-answering' to refer sometimes to natural language systems, sometimes to information retrieval systems having little deductive ability, and sometimes to systems with deductive ability limited to the propositional calculus. It is important to emphasize the distinction between general versus specialpurpose question-answering. If the class of questions asked of a system is small, completely specified in advance, and concerned with a particular subject area, such as the question-answering system of Green, Wolf, Chomsky, and Laughery (1963) concerned with baseball, or that of Lindsay (1963) concerned with family relations, then we shall call such a system'specialpurpose'. Frequently the goal in designing a special-purpose system is to achieve good performance, measured in terms of running speed and memory utilization. In this case the best approach may be first to construct a special data base or memory that is optimized for that subject area and question class, and then to write special question-answering subroutines that are optimized for the particular data base and question class.


22 Question Answering BONNIE WEBBER AND NICK WEBB

AI Classics

Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. First, questions do not often translate into a simple list of keywords. For example, the question (1) Which countries did the pope visit in the 1960s? A much more complex set of keywords is needed in order to get anywhere close to the intended result, and experience shows that people will not learn how to formulate and use such sets. Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents. While this is not much of a burden when the answer appears in a snippet and further document access is unnecessary, QA technology aims to move this from being an accidental property of search to its focus. In keyword search and in much work to date on QA technology, the information seeking process has been seen as a one-shot affair: the user asks a question, and the system provides a satisfactory response. However, early work on QA (Section 1.1) did not make this assumption, and newly targeted applications are hindered by it: while a user may try to formulate a question whose answer is the information Question Answering 631 they want, they will not know whether they have succeeded until something has been returned for examination. If what is returned is unsatisfactory or, while not the answer, is still of interest, a user needs to be able to ask further questions that are understood in the context of the previous ones. For these target applications, QA must be part of a collaborative search process (Section 3.3). In the rest of this section, we give some historical background on QA systems (Section 1.1), on dialogue systems in which QA has played a significant role (Section 1.2), and on a particular QA task that has been a major driver of the field over the past 8 years (Section 1.3). Section 2 describes the current state of the art in QA systems, organized around the de facto architecture of such systems. Section 3 discusses some current directions in which QA is moving, including the development of interactive QA.


Learning Shuffle Ideals Under Restricted Distributions

Neural Information Processing Systems

The class of shuffle ideals is a fundamental sub-family of regular languages. The shuffle ideal generated by a string set $U$ is the collection of all strings containing some string $u \in U$ as a (not necessarily contiguous) subsequence. In spite of its apparent simplicity, the problem of learning a shuffle ideal from given data is known to be computationally intractable. In this paper, we study the PAC learnability of shuffle ideals and present positive results on this learning problem under element-wise independent and identical distributions and Markovian distributions in the statistical query model. A constrained generalization to learning shuffle ideals under product distributions is also provided. In the empirical direction, we propose a heuristic algorithm for learning shuffle ideals from given labeled strings under general unrestricted distributions. Experiments demonstrate the advantage for both efficiency and accuracy of our algorithm.


STEP: A Scalable Testing and Evaluation Platform

AAAI Conferences

The emergence of online crowdsourcing sites, online work platforms, and evenMassive Open Online Courses (MOOCs), has created an increasing need for reliably evaluating the skills of the participating users in a scalable way.Many platforms already allow users to take online tests and verify their skills, but the existing approaches face many problems. First of all, cheating is very common in online testing without supervision, as the test questions often "leak" and become easily available online together with the answers.Second, technical skills, such as programming, require the tests to be frequently updated in order to reflect the current state-of-the-art. Third,there is very limited evaluation of the tests themselves, and how effectively they measure the skill that the users are tested for. In this paper, we present a Scalable Testing and Evaluation Platform (STEP),that allows continuous generation and evaluation of test questions. STEP leverages already available content, on Question Answering sites such as StackOverflow and re-purposes these questions to generate tests. The system utilizes a crowdsourcing component for the editing of the questions, while it uses automated techniques for identifying promising QA threads that can be successfully re-purposed for testing. This continuous question generation decreases the impact of cheating and also creates questions that are closer to the real problems that the skill holder is expected to solve in real life.STEP also leverages the use of Item Response Theory to evaluate the quality of the questions. We also use external signals about the quality of the workers.These identify the questions that have the strongest predictive ability in distinguishing workers that have the potential to succeed in the online job marketplaces. Existing approaches contrast in using only internal consistency metrics to evaluate the questions. Finally, our system employs an automatic "leakage detector" that queries the Internet to identify leaked versions of our questions. We then mark these questions as "practice only," effectively removing them from the pool of questions used for evaluation. Our experimental evaluation shows that our system generates questions of comparable or higher quality compared to existing tests, with a cost of approximately 3-5 dollars per question, which is lower than the cost of licensing questions from existing test banks.