Goto

Collaborating Authors

 Question Answering


Conjunctive Query Answering for the Description Logic SHIQ

arXiv.org Artificial Intelligence

Conjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive query answering over DL knowledge bases is only poorly understood if transitive roles are admitted in the query. In this paper, we consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the query and the knowledge base. We show decidability of query answering in this setting and establish two tight complexity bounds: regarding combined complexity, we prove that there is a deterministic algorithm for query answering that needs time single exponential in the size of the KB and double exponential in the size of the query, which is optimal. Regarding data complexity, we prove containment in co-NP.


Consistent Query Answering via ASP from Different Perspectives: Theory and Practice

arXiv.org Artificial Intelligence

A data integration system provides transparent access to different data sources by suitably combining their data, and providing the user with a unified view of them, called global schema. However, source data are generally not under the control of the data integration process, thus integrated data may violate global integrity constraints even in presence of locally-consistent data sources. In this scenario, it may be anyway interesting to retrieve as much consistent information as possible. The process of answering user queries under global constraint violations is called consistent query answering (CQA). Several notions of CQA have been proposed, e.g., depending on whether integrated information is assumed to be sound, complete, exact or a variant of them. This paper provides a contribution in this setting: it uniforms solutions coming from different perspectives under a common ASP-based core, and provides query-driven optimizations designed for isolating and eliminating inefficiencies of the general approach for computing consistent answers. Moreover, the paper introduces some new theoretical results enriching existing knowledge on decidability and complexity of the considered problems. The effectiveness of the approach is evidenced by experimental results. To appear in Theory and Practice of Logic Programming (TPLP).


The Complexity of Causality and Responsibility for Query Answers and non-Answers

arXiv.org Artificial Intelligence

An answer to a query has a well-defined lineage expression (alternatively called how-provenance) that explains how the answer was derived. Recent work has also shown how to compute the lineage of a non-answer to a query. However, the cause of an answer or non-answer is a more subtle notion and consists, in general, of only a fragment of the lineage. In this paper, we adapt Halpern, Pearl, and Chockler's recent definitions of causality and responsibility to define the causes of answers and non-answers to queries, and their degree of responsibility. Responsibility captures the notion of degree of causality and serves to rank potentially many causes by their relative contributions to the effect. Then, we study the complexity of computing causes and responsibilities for conjunctive queries. It is known that computing causes is NP-complete in general. Our first main result shows that all causes to conjunctive queries can be computed by a relational query which may involve negation. Thus, causality can be computed in PTIME, and very efficiently so. Next, we study computing responsibility. Here, we prove that the complexity depends on the conjunctive query and demonstrate a dichotomy between PTIME and NP-complete cases. For the PTIME cases, we give a non-trivial algorithm, consisting of a reduction to the max-flow computation problem. Finally, we prove that, even when it is in PTIME, responsibility is complete for LOGSPACE, implying that, unlike causality, it cannot be computed by a relational query.


Analyzing and Predicting Not-Answered Questions in Community-based Question Answering Services

AAAI Conferences

This paper focuses on analyzing and predicting not-answered questions in Community based Question Answering (CQA) services, such as Yahoo! Answers. In CQA services, users express their information needs by submitting natural language questions and await answers from other human users. Comparing to receiving results from web search engines using keyword queries, CQA users are likely to get more specific answers, because human answerers may catch the main point of the question. However, one of the key problems of this pattern is that sometimes no one helps to give answers, while web search engines hardly fail to response. In this paper, we analyze the not-answered questions and give a first try of predicting whether questions will receive answers. More specifically, we first analyze the questions of Yahoo Answers based on the features selected from different perspectives. Then, we formalize the prediction problem as supervised learning – binary classification problem and leverage the proposed features to make predictions. Extensive experiments are made on 76,251 questions collected from Yahoo! Answers. We analyze the specific characteristics of not-answered questions and try to suggest possible reasons for why a question is not likely to be answered. As for prediction, the experimental results show that classification based on the proposed features outperforms the simple word-based approach significantly.


Leveraging Wikipedia Characteristics for Search and Candidate Generation in Question Answering

AAAI Conferences

Most existing Question Answering (QA) systems adopt a type-and-generate approach to candidate generation that relies on a pre-defined domain ontology. This paper describes a type independent search and candidate generation paradigm for QA that leverages Wikipedia characteristics. This approach is particularly useful for adapting QA systems to domains where reliable answer type identification and type-based answer extraction are not available. We present a three-pronged search approach motivated by relations an answer-justifying title-oriented document may have with the question/answer pair. We further show how Wikipedia metadata such as anchor texts and redirects can be utilized to effectively extract candidate answers from search results without a type ontology. Our experimental results show that our strategies obtained high binary recall in both search and candidate generation on TREC questions, a domain that has mature answer type extraction technology, as well as on Jeopardy! questions, a domain without such technology. Our high-recall search and candidate generation approach has also led to high overall QA performance in Watson, our end-to-end system.


A Natural Language Question Answering System as a Participant in Human Q&A Portals

AAAI Conferences

LogAnswer is a question answering (QA) system for the German language, aimed at providing concise and correct answers to arbitrary questions. For this purpose LogAnswer is designed as an embedded artificial intelligence system which integrates methods from several fields of AI, namely natural language processing, machine learning, knowledge representation and automated theorem proving. We intend to employ LogAnswer as a virtual user within Internet-based QA forums, where it must be able to identify the questions that it cannot answer correctly, a task that normally receives little attention in QA research compared to the actual answer derivation. The paper presents a machine learning solution to the wrong answer avoidance (WAA) problem, applying a meta classifier to the output of simple term-based classifiers and a rich set of other WAA features. Experiments with a large set of real-world questions from a QA forum show that the proposed method significantly improves the WAA characteristics of our system.


What to Ask to an Incomplete Semantic Web Reasoner?

AAAI Conferences

Largely motivated by Semantic Web applications, many highly scalable, but incomplete, query answering systems have been recently developed. Evaluating the scalability-completeness trade-off exhibited by such systems is an important requirement for many applications. In this paper, we address the problem of formally comparing complete and incomplete systems given an ontology schema (or TBox) T. We formulate precise conditions on TBoxes T expressed in the EL, QL or RL profile of OWL 2 under which an incomplete system is indistinguishable from a complete one w.r.t. T, regardless of the input query and data. Our results also allow us to quantify the "degree of incompleteness" of a given system w.r.t. T as well as to automatically identify concrete queries and data patterns for which the incomplete system will miss answers.


Interfacing Virtual Agents With Collaborative Knowledge: Open Domain Question Answering Using Wikipedia-Based Topic Models

AAAI Conferences

This paper is concerned with the use of conversational agents as an interaction paradigm for accessing open domain encyclopedic knowledge by means of Wikipedia. More precisely, we describe a dialogue-based question answering system for German which utilizes Wikipedia-based topic models as a reference point for context detection and answer prediction. We investigate two different per- spectives to the task of interfacing virtual agents with collaborative knowledge. First, we exploit the use of Wikipedia categories as a basis for identifying the broader topic of a spoken utterance. Second, we describe how to enhance the conversational behavior of the virtual agent by means of a Wikipedia-based question answering component which incorporates the question topic. At large, our approach identifies topic-related focus terms of a user’s question, which are subsequently mapped onto a category taxonomy. Thus, we utilize the taxonomy as a reference point to derive topic labels for a user’s question. The employed topic model is thereby based on explicitly given concepts as represented by the document and category structure of the Wikipedia knowledge base. Identified topic categories are subsequently combined with different linguistic filtering methods to improve answer candidate retrieval and reranking. Results show that the topic model approach contributes to an enhancement of the conversational behavior of virtual agents.


An Approach to Answer Selection in Question-Answering Based on Semantic Relations

AAAI Conferences

A usual strategy to select the final answer in factoid Question-Answering (QA) relies on redundancy. A score is given to each candidate answer as a function of its frequency of occurrence, and the final answer is selected from the set of candidates sorted in decreasing order of score. For that purpose, systems often try to group together semantically equivalent answers. However, they hold several other semantic relations, such as inclusion, which are not considered, and candidates are mostly seen independently, as competitors. Our hypothesis is that not just equivalence, but other relations between candidate answers have impact on the performance of a redundancy-based QA system. In this paper, we describe experimental studies to back up this hypothesis. Our findings show that, with relatively simple techniques to recognize relations, systems' accuracy can be improved for answers of categories Number, Date and Entity.


Query Answering in the Horn Fragments of the Description Logics SHOIQ and SROIQ

AAAI Conferences

As more and more application areas require higher scalability, the study of fragments of expressive The high computational complexity of the expressive DLs with better computational properties has become an Description Logics (DLs) that underlie the important area of research. OWL standard has motivated the study of their Horn fragments of DLs, which are obtained by restricting Horn fragments, which are usually tractable in data the syntax of a DL in such a way that disjunction is not complexity and can also have lower combined complexity, expressible, were first considered in [Hustadt et al., 2005] particularly for query answering. In this paper as expressive fragments with tractable data complexity (see we provide algorithms for answering conjunctive also [Krötzsch et al., 2007]). It was later identified that they 2-way regular path queries (2CRPQs), a nontrivial can also exhibit lower combined complexity when it comes generalization of plain conjunctive queries, to query answering. Indeed, answering conjunctive queries in the Horn fragments of the DLs SHOIQ and (CQs), a kind of database-inspired queries that have become SROIQ underlying OWL 1 and OWL 2. We show the standard for querying DLs, is in ExpTime for the Horn that the combined complexity of the problem is ExpTime-complete fragment of the prominent SHIQ [Eiter et al., 2008], while it for Horn-SHOIQ and 2ExpTimecomplete is already 2ExpTime-hard for quite restricted (non Horn) fragments for the more expressive Horn-SROIQ, of SHIQ, like ALCI [Lutz, 2008] and SH [Eiter et but is PTime-complete in data complexity for both.