This paper presents a hybrid method for finding answers to Definition questions within large text collections. Because candidate answers to Definition questions do not generally fall in clearly defined semantic categories, answer discovery is guided by a combination of pattern matching and WordNet-based question expansion. The method is incorporated in a large open-domain question answering system and validated by extracting answers to 188 questions from a standard 3-Gigabyte text collection and from Web documents.
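The pattern-matching component described above can be illustrated with a few regular-expression templates that commonly signal a definition of the target term. This is a minimal sketch; the patterns and function names are hypothetical examples, not the paper's actual pattern set.

```python
# Illustrative definition patterns for a target term. These templates
# are assumptions for demonstration, not the paper's implementation.
import re

def definition_patterns(term):
    """Compile a few common definition-signaling patterns for the term."""
    t = re.escape(term)
    return [
        # "X is/are a/an/the ..." copular definition
        re.compile(rf"\b{t}\b\s+(is|are|was|were)\s+(a|an|the)\s+", re.I),
        # "X, a ... ," appositive definition
        re.compile(rf"\b{t}\b\s*,\s+(a|an|the)\s+[^,]+,", re.I),
        # explicit definitional phrasing
        re.compile(rf"\b{t}\b\s+(refers to|is defined as)\s+", re.I),
    ]

def find_definitions(term, sentences):
    """Return sentences matching any definition pattern for the term."""
    pats = definition_patterns(term)
    return [s for s in sentences if any(p.search(s) for p in pats)]
```

In practice such surface patterns are combined with WordNet-based expansion of the question so that paraphrases of the definiendum are also matched.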
Most existing Question Answering (QA) systems adopt a type-and-generate approach to candidate generation that relies on a pre-defined domain ontology. This paper describes a type-independent search and candidate generation paradigm for QA that leverages Wikipedia characteristics. This approach is particularly useful for adapting QA systems to domains where reliable answer type identification and type-based answer extraction are not available. We present a three-pronged search approach motivated by relations an answer-justifying title-oriented document may have with the question/answer pair. We further show how Wikipedia metadata such as anchor texts and redirects can be utilized to effectively extract candidate answers from search results without a type ontology. Our experimental results show that our strategies obtained high binary recall in both search and candidate generation on TREC questions, a domain that has mature answer type extraction technology, as well as on Jeopardy! questions, a domain without such technology. Our high-recall search and candidate generation approach has also led to high overall QA performance in Watson, our end-to-end system.
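The metadata-driven candidate generation idea can be sketched as follows: candidates come from the titles of retrieved title-oriented documents plus their Wikipedia-style redirects and anchor texts, with no type ontology involved. The data structures and function names below are illustrative assumptions, not the Watson implementation.

```python
# Hedged sketch of type-independent candidate generation: collect
# candidate answers from document titles and associated Wikipedia-style
# metadata (redirects and anchor texts).

def generate_candidates(search_results, redirects, anchor_texts):
    """Collect candidate answers without consulting a type ontology.

    search_results: list of document titles returned by search
    redirects:      dict mapping a title to its redirect variants
    anchor_texts:   dict mapping a title to anchor strings linking to it
    """
    candidates = set()
    for title in search_results:
        candidates.add(title)                          # the title itself
        candidates.update(redirects.get(title, []))    # redirect variants
        candidates.update(anchor_texts.get(title, [])) # anchor-text variants
    return candidates
```

Because redirects and anchor texts capture common surface variants of an entity name, this harvest tends toward high recall, leaving precision to downstream answer scoring.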
Finding definitions in huge text collections is a challenging problem, not only because of the many ways in which definitions can be conveyed in natural language texts but also because the definiendum (i.e., the thing to be defined) has not, on its own, enough discriminative power to allow selection of definition-bearing passages from the collection. We have developed a method that uses already available external sources to gather knowledge about the definiendum before trying to define it using the given text collection. This knowledge consists of lists of relevant secondary terms that frequently co-occur with the definiendum in definition-bearing passages, or "definiens". External sources used to gather secondary terms are an online encyclopedia, a lexical database, and the Web. These secondary terms, together with the definiendum, are used to select passages from the text collection via information retrieval. Further linguistic analysis is carried out on each passage to extract definition strings, using a number of criteria including the presence of main and secondary terms or definition patterns.
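The passage-selection step above can be sketched as a simple scoring scheme: a passage must contain the definiendum, and each co-occurring secondary term adds evidence that the passage is definition-bearing. This is a minimal sketch under that assumption; the scoring function and names are hypothetical, not the paper's retrieval model.

```python
# Sketch of passage selection guided by the definiendum plus secondary
# terms gathered from external sources. Illustrative only.

def score_passage(passage, definiendum, secondary_terms):
    """Score a passage: the definiendum must appear; each secondary
    term that co-occurs with it adds evidence of a definition."""
    words = passage.lower().split()
    if definiendum.lower() not in words:
        return 0.0
    return 1.0 + sum(1.0 for t in secondary_terms if t.lower() in words)

def select_passages(passages, definiendum, secondary_terms, top_k=3):
    """Rank candidate passages and keep the best-scoring ones."""
    ranked = sorted(passages,
                    key=lambda p: score_passage(p, definiendum, secondary_terms),
                    reverse=True)
    return [p for p in ranked[:top_k]
            if score_passage(p, definiendum, secondary_terms) > 0]
```

Selected passages would then undergo the linguistic analysis the abstract describes to extract the actual definition strings.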
Wang, Shuohang (Singapore Management University) | Yu, Mo (IBM Research AI) | Guo, Xiaoxiao (IBM Research AI) | Wang, Zhiguo (IBM Research AI) | Klinger, Tim (IBM Research AI) | Zhang, Wei (IBM Research AI) | Chang, Shiyu (IBM Research AI) | Tesauro, Gerry (IBM Research AI) | Zhou, Bowen (JD.COM) | Jiang, Jing (Singapore Management University)
In recent years researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state-of-the-art results in simplified closed-domain settings such as the SQuAD (Rajpurkar et al. 2016) dataset, which provides a pre-selected passage from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., Wikipedia) instead of a pre-selected passage (Chen et al. 2017a). This setting is more complex as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that "reads" the passages to generate an answer to the question. Performance in this setting lags well behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader (R³), based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of extracting the ground-truth answer to a given question. Second, we propose a novel method that jointly trains the Ranker along with an answer-extraction Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets.
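The Ranker's training signal can be illustrated with a toy REINFORCE-style loop: the ranker assigns a score to each retrieved passage, samples one, and receives a reward when the sampled passage contains the ground-truth answer (standing in here for the Reader's extraction success). This is a didactic sketch of the reinforcement-learning idea, not the R³ architecture; all names and the bag-of-words setup are assumptions.

```python
# Toy REINFORCE illustration of learning to rank passages by the
# likelihood that the (stand-in) reader extracts the ground truth.
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train_ranker(passages, answer, steps=500, lr=0.5, seed=0):
    """Learn one score per passage; reward = 1 iff the sampled passage
    contains the ground-truth answer string. Returns final scores."""
    rng = random.Random(seed)
    scores = [0.0] * len(passages)
    for _ in range(steps):
        probs = softmax(scores)
        i = rng.choices(range(len(passages)), weights=probs)[0]
        reward = 1.0 if answer in passages[i] else 0.0
        # REINFORCE: grad of log pi(i) w.r.t. scores is one_hot(i) - probs
        for j in range(len(scores)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            scores[j] += lr * reward * grad
    return scores
```

In the actual system the reward comes from the jointly trained Reader model rather than a string-containment check, but the policy-gradient structure is the same.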
In question answering, answer extraction aims to pinpoint the exact answer from passages. However, most previous methods perform such extraction on each passage separately, without considering clues provided in other passages. This paper presents a novel approach to extract answers by fully leveraging connections among different passages. Specifically, extraction is performed on a Passage Graph which is built by adding links upon multiple passages. Different passages are connected by linking words with the same stem. We use the factor graph as our model for answer extraction. Experimental results on multiple QA datasets demonstrate that our method significantly improves the performance of answer extraction.
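The graph-construction step, linking passages that share words with the same stem, can be sketched as follows. The crude suffix-stripping `stem` function below stands in for a real stemmer (e.g., Porter); the graph structure, not the stemmer, is the point, and the code is an illustrative assumption rather than the paper's implementation.

```python
# Sketch of building a passage graph: two passages are connected when
# they contain words sharing the same stem.

def stem(word):
    """Very rough stemmer: lowercase and strip a few common suffixes."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[:-len(suffix)]
    return w

def build_passage_graph(passages):
    """Return edges {(i, j): shared_stems} over passage index pairs."""
    stem_sets = [{stem(t) for t in p.split()} for p in passages]
    edges = {}
    for i in range(len(passages)):
        for j in range(i + 1, len(passages)):
            shared = stem_sets[i] & stem_sets[j]
            if shared:
                edges[(i, j)] = shared
    return edges
```

In the full approach, these edges become links in a factor graph so that extraction evidence can propagate between connected passages instead of being scored per passage in isolation.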