Using Syntactic Features in Answer Reranking
Grundström, Jakob (Lund University) | Nugues, Pierre (Lund University)
This paper describes a baseline question answering system for Swedish, on which we measured the contribution of syntactic features. The system includes modules for question analysis, hypothesis generation, and reranking of answers. It was trained and evaluated on questions from a data set inspired by the Swedish television quiz show Kvitt eller Dubbelt -- Tiotusenkronorsfrågan. We used an HTML dump of the Swedish version of Wikipedia as the knowledge source, and we show in this paper that paragraph retrieval from this corpus gives acceptable coverage of answers for Kvitt eller Dubbelt questions, especially questions with single-word answers. Given a question, the hypothesis generation module retrieves a list of paragraphs, ranks them using a vector space model score, and extracts a set of candidates. The question analysis module predicts the lexical answer type. To compute a baseline ranking, we sorted the answer candidates according to their frequencies in the most relevant paragraphs. The reranker module uses information from the previous stages, as well as grammatical information from a dependency parser, to estimate the correctness of the generated answer candidates. The correctness estimate is then used to re-weight the baseline ranking. A 5-fold cross-validation showed that the median rank of the correct candidate improved from 21 in the baseline version to 10 with the reranker.
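The baseline ranking and the re-weighting step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `top_n` cutoff, the toy data, and in particular the choice to re-weight by multiplying the candidate frequency with the correctness estimate are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def baseline_ranking(paragraphs, top_n=3):
    """Rank candidates by their frequency in the top_n highest-scoring paragraphs.

    paragraphs: list of (retrieval_score, [candidate, ...]) pairs, where the
    score stands in for a vector space model score (hypothetical input format).
    """
    top = sorted(paragraphs, key=lambda p: p[0], reverse=True)[:top_n]
    counts = Counter(cand for _, cands in top for cand in cands)
    # Descending frequency; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def rerank(baseline, correctness):
    """Re-weight the baseline ranking with a per-candidate correctness estimate.

    Assumption: the new weight is frequency * estimated correctness.
    """
    weighted = [(cand, freq * correctness.get(cand, 0.0)) for cand, freq in baseline]
    return sorted(weighted, key=lambda kv: (-kv[1], kv[0]))


def rank_of(ranking, answer):
    """Return the 1-based rank of answer in ranking, or None if it is absent."""
    for position, (cand, _) in enumerate(ranking, start=1):
        if cand == answer:
            return position
    return None


# Toy data: retrieval scores and extracted candidates are invented.
paragraphs = [
    (0.9, ["Stockholm", "Uppsala", "Stockholm"]),
    (0.7, ["Uppsala", "Lund"]),
    (0.4, ["Uppsala", "Lund", "Uppsala"]),
    (0.1, ["Lund", "Lund", "Lund"]),
]
# Stand-in for the reranker's output; in the paper this estimate draws on
# features such as the predicted answer type and dependency parses.
correctness = {"Stockholm": 0.8, "Uppsala": 0.1, "Lund": 0.3}

baseline = baseline_ranking(paragraphs)
reranked = rerank(baseline, correctness)
print(rank_of(baseline, "Stockholm"), rank_of(reranked, "Stockholm"))
```

In this toy case the correct candidate is outnumbered by a frequent distractor in the baseline and moves up once the correctness estimate is applied, which is the kind of effect the reported median-rank improvement summarizes over many questions.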
Jul-22-2014