Answering natural language questions against structured knowledge bases (KB) has been attracting increasing attention in both IR and NLP communities. The task involves two main challenges: recognizing the questions' meanings, which are then grounded to a given KB. Targeting simple factoid questions, many existing open domain semantic parsers jointly solve these two subtasks, but are usually expensive in complexity and resources.In this paper, we propose a simple pipeline framework to efficiently answer more complicated questions, especially those implying aggregation operations, e.g., argmax, argmin.We first develop a transition-based parsing model to recognize the KB-independent meaning representation of the user's intention inherent in the question. Secondly, we apply a probabilistic model to map the meaning representation, including those aggregation functions, to a structured query.The experimental results showed that our method can better understand aggregation questions, outperforming the state-of-the-art methods on the Free917 dataset while still maintaining promising performance on a more challenging dataset, WebQuestions, without extra training.
We propose a new pooling technique for topic modeling in Twitter, which groups together tweets occurring in the same user-to-user conversation. Under this scheme, tweets and their replies are aggregated into a single document and the users who posted them are considered co-authors. To compare this new scheme against existing ones, we train topic models using Latent Dirichlet Allocation (LDA) and the Author-Topic Model (ATM) on datasets consisting of tweets pooled according to the different methods. Using the underlying categories of the tweets in this dataset as a noisy ground truth, we show that this new technique outperforms other pooling methods in terms of clustering quality and document retrieval.
Foster, Jennifer (Dublin City University) | Cetinoglu, Ozlem (Dublin City University) | Wagner, Joachim (Dublin City University) | Roux, Joseph Le (LIF - CNRS) | Hogan, Stephen (Dublin City University) | Nivre, Joakim (Uppsala University) | Hogan, Deirdre (Dublin City University) | Genabith, Josef van (Dublin City University)
We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement.