Complexity Guided Noise Filtering in QA Repositories
Dileep, K. V. S. (Indian Institute of Technology) | Hingmire, Swapnil (Tata Research Development and Design Centre) | Chakraborti, Sutanu (Indian Institute of Technology, Madras)
Filtering out the noisy sentences of an answer, that is, sentences irrelevant to the question being asked, increases the utility and reuse of a Question-Answer (QA) repository. Traditional supervised classification methods may struggle with this task because of the extensive labelling effort they require. In this paper, we propose a semi-supervised learning approach in which we first infer a set of topics on the corpus using Latent Dirichlet Allocation (LDA). We then label the topics automatically using a small labelled set and use them to classify an unseen sentence as useful or noisy. We performed experiments on a real-life help desk dataset and found that the results are comparable to those of other semi-supervised learning methods.
May-16-2017
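The pipeline summarized in the abstract (fit LDA on all sentences, name the topics from a small labelled set, then classify new sentences through their topics) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' method: LDA is fitted with collapsed Gibbs sampling; a sentence's topic mixture is approximated by averaging per-word topic posteriors instead of full LDA inference; each topic takes the label whose labelled sentences, normalised by class size, load on it most; and a sentence gets the label carrying the most topic weight. The function names (`train_lda`, `topic_mixture`, `label_topics`, `classify`) and the toy sentences are hypothetical.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return phi[k][word] = p(word | topic k)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]       # tokens per (document, topic)
    nkw = [[0] * V for _ in range(num_topics)]   # tokens per (topic, word)
    nk = [0] * num_topics                        # tokens per topic
    z = []                                       # topic assignment of every token
    for d, doc in enumerate(docs):               # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment before resampling it.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return [{w: (nkw[k][index[w]] + beta) / (nk[k] + V * beta) for w in vocab}
            for k in range(num_topics)]


def topic_mixture(tokens, phi):
    """Approximate p(topic | sentence) by averaging p(topic | word) over known words."""
    K = len(phi)
    known = [w for w in tokens if w in phi[0]]
    if not known:
        return [1.0 / K] * K                     # no evidence: uniform mixture
    theta = [0.0] * K
    for w in known:
        total = sum(phi[k][w] for k in range(K))
        for k in range(K):
            theta[k] += phi[k][w] / total
    return [t / len(known) for t in theta]


def label_topics(labelled, phi):
    """Give each topic the label whose labelled sentences load on it most (size-normalised)."""
    counts = Counter(label for _, label in labelled)
    mass = [Counter() for _ in phi]
    for tokens, label in labelled:
        for k, p in enumerate(topic_mixture(tokens, phi)):
            mass[k][label] += p / counts[label]
    return [m.most_common(1)[0][0] if m else "noisy" for m in mass]


def classify(tokens, phi, topic_labels):
    """Sum topic weight per label and return the heavier label."""
    score = Counter()
    for k, p in enumerate(topic_mixture(tokens, phi)):
        score[topic_labels[k]] += p
    return score.most_common(1)[0][0]


# Hypothetical toy help-desk answer sentences, tokenised by whitespace.
corpus = [s.split() for s in [
    "restart the vpn client and reconnect",
    "clear the browser cache and reload the page",
    "reinstall the printer driver from the portal",
    "thanks for contacting the help desk",
    "please rate your experience with our team",
    "have a nice day and thanks again",
]]
phi = train_lda(corpus, num_topics=2)
labelled = [(corpus[0], "useful"), (corpus[3], "noisy")]   # small labelled set
topic_labels = label_topics(labelled, phi)
print(topic_labels, classify("reconnect the vpn client".split(), phi, topic_labels))
```

The topics are learned from all sentences, labelled or not, so only a handful of labelled examples is needed to name them. On real data one would tokenise properly and tune `num_topics`, `alpha`, and `beta`.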