Niraula, Nobal Bikram
Combining Knowledge and Corpus-based Measures for Word-to-Word Similarity
Stefanescu, Dan (University of Memphis) | Rus, Vasile (University of Memphis) | Niraula, Nobal Bikram (University of Memphis) | Banjade, Rajendra (University of Memphis)
This paper shows that the combination of knowledge-based and corpus-based word-to-word similarity measures can produce higher agreement with human judgment than any of the individual measures. While this might be a predictable result, the paper provides insights about the circumstances under which a combination is productive and about the improvement levels that are to be expected. The experiments presented here were conducted using the word-to-word similarity measures included in SEMILAR, a freely available semantic similarity toolkit.
A Study of Probabilistic and Algebraic Methods for Semantic Similarity
Rus, Vasile (The University of Memphis) | Niraula, Nobal Bikram (The University of Memphis) | Banjade, Rajendra (The University of Memphis)
In this article, we study and propose several novel solutions to the task of measuring semantic similarity between two short texts. The proposed solutions are based on the probabilistic method of Latent Dirichlet Allocation (LDA) and on the algebraic method of Latent Semantic Analysis (LSA). Both LDA and LSA are fully automated methods used to discover latent topics or concepts from large collections of documents. We propose a novel word-to-word similarity measure based on LDA as well as several text-to-text similarity measures. We compare these measures with similar, known measures based on LSA. Experiments and results are presented on two data sets: the Microsoft Research Paraphrase corpus and the User Language Paraphrase corpus. We found that the novel word-to-word similarity measure based on LDA is extremely promising.