Ravi, Sujith
Refer-to-as Relations as Semantic Knowledge
Feng, Song (IBM T.J. Watson Research Center / Stony Brook University) | Ravi, Sujith (Google) | Kumar, Ravi (Google) | Kuznetsova, Polina (Stony Brook University) | Liu, Wei (University of North Carolina at Chapel Hill) | Berg, Alexander C. (University of North Carolina at Chapel Hill) | Berg, Tamara L. (University of North Carolina at Chapel Hill) | Choi, Yejin (University of Washington)
We study Refer-to-as relations as a new type of semantic knowledge. Compared to the much studied Is-a relation, which concerns factual taxonomy knowledge, Refer-to-as relations aim to address pragmatic semantic knowledge. For example, a “penguin” is a “bird” from a taxonomy point of view, but people rarely refer to a “penguin” as a “bird” in vernacular use. This observation closely relates to the entry-level categorization studied in Prototype Theory in Psychology. We posit that Refer-to-as relations can be learned from data, and that both textual and visual information would be helpful in inferring the relations. By integrating existing lexical structure knowledge with language statistics and visual similarities, we formulate a collective inference approach to map all object names in an encyclopedia to commonly used names for each object. Our contributions include a new labeled data set, the inference and optimization approach, and the computed mappings and similarities.
Great Question! Question Quality in Community Q&A
Ravi, Sujith (Google Inc.) | Pang, Bo (Google Inc.) | Rastogi, Vibhor (Twitter) | Kumar, Ravi (Google Inc.)
Asking the right question in the right way is an art (and a science). In a community question-answering setting, a good question is not just one that is found to be useful by other people: a question is good if it is also presented clearly and shows prior research. Using a community question-answering site that allows voting over the questions, we show that there is a notion of question quality that goes beyond mere popularity. We present techniques using latent topic models to automatically predict the quality of questions based on their content. Our best system achieves a prediction accuracy of 72%, beating out strong baselines by a significant amount. We also examine the effect of question quality on the dynamics of user behavior and the longevity of questions.
FastEx: Hash Clustering with Exponential Families
Ahmed, Amr | Ravi, Sujith | Smola, Alex J. | Narayanamurthy, Shravan M.
Clustering is a key component in the data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as $k$-means clustering. In this paper we present a sampler capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters.