Geometry of Polysemy
Mu, Jiaqi; Bhat, Suma; Viswanath, Pramod
Vector representations of words have heralded a transformational approach to classical problems in NLP; the most popular example is word2vec. However, a single vector does not suffice to model the polysemous nature of many (frequent) words, i.e., words with multiple meanings. In this paper, we propose a three-fold approach for unsupervised polysemy modeling: (a) context representations, (b) sense induction and disambiguation, and (c) lexeme (word-sense pair) representations. A key feature of our work is the finding that a sentence containing a target word is well represented by a low-rank subspace rather than by a point in a vector space. We then show that the subspaces associated with a particular sense of the target word tend to intersect over a line (a one-dimensional subspace), which we use to disambiguate senses with a clustering algorithm that harnesses the Grassmannian geometry of the representations. The disambiguation algorithm, which we call $K$-Grassmeans, leads to a procedure for labeling the different senses of the target word in the corpus, yielding lexeme vector representations, all in an unsupervised manner starting from a large English (Wikipedia) corpus. Apart from several prototypical target (word, sense) examples and a host of empirical studies to intuit and justify the various geometric representations, we validate our algorithms on standard sense induction and disambiguation datasets and present new state-of-the-art results.
Oct-24-2016
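The two geometric ideas in the abstract (representing a context by a low-rank subspace, and grouping contexts whose subspaces share a common line) can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names (`context_subspace`, `projection_energy`, `intersection_direction`, `k_grassmeans`) are hypothetical. The context subspace is assumed to be spanned by the top 3 uncentered principal directions of the context word vectors. The distance from a unit direction $u$ to a subspace $S$ with orthonormal basis $B$ is assumed to be $d(u,S)^2 = 1 - \|B^\top u\|^2$. Under that distance, the unit direction minimizing the summed squared distance to a cluster of subspaces is the top eigenvector of $\sum_i B_i B_i^\top$. This gives a k-means-style alternation between assignment and update steps. The toy contexts at the end are synthetic.

```python
import numpy as np


def context_subspace(word_vectors, rank=3):
    """Return a (d, r) orthonormal basis for one sentence's context subspace.

    word_vectors: (n, d) array, one row per context word vector.
    Assumption: the subspace is spanned by the top `rank` right singular
    vectors of the uncentered matrix (uncentered principal directions).
    """
    _, _, vt = np.linalg.svd(word_vectors, full_matrices=False)
    return vt[:rank].T


def projection_energy(u, basis):
    """Squared norm of the projection of unit vector u onto span(basis).

    For unit u, the squared distance from u to the subspace is 1 minus this.
    """
    return float(np.sum((basis.T @ u) ** 2))


def intersection_direction(bases):
    """Unit vector minimizing the summed squared distance to all subspaces.

    Minimizing sum_i (1 - u^T P_i u) over unit u is the same as maximizing
    u^T (sum_i P_i) u, so the answer is the top eigenvector of sum_i P_i.
    """
    m = sum(b @ b.T for b in bases)
    _, vecs = np.linalg.eigh(m)  # eigenvalues come back in ascending order
    return vecs[:, -1]


def k_grassmeans(bases, k, init=None, n_iter=20, seed=0):
    """Hypothetical Lloyd-style clustering of subspaces around k directions."""
    d = bases[0].shape[0]
    if init is None:
        # Assumption: plain random initialization of the k sense directions.
        init = np.random.default_rng(seed).standard_normal((k, d))
    dirs = np.array(init, dtype=float)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    labels = None
    for _ in range(n_iter):
        # Assignment: nearest direction is the one with largest projection energy.
        energy = np.array([[projection_energy(u, b) for u in dirs] for b in bases])
        new_labels = energy.argmax(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update: each direction becomes its cluster's best common line.
        for j in range(k):
            members = [b for b, lab in zip(bases, labels) if lab == j]
            if members:
                dirs[j] = intersection_direction(members)
    return dirs, labels


# Toy usage (synthetic data, not from the paper): in R^8, "sense A" contexts
# all contain axis e0 and "sense B" contexts all contain e1; the remaining
# axes are shared between the two groups.
eye = np.eye(8)
groups = [(0, 2, 3), (0, 4, 5), (0, 6, 7), (1, 2, 4), (1, 3, 6), (1, 5, 7)]
contexts = [np.array([eye[i] for i in g] + [eye[g[0]] + eye[g[1]]]) for g in groups]
bases = [context_subspace(c, rank=3) for c in contexts]
init = [eye[0] + 0.5 * eye[1], eye[1] + 0.5 * eye[0]]
dirs, labels = k_grassmeans(bases, k=2, init=init)
print(labels)  # [0 0 0 1 1 1]
```

As with k-means, the random initialization used when `init` is omitted may converge to a poor local optimum. Running several seeds and keeping the clustering with the lowest total distance would be a reasonable (assumed) refinement.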