Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective

Li, Yitan (University of Science and Technology of China) | Xu, Linli (University of Science and Technology of China) | Tian, Fei (University of Science and Technology of China) | Jiang, Liang (University of Science and Technology of China) | Zhong, Xiaowei (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China)

Jul-15-2015–AAAI Conferences

Distributed word representations based on neural networks, also known as word embeddings, have advanced significantly in recent years. Among the new word embedding models, skip-gram negative sampling (SGNS) in the word2vec toolbox has attracted much attention due to its simplicity and effectiveness. However, the principles of SGNS are still not well understood, except for a recent work that explains SGNS as an implicit matrix factorization of the pointwise mutual information (PMI) matrix. In this paper, we provide a new perspective for further understanding SGNS. We point out that SGNS is essentially a representation learning method, which learns to represent the co-occurrence vector for a word. Under this representation learning view, SGNS is in fact an explicit matrix factorization (EMF) of the words' co-occurrence matrix. Furthermore, an extended supervised word embedding can be established based on the proposed representation learning view.
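
To make the two matrices named in the abstract concrete, the following minimal Python sketch builds word-context co-occurrence counts from a toy corpus and derives PMI scores from them. The corpus, the symmetric window, and the function names `cooccurrence_counts` and `pmi` are assumptions made for illustration; the sketch does not implement SGNS or the paper's EMF objective.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs where the context lies within `window` tokens of the word."""
    counts = Counter()
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, sentence[j])] += 1
    return counts


def pmi(counts):
    """PMI(w, c) = log(#(w, c) * N / (#(w) * #(c))), computed for observed pairs only."""
    total = sum(counts.values())
    word_totals, context_totals = Counter(), Counter()
    for (word, context), n in counts.items():
        word_totals[word] += n
        context_totals[context] += n
    return {
        (word, context): math.log(n * total / (word_totals[word] * context_totals[context]))
        for (word, context), n in counts.items()
    }


# Toy corpus (hypothetical, for illustration only).
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
counts = cooccurrence_counts(sentences, window=1)
scores = pmi(counts)
```

Each row of the count table, read as a vector over contexts, is the co-occurrence vector of a word, which is the object the abstract says SGNS learns to represent. The PMI scores are the entries of the matrix that the earlier implicit-factorization explanation refers to.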

Country:
- North America > Canada
  - Quebec > Montreal (0.04)
- Europe
  - Spain > Galicia
    - Madrid (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
- Asia > China
  - Beijing > Beijing (0.04)
  - Anhui Province > Hefei (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (0.55)
  - Statistical Learning (0.47)
