s-wmd
Supervised Word Mover's Distance
Gao Huang, Chuan Guo, Matt J. Kusner, Yu Sun, Fei Sha, Kilian Q. Weinberger
Recently, a new document metric called the word mover's distance (WMD) has been proposed with unprecedented results on k NN-based document classification. The WMD elevates high-quality word embeddings to a document metric by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised-WMD (S-WMD) metric.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)
Supervised Word Mover's Distance
Recently, a new document metric called the word mover's distance (WMD) has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to a document metric by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised-WMD (S-WMD) metric.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Wasserstein-Fisher-Rao Document Distance
Wang, Zihao, Zhou, Datong, Zhang, Yong, Wu, Hao, Bao, Chenglong
As a fundamental problem of natural language processing, it is important to measure the distance between different documents. Among the existing methods, the Word Mover's Distance (WMD) has shown remarkable success in document semantic matching for its clear physical insight as a parameter-free model. However, WMD is essentially based on the classical Wasserstein metric, thus it often fails to robustly represent the semantic similarity between texts of different lengths. In this paper, we apply the newly developed Wasserstein-Fisher-Rao (WFR) metric from unbalanced optimal transport theory to measure the distance between different documents. The proposed WFR document distance maintains the great interpretability and simplicity as WMD. We demonstrate that the WFR document distance has significant advantages when comparing the texts of different lengths. In addition, an accelerated Sinkhorn based algorithm with GPU implementation has been developed for the fast computation of WFR distances. The KNN classification results on eight datasets have shown its clear improvement over WMD.
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
Supervised Word Mover's Distance
Huang, Gao, Guo, Chuan, Kusner, Matt J., Sun, Yu, Sha, Fei, Weinberger, Kilian Q.
Accurately measuring the similarity between text documents lies at the core of many real world applications of machine learning. These include web-search ranking, document recommendation, multi-lingual document matching, and article categorization. Recently, a new document metric, the word mover's distance (WMD), has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high quality word embeddings to document metrics by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised WMD (S-WMD) metric. Our algorithm learns document distances that measure the underlying semantic differences between documents by leveraging semantic differences between individual words discovered during supervised training. This is achieved with an linear transformation of the underlying word embedding space and tailored word-specific weights, learned to minimize the stochastic leave-one-out nearest neighbor classification error on a per-document level. We evaluate our metric on eight real-world text classification tasks on which S-WMD consistently outperforms almost all of our 26 competitive baselines.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)