optimal transport
Distilled Wasserstein Learning for Word Embedding and Topic Modeling
We propose a novel Wasserstein method with a distillation mechanism that jointly learns word embeddings and topics. The method builds on the observation that the Euclidean distance between word embeddings can serve as the ground distance in a Wasserstein topic model. The word distributions of topics, their optimal transport to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled ground-distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. This strategy gives the word-embedding updates robust guidance and improves algorithm convergence. As an application, we focus on patient admission records, in which the proposed method embeds the codes of diseases and procedures and learns the topics of admissions, obtaining superior performance on clinically meaningful disease network construction, mortality prediction as a function of admission codes, and procedure recommendation.
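To make the central idea concrete, the following is a minimal sketch of how Euclidean distances between word embeddings can act as the ground cost when transporting a topic's word distribution to a document's word distribution. It is not the authors' distilled algorithm: it uses standard entropy-regularized Sinkhorn iterations as a stand-in solver, and the function names (`euclidean_cost`, `sinkhorn`, `transport_cost`), the regularization value, and the toy embeddings and histograms are all assumptions made for illustration.

```python
import math


def euclidean_cost(embeddings):
    """Pairwise Euclidean distances between word embeddings (the ground cost)."""
    return [[math.dist(u, v) for v in embeddings] for u in embeddings]


def sinkhorn(a, b, cost, reg=0.5, n_iter=500):
    """Approximate transport plan between histograms a and b.

    Standard Sinkhorn scaling with entropic regularization `reg`;
    assumes every entry of a and b is strictly positive.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of moving mass according to the plan."""
    return sum(p * c for p_row, c_row in zip(plan, cost) for p, c in zip(p_row, c_row))


# Hypothetical toy data: three words with 2-D embeddings.
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
topic = [0.7, 0.2, 0.1]      # word distribution of one topic
document = [0.1, 0.2, 0.7]   # word distribution of one document

cost = euclidean_cost(embeddings)
plan = sinkhorn(topic, document, cost)
d_doc = transport_cost(plan, cost)
d_self = transport_cost(sinkhorn(topic, topic, cost), cost)
print(f"topic -> document cost: {d_doc:.4f}")
print(f"topic -> topic cost:    {d_self:.4f}")
```

In this sketch the embeddings are fixed. In the joint setting the abstract describes, the embeddings (and therefore the cost matrix) would also be updated, which is where the distilled ground-distance matrix is said to help.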