Topic Modeling with Wasserstein Autoencoders
Nan, Feng, Ding, Ran, Nallapati, Ramesh, Xiang, Bing
arXiv.org Artificial Intelligence
We propose a novel neural topic model in the Wasserstein autoencoder (WAE) framework. Unlike existing variational autoencoder-based models, we directly enforce a Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel when minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We find that MMD performs much better than a Generative Adversarial Network (GAN) at matching high-dimensional Dirichlet distributions. We further find that incorporating randomness in the encoder output during training leads to significantly more coherent topics. To measure the diversity of the produced topics, we propose a simple topic uniqueness metric. Together with the widely used coherence measure NPMI, it offers a more holistic evaluation of topic quality. Experiments on several real datasets show that our model produces significantly better topics than existing topic models.
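As a rough illustration of the distribution matching described in the abstract, the sketch below estimates the squared MMD between a batch of document-topic vectors and samples from a Dirichlet prior. It rests on several assumptions. The abstract does not say which kernel the authors use, so a Gaussian RBF kernel stands in for it. The names rbf_kernel and mmd2_unbiased, the bandwidth, the number of topics, and the Dirichlet concentration are hypothetical choices. A softmax over random logits stands in for a real encoder's output.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth):
    # Gaussian RBF kernel matrix; a stand-in kernel, not the paper's choice.
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=0.5):
    # Unbiased estimate of squared MMD between samples x and y.
    n, m = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    # Within-sample terms exclude the diagonal (each point paired with itself).
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()

rng = np.random.default_rng(0)
num_topics = 20                       # assumed latent dimension
alpha = np.full(num_topics, 0.1)      # assumed sparse Dirichlet prior

prior_samples = rng.dirichlet(alpha, size=256)

# Case 1: "encoder outputs" that already follow the prior.
matched = rng.dirichlet(alpha, size=256)

# Case 2: softmax of random logits, which lies on the simplex but is far
# less sparse than the prior.
logits = rng.normal(size=(256, num_topics))
unmatched = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

mmd_matched = mmd2_unbiased(matched, prior_samples)
mmd_unmatched = mmd2_unbiased(unmatched, prior_samples)
print(f"matched: {mmd_matched:.4f}  unmatched: {mmd_unmatched:.4f}")
```

In a WAE-style training loop, a term like this would be added to the reconstruction loss so that the encoder's outputs are pushed toward the prior. The estimate should be near zero when the two batches come from the same distribution and clearly positive otherwise.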
Jul-24-2019
- Country:
- Africa (1.00)
- Asia
- Middle East
- Iran (0.46)
- Iraq (0.67)
- Israel (0.68)
- Palestine > Gaza Strip (0.14)
- Russia (0.67)
- Europe
- Germany (0.67)
- United Kingdom (1.00)
- North America
- Canada (1.00)
- United States > Missouri
- Jackson County (0.14)
- Oceania (1.00)
- South America (0.67)
- Genre:
- Research Report (1.00)
- Industry:
- Media
- Film (1.00)
- Music (1.00)
- Television (1.00)
- Retail (0.92)
- Transportation
- Air (1.00)
- Ground (1.00)
- Infrastructure & Services (0.92)
- Passenger (0.67)
- Automobiles & Trucks (1.00)
- Banking & Finance (1.00)
- Government
- Foreign Policy (0.92)
- Military > Navy (1.00)
- Regional Government
- Asia Government > Middle East Government (1.00)
- Europe Government (1.00)
- North America Government > United States Government (1.00)
- Voting & Elections (1.00)
- Health & Medicine
- Energy
- Oil & Gas (1.00)
- Power Industry (0.92)
- Information Technology
- Security & Privacy (1.00)
- Software (0.67)
- Education > Educational Setting
- Higher Education (0.67)
- Leisure & Entertainment
- Games > Computer Games (0.92)
- Sports
- Law > Government & the Courts (1.00)
- Consumer Products & Services
- Food, Beverage, Tobacco & Cannabis > Beverages (0.67)
- Restaurants (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.92)
- Technology: