Towards Generalising Neural Topical Representations

Yang, Xiaohao, Zhao, He, Phung, Dinh, Du, Lan

Jul-24-2023–arXiv.org Artificial Intelligence

Topic models have evolved from conventional Bayesian probabilistic models to Neural Topic Models (NTMs) over the last two decades. Although NTMs have achieved promising performance when trained and tested on a specific corpus, their generalisation ability across corpora is rarely studied. In practice, we often expect that an NTM trained on a source corpus can still produce high-quality topical representations for documents in a different target corpus without retraining. In this work, we aim to improve NTMs further so that their benefits generalise reliably across corpora and tasks. To do so, we propose to model similar documents by minimising their semantic distance when training NTMs. Specifically, similar documents are created by data augmentation during training; the semantic distance between documents is measured by the Hierarchical Topic Transport Distance (HOTT), which computes the Optimal Transport (OT) distance between the topical representations. Our framework can be readily applied to most NTMs as a plug-and-play module. Extensive experiments show that our framework significantly improves the generalisation ability of neural topical representations across corpora.
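The idea of measuring document distance as an OT distance between topical representations can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in an entropy-regularised Sinkhorn approximation of OT, and the function name `sinkhorn_distance`, the toy cost matrix, and the topic proportions `theta_doc`, `theta_aug` and `theta_other` are all assumptions made for illustration. In a training setting, a distance of this kind between a document and its augmentation would be the quantity to minimise.

```python
import numpy as np


def sinkhorn_distance(a, b, cost, reg=0.1, n_iters=200):
    """Approximate OT cost between two histograms via Sinkhorn iterations.

    a, b : topic proportions of two documents (each sums to 1).
    cost : pairwise cost between topics (hypothetical here).
    reg  : entropic regularisation strength (an assumed value).
    """
    K = np.exp(-cost / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return float(np.sum(plan * cost))


# Toy ground cost: four topics placed on a line, cost = normalised gap.
idx = np.arange(4, dtype=float)
cost = np.abs(idx[:, None] - idx[None, :]) / 3.0

# Hypothetical topic proportions for a document, its augmentation,
# and an unrelated document.
theta_doc = np.array([0.7, 0.1, 0.1, 0.1])
theta_aug = np.array([0.6, 0.2, 0.1, 0.1])
theta_other = np.array([0.1, 0.1, 0.1, 0.7])

d_aug = sinkhorn_distance(theta_doc, theta_aug, cost)
d_other = sinkhorn_distance(theta_doc, theta_other, cost)
print(f"doc vs augmentation: {d_aug:.3f}")
print(f"doc vs unrelated:    {d_other:.3f}")
```

In this toy setup the augmented document sits closer to the original than the unrelated one does, which is the property a training objective of this kind relies on. HOTT as originally defined derives the ground cost between topics from word-level distances; the fixed cost matrix here only stands in for that step.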

artificial intelligence, machine learning, natural language, (18 more...)


Country:
- Oceania > Australia (0.04)
- South America > Ecuador (0.04)
- North America > United States
  - California > Santa Clara County > Palo Alto (0.04)
- Europe
  - Poland (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China > Henan Province
    - Zhengzhou (0.04)

Genre:
- Research Report (0.64)

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Text Processing (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)
