Are Neural Topic Models Broken?

Hoyle, Alexander, Goel, Pranav, Sarkar, Rupak, Resnik, Philip

Oct-28-2022–arXiv.org Artificial Intelligence

Recently, the relationship between automated and human evaluation of topic models has been called into question. Method developers have staked the efficacy of new topic model variants on automated measures, and the failure of those measures to approximate human preferences places these models on uncertain ground. Moreover, existing evaluation paradigms are often divorced from real-world use. Motivated by content analysis as a dominant real-world use case for topic modeling, we analyze two related aspects of topic models that affect their effectiveness and trustworthiness in practice for that purpose: the stability of their estimates and the extent to which the model's discovered categories align with human-determined categories in the data. We find that neural topic models fare worse in both respects than an established classical method. We take a step toward addressing both issues in tandem by demonstrating that a straightforward ensembling method can reliably outperform the members of the ensemble.
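
The abstract names two properties, stability across runs and alignment with human-determined categories, without stating how they are quantified. The sketch below is only an illustration of what such measurements could look like and should not be read as the authors' metrics: `run_stability` (a hypothetical helper) greedily pairs topics from two runs by the Jaccard overlap of their top words, and `purity` (also hypothetical) scores how well each document's assigned topic agrees with a human label. The toy topics and labels are invented for the example.

```python
from collections import Counter, defaultdict
from typing import List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap between two lists of top words, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def run_stability(run_a: List[Sequence[str]], run_b: List[Sequence[str]]) -> float:
    """Mean Jaccard overlap of greedily matched topics from two runs.

    Assumption: each run is a list of topics, each topic a list of top words.
    Topics left unmatched (when the runs differ in size) contribute 0.
    """
    # Score every cross-run topic pair, best overlap first.
    pairs = sorted(
        ((jaccard(ta, tb), i, j)
         for i, ta in enumerate(run_a)
         for j, tb in enumerate(run_b)),
        reverse=True,
    )
    used_a, used_b, scores = set(), set(), []
    for score, i, j in pairs:
        if i in used_a or j in used_b:
            continue  # each topic may be matched at most once
        used_a.add(i)
        used_b.add(j)
        scores.append(score)
    n = max(len(run_a), len(run_b))
    return sum(scores) / n if n else 0.0


def purity(predicted: Sequence[int], gold: Sequence[str]) -> float:
    """Fraction of documents carrying the majority human label of their topic."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not predicted:
        return 0.0
    labels_by_topic = defaultdict(Counter)
    for topic, label in zip(predicted, gold):
        labels_by_topic[topic][label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in labels_by_topic.values())
    return majority_total / len(predicted)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    run_a = [["game", "team", "season"], ["court", "law", "judge"]]
    run_b = [["law", "court", "trial"], ["team", "game", "player"]]
    print("stability:", run_stability(run_a, run_b))

    predicted = [0, 0, 1, 1, 1]
    gold = ["sports", "sports", "law", "law", "sports"]
    print("purity:", purity(predicted, gold))
```

Greedy matching keeps the sketch dependency-free; an optimal one-to-one assignment (for instance with `scipy.optimize.linear_sum_assignment`) would be a reasonable substitute when the number of topics is large.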
