Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge

Gallagher, Ryan J.; Reing, Kyle; Kale, David; Ver Steeg, Greg

Dec-3-2017, arXiv.org Machine Learning

While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets, metrics, and experiments, we demonstrate that CorEx produces topics that are comparable in quality to those produced by unsupervised and semi-supervised variants of LDA.
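The abstract does not state the objective itself. As a rough illustration, the sketch below assumes the total correlation quantity that gives Correlation Explanation its name, TC(X) = sum_i H(X_i) - H(X), and measures how much of it a candidate topic assignment Y accounts for, TC(X) - TC(X | Y). The toy corpus, the word list, and all function names are hypothetical. The code only scores fixed topic labels: it does not run the CorEx optimization and does not model anchor words.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a sequence of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_correlation(rows):
    """TC(X) = sum_i H(X_i) - H(X), where each row is one joint observation."""
    columns = list(zip(*rows))
    return sum(entropy(col) for col in columns) - entropy(rows)


def conditional_total_correlation(rows, labels):
    """TC(X | Y): total correlation within each label group, weighted by group size."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return sum(len(g) / n * total_correlation(g) for g in groups.values())


# Hypothetical toy corpus: binary word presence for (kidney, renal, oil, gas).
docs = [(1, 1, 0, 0)] * 4 + [(0, 0, 1, 1)] * 4

# A labeling that matches the two word clusters, and one that ignores them.
good_topics = [0, 0, 0, 0, 1, 1, 1, 1]
bad_topics = [0, 1, 0, 1, 0, 1, 0, 1]

tc = total_correlation(docs)
explained_good = tc - conditional_total_correlation(docs, good_topics)
explained_bad = tc - conditional_total_correlation(docs, bad_topics)

print(f"TC(X)                      = {tc:.3f} bits")
print(f"explained by good labeling = {explained_good:.3f} bits")
print(f"explained by bad labeling  = {explained_bad:.3f} bits")
```

On this toy corpus the four words carry 3 bits of total correlation. The labeling that follows the two word clusters leaves none of it unexplained, while the alternating labeling explains nothing, which is the sense in which a topic can be more or less informative about the words.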

corex, nephrology, vascular disease, (31 more...)

Country:
- North America > United States
  - California (0.28)
  - Missouri (0.14)

Genre:
- Research Report (0.82)

Industry:
- Government (1.00)
- Energy > Oil & Gas (0.97)
- Water & Waste Management > Water Management
  - Water Supplies & Services (0.93)
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Consumer Health (1.00)
  - Therapeutic Area
    - Infections and Infectious Diseases (1.00)
    - Gastroenterology (1.00)
    - Cardiology/Vascular Diseases (1.00)
    - Immunology (1.00)
    - Musculoskeletal (1.00)
    - Neurology (1.00)
    - Nephrology (1.00)
    - Pulmonary/Respiratory Diseases (1.00)
    - Endocrinology (1.00)
    - Psychiatry/Psychology (1.00)
    - Hematology (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty (0.68)
  - Natural Language
    - Text Processing (0.88)
    - Discourse & Dialogue (0.53)
