correlation explanation
Discovering Structure in High-Dimensional Data Through Correlation Explanation
We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.
Discovering Structure in High-Dimensional Data Through Correlation Explanation
We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.
- Africa (0.06)
- Oceania (0.04)
- North America > United States > California > Monterey County > Marina (0.04)
- (5 more...)
Discovering Structure in High-Dimensional Data Through Correlation Explanation
We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.
- Africa (0.06)
- Oceania (0.04)
- North America > United States > California > Monterey County > Marina (0.04)
- (5 more...)
Discovering Structure in High-Dimensional Data Through Correlation Explanation
Steeg, Greg Ver, Galstyan, Aram
We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language. Papers published at the Neural Information Processing Systems Conference.
Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge
Gallagher, Ryan J., Reing, Kyle, Kale, David, Steeg, Greg Ver
While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets, metrics, and experiments, we demonstrate that CorEx produces topics that are comparable in quality to those produced by unsupervised and semi-supervised variants of LDA.
- North America > United States > California (0.28)
- North America > United States > Missouri (0.14)
Toward Interpretable Topic Discovery via Anchored Correlation Explanation
Reing, Kyle, Kale, David C., Steeg, Greg Ver, Galstyan, Aram
Many predictive tasks, such as diagnosing a patient based on their medical chart, are ultimately defined by the decisions of human experts. Unfortunately, encoding experts' knowledge is often time consuming and expensive. We propose a simple way to use fuzzy and informal knowledge from experts to guide discovery of interpretable latent topics in text. The underlying intuition of our approach is that latent factors should be informative about both correlations in the data and a set of relevance variables specified by an expert. Mathematically, this approach is a combination of the information bottleneck and Total Correlation Explanation (CorEx). We give a preliminary evaluation of Anchored CorEx, showing that it produces more coherent and interpretable topics on two distinct corpora.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Biomedical Informatics (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.34)
Discovering Structure in High-Dimensional Data Through Correlation Explanation
Steeg, Greg Ver, Galstyan, Aram
We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.
- Africa (0.06)
- Oceania (0.04)
- North America > United States > California > Monterey County > Marina (0.04)
- (5 more...)
Discovering Structure in High-Dimensional Data Through Correlation Explanation
Steeg, Greg Ver, Galstyan, Aram
We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.
- Africa (0.05)
- Oceania (0.04)
- North America > United States > California > Monterey County > Marina (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)