corex
Discovering Structure in High-Dimensional Data Through Correlation Explanation
We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data, as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables, which makes it an attractive approach for very high-dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure in data from diverse sources, including personality tests, DNA, and human language.
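The objective CorEx optimizes is built on total correlation, the multivariate generalization of mutual information mentioned in the abstract: TC(X) = Σᵢ H(Xᵢ) − H(X₁, …, Xₙ), which is zero exactly when the variables are independent. A minimal plug-in estimate for discrete data (illustrative only, not the authors' implementation):

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy (in nats) of a sequence of hashable outcomes."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def total_correlation(X):
    """TC(X) = sum_i H(X_i) - H(X_1, ..., X_n) for discrete columns of X."""
    joint = [tuple(row) for row in X]
    return sum(entropy(X[:, i]) for i in range(X.shape[1])) - entropy(joint)

rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=1000)           # hidden binary factor
X_dep = np.stack([z, z, z], axis=1)         # perfectly correlated columns
X_ind = rng.integers(0, 2, size=(1000, 3))  # independent columns

# TC is near zero for independent variables and large when one latent
# factor explains all the dependence, which is what CorEx searches for.
print(total_correlation(X_dep) > total_correlation(X_ind))  # prints True
```

CorEx itself searches for latent factors Y that maximize the dependence they explain, TC(X) − TC(X|Y), rather than just measuring TC.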
- Africa (0.06)
- Oceania (0.04)
- North America > United States > California > Monterey County > Marina (0.04)
- (5 more...)
Navigating Public Sentiment in the Circular Economy through Topic Modelling and Hyperparameter Optimisation
Song, Junhao, Yuan, Yingfang, Chang, Kaiwen, Xu, Bing, Xuan, Jin, Pang, Wei
To advance the circular economy (CE), it is crucial to gain insights into the evolution of public sentiment and the cognitive pathways of the public concerning circular products and digital technology, and to recognise the primary concerns. To achieve this, we collected CE-related data from diverse platforms including Twitter, Reddit, and The Guardian. This data collection spanned three distinct strata of the public: the general public, professionals, and official sources. We then applied three topic models to the collected data. Topic modelling is a data-driven machine learning approach to text mining, capable of automatically categorising a large number of documents into distinct semantic groups. Each group is described by a topic, and these topics aid in understanding the semantic content of documents at a high level. However, the performance of topic modelling can vary with different hyperparameter values. Therefore, in this study we propose a framework for topic modelling with hyperparameter optimisation for the CE, and conduct a series of systematic experiments to ensure that the topic models are set with appropriate hyperparameters and to gain insights into the correlations between the CE and public opinion based on well-established models. The results of this study indicate that concerns about sustainability and economic impact persist across all three datasets. Official sources demonstrate a higher level of engagement with the application and regulation of the CE. To the best of our knowledge, this study is the first to investigate multiple levels of public opinion concerning the CE through topic modelling with hyperparameter optimisation.
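The hyperparameter sensitivity the authors address can be illustrated with a small sweep. A toy sketch assuming scikit-learn's `LatentDirichletAllocation`, with perplexity as a stand-in for the paper's optimisation criterion; the corpus and grid below are invented for illustration, not the study's data:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Tiny synthetic corpus standing in for the CE tweets and articles.
docs = [
    "recycling reuse circular economy waste",
    "plastic waste recycling landfill reuse",
    "battery supply chain circular materials",
    "sustainable materials supply chain reuse",
    "policy regulation circular economy law",
    "government policy regulation waste law",
] * 5

X = CountVectorizer().fit_transform(docs)

# Grid over two LDA hyperparameters; keep the setting with the lowest
# perplexity on the corpus (a simplification of a real selection criterion).
best = None
for n_topics in (2, 3, 4):
    for alpha in (0.1, 0.5):
        lda = LatentDirichletAllocation(
            n_components=n_topics, doc_topic_prior=alpha, random_state=0
        ).fit(X)
        score = lda.perplexity(X)
        if best is None or score < best[0]:
            best = (score, n_topics, alpha)

print("best perplexity %.1f with %d topics, alpha=%.1f" % best)
```

In practice one would select hyperparameters on held-out documents or with a coherence measure rather than in-sample perplexity.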
- Oceania > Australia (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Europe > Russia (0.04)
- (13 more...)
- Water & Waste Management > Solid Waste Management (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Government (1.00)
- (6 more...)
Exploring higher-order neural network node interactions with total correlation
Kerby, Thomas, White, Teresa, Moon, Kevin
… and the human brain, the variables interact in complex ways. Yet accurately characterizing higher-order variable interactions (HOIs) is a difficult problem that is further exacerbated when the HOIs change across the data. To solve this problem we propose a new method called Local Correlation Explanation (CorEx) to capture HOIs at a local scale by first clustering data points based on their proximity on the data manifold. We then use a multivariate version of the mutual information, called the total correlation, to construct a latent factor representation of the data within each cluster to learn the local HOIs. We use Local CorEx …

All of these methods require either an input of interest or the class labels and are thus supervised. In response to the challenges posed by understanding neural networks and analyzing higher-order variable interactions (HOIs), we present Local CorEx, a novel post hoc method suitable for exploring model weights, nodes, subnetworks, and latent representations in an unsupervised manner. Here we focus our attention on analyzing groups of hidden nodes and latent representations. To the best of our knowledge, our work marks the first post hoc method to do so in an unsupervised manner and includes the option to easily incorporate label information. Additionally, our approach extends to analyzing HOIs within the data.
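The two-step recipe described, first cluster on the data manifold, then measure dependence within each cluster, can be sketched with off-the-shelf pieces. This is a simplified illustration, not the authors' Local CorEx code: it uses scikit-learn's `KMeans` plus a plug-in total correlation estimate, on synthetic data where a higher-order interaction exists in only one local regime:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def entropy(samples):
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def total_correlation(X):
    """TC(X) = sum_i H(X_i) - H(X) for discrete columns (plug-in estimate)."""
    joint = [tuple(r) for r in X]
    return sum(entropy(X[:, i]) for i in range(X.shape[1])) - entropy(joint)

rng = np.random.default_rng(1)
# Two local regimes: in regime A three binary features all copy one latent
# bit; in regime B they are independent. Globally the HOI is diluted.
z = rng.integers(0, 2, size=500)
A = np.stack([z, z, z], axis=1) + 10          # offset so the clusters separate
B = rng.integers(0, 2, size=(500, 3))
X = np.vstack([A, B]).astype(float)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for c in (0, 1):
    tc = total_correlation(X[labels == c].astype(int))
    print(f"cluster {c}: TC = {tc:.2f}")   # high TC in one cluster, ~0 in the other
```

The cluster containing regime A shows a large total correlation while the other is near zero, which is the local structure a global analysis would blur.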
- North America > United States > Utah (0.14)
- North America > United States > Massachusetts (0.14)
- Energy > Oil & Gas (1.00)
- Health & Medicine (0.93)
BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification
Liu, Ziwen, Grau-Bove, Josep, Orr, Scott Allan
Multi-label Text Classification (MLTC) is the task of categorizing documents into one or more topics. Given the large volumes of data and the varying domains of such tasks, fully supervised learning requires manually annotated datasets, which is costly and time-consuming. In this paper, we propose BERT-Flow-VAE (BFV), a Weakly-Supervised Multi-Label Text Classification (WSMLTC) model that reduces the need for full supervision. This new model (1) produces BERT sentence embeddings and calibrates them using a flow model, (2) generates an initial topic-document matrix by averaging the results of a seeded sparse topic model and a textual entailment model, which only require the surface names of topics and 4-6 seed words per topic, and (3) adopts a VAE framework to reconstruct the embeddings under the guidance of the topic-document matrix. Finally, (4) it uses the means produced by the encoder in the VAE architecture as predictions for MLTC. Experimental results on 6 multi-label datasets show that BFV can substantially outperform other baseline WSMLTC models on key metrics and achieve approximately 84% of the performance of a fully supervised model.
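Steps (2) and (4) of the pipeline reduce to simple matrix operations. A schematic NumPy sketch with invented scores (in the real model these come from a seeded topic model, an entailment model, and the VAE encoder means respectively):

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_topics = 8, 3

# Hypothetical per-document topic scores in [0, 1): one matrix standing in
# for the seeded sparse topic model, one for the textual entailment model.
seeded_scores = rng.random((n_docs, n_topics))
entailment_scores = rng.random((n_docs, n_topics))

# Step (2): the initial topic-document matrix is the average of the two.
topic_doc = (seeded_scores + entailment_scores) / 2.0

# Step (4): in BFV the VAE encoder means play this role; here we reuse the
# matrix itself and threshold to obtain multi-label 0/1 predictions.
predictions = (topic_doc >= 0.5).astype(int)
print(predictions.shape)  # prints (8, 3): one 0/1 label per document-topic pair
```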
Machine learning helps to identify early signs of Alzheimer's
Researchers at the University of Southern California have discovered "hidden" indicators of Alzheimer's in medical data that could result in earlier diagnosis of the disease and better prognosis for patients. Using machine learning, USC researchers identified potential blood-based markers of Alzheimer's disease that could be detected with a routine blood test. "This type of analysis is a novel way of discovering patterns of data to identify key diagnostic markers of disease," said Paul Thompson, associate director of the USC Mark and Mary Stevens Neuroimaging and Informatics Institute and professor in USC's Keck School of Medicine. "In a very large database of health measures, it helped us discover predictive features of Alzheimer's disease that nobody suspected were there." In their study, published in Frontiers in Aging Neuroscience, the USC research team analyzed medical data in the Alzheimer's Disease Neuroimaging Initiative database--collected from 829 older adults--to identify predictors of cognitive decline and brain atrophy during a one-year period.
Auto-Encoding Total Correlation Explanation
Gao, Shuyang, Brekelmans, Rob, Steeg, Greg Ver, Galstyan, Aram
Advances in unsupervised learning enable reconstruction and generation of samples from complex distributions, but this success is marred by the inscrutability of the representations learned. We propose an information-theoretic approach to characterizing disentanglement and dependence in representation learning using multivariate mutual information, also called total correlation. The principle of Total Correlation Explanation (CorEx) has motivated successful unsupervised learning applications across a variety of domains, but under some restrictive assumptions. Here we relax those restrictions by introducing a flexible variational lower bound to CorEx. Surprisingly, we find that this lower bound is equivalent to the one in variational autoencoders (VAE) under certain conditions. This information-theoretic view of VAE deepens our understanding of hierarchical VAE and motivates a new algorithm, AnchorVAE, that makes latent codes more interpretable through information maximization and enables generation of richer and more realistic samples.
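As a reminder of the quantities involved (these are the standard definitions from the CorEx literature, not something specific to this paper's bound), the total correlation of variables X = (X₁, …, Xₙ) is

```latex
\mathrm{TC}(X) \;=\; \sum_{i=1}^{n} H(X_i) - H(X_1, \ldots, X_n)
\;=\; D_{\mathrm{KL}}\!\left( p(x_1,\ldots,x_n) \,\middle\|\, \prod_{i=1}^{n} p(x_i) \right),
```

which is zero if and only if the Xᵢ are independent. The CorEx principle seeks latent factors Y that maximize the dependence they explain,

```latex
\max_{p(y \mid x)} \; \mathrm{TC}(X) - \mathrm{TC}(X \mid Y),
```

where TC(X|Y) is the total correlation remaining after conditioning on Y; the variational lower bound discussed in the abstract bounds this objective.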
Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge
Gallagher, Ryan J., Reing, Kyle, Kale, David, Steeg, Greg Ver
While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets, metrics, and experiments, we demonstrate that CorEx produces topics that are comparable in quality to those produced by unsupervised and semi-supervised variants of LDA.
- North America > United States > California (0.28)
- North America > United States > Missouri (0.14)
Unsupervised Learning via Total Correlation Explanation
Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Correlation Explanation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.
- North America > United States > California (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
Machine learning reveals correlations of gene expression in RNA-Seq data
Shirley Pepke – The complexity of cancer has famously eluded conquering by modern medicine. Every tumor has many aberrations that drive its growth. As a result, treatments that target single vulnerabilities are typically of short-lived efficacy. After being diagnosed with advanced stage ovarian cancer in 2013, I wagered that what was needed was an algorithm capable of digesting and analyzing the complexity to provide a detailed view into the multitude of factors at work in a given tumor. To pursue this goal, I began a collaboration with Greg Ver Steeg, who specializes in analyzing big data, to bring state-of-the-art machine learning to bear on the recently released large-scale data from the Cancer Genome Atlas (TCGA).