complete rep resentation
Wasserstein Dependency Measure for Representation Learning
Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning. However, such approaches are fundamentally limited since a tight lower bound on mutual information requires sample size exponential in the mutual information. This limits the applicability of these approaches for prediction tasks with high mutual information, such as in video understanding or reinforcement learning. In these settings, such techniques are prone to overfit, both in theory and in practice, and capture only a few of the relevant factors of variation. This leads to incomplete representations that are not optimal for downstream tasks. In this work, we empirically demonstrate that mutual information-based representation learning approaches do fail to learn complete representations on a number of designed and real-world tasks. To mitigate these problems we introduce the Wasserstein dependency measure, which learns more complete representations by using the Wasserstein distance instead of the KL divergence in the mutual information estimator. We show that a practical approximation to this theoretically motivated solution, constructed using Lipschitz constraint techniques from the GAN literature, achieves substantially improved results on tasks where incomplete representations are a major challenge.
Wasserstein Dependency Measure for Representation Learning
Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning. However, such approaches are fundamentally limited since a tight lower bound on mutual information requires sample size exponential in the mutual information. This limits the applicability of these approaches for prediction tasks with high mutual information, such as in video understanding or reinforcement learning. In these settings, such techniques are prone to overfit, both in theory and in practice, and capture only a few of the relevant factors of variation. This leads to incomplete representations that are not optimal for downstream tasks.
On the Causal Sufficiency and Necessity of Multi-Modal Representation Learning
Wang, Jingyao, Qiang, Wenwen, Li, Jiangmeng, Si, Lingyu, Zheng, Changwen, Su, Bing
An effective paradigm of multi-modal learning (MML) is to learn unified representations among modalities. From a causal perspective, constraining the consistency between different modalities can mine causal representations that convey primary events. However, such simple consistency may face the risk of learning insufficient or unnecessary information: a necessary but insufficient cause is invariant across modalities but may not have the required accuracy; a sufficient but unnecessary cause tends to adapt well to specific modalities but may be hard to adapt to new data. To address this issue, in this paper, we aim to learn representations that are both causal sufficient and necessary, i.e., Causal Complete Cause ($C^3$), for MML. Firstly, we define the concept of $C^3$ for MML, which reflects the probability of being causal sufficiency and necessity. We also propose the identifiability and measurement of $C^3$, i.e., $C^3$ risk, to ensure calculating the learned representations' $C^3$ scores in practice. Then, we theoretically prove the effectiveness of $C^3$ risk by establishing the performance guarantee of MML with a tight generalization bound. Based on these theoretical results, we propose a plug-and-play method, namely Causal Complete Cause Regularization ($C^3$R), to learn causal complete representations by constraining the $C^3$ risk bound. Extensive experiments conducted on various benchmark datasets empirically demonstrate the effectiveness of $C^3$R.
ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonoramlity constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data.
Learning Nonlinear Overcomplete Representations for Efficient Coding
We derive a learning algorithm for inferring an overcomplete basis by viewing it as probabilistic model of the observed data. Over(cid:173) complete bases allow for better approximation of the underlying statistical density. Using a Laplacian prior on the basis coefficients removes redundancy and leads to representations that are sparse and are a nonlinear function of the data. This can be viewed as a generalization of the technique of independent component anal(cid:173) ysis and provides a method for blind source separation of fewer mixtures than sources. We demonstrate the utility of overcom(cid:173) plete representations on natural speech and show that compared to the traditional Fourier basis the inferred representations poten(cid:173) tially have much greater coding efficiency.
Wasserstein Dependency Measure for Representation Learning
Ozair, Sherjil, Lynch, Corey, Bengio, Yoshua, Oord, Aaron van den, Levine, Sergey, Sermanet, Pierre
Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning. However, such approaches are fundamentally limited since a tight lower bound on mutual information requires sample size exponential in the mutual information. This limits the applicability of these approaches for prediction tasks with high mutual information, such as in video understanding or reinforcement learning. In these settings, such techniques are prone to overfit, both in theory and in practice, and capture only a few of the relevant factors of variation. This leads to incomplete representations that are not optimal for downstream tasks.
Sparse Linear Models applied to Power Quality Disturbance Classification
López-Lopera, Andrés F., Álvarez, Mauricio A., Orozco, Ávaro A.
Power quality (PQ) analysis describes the non-pure electric signals that are usually present in electric power systems. The automatic recognition of PQ disturbances can be seen as a pattern recognition problem, in which different types of waveform distortion are differentiated based on their features. Similar to other quasi-stationary signals, PQ disturbances can be decomposed into time-frequency dependent components by using time-frequency or time-scale transforms, also known as dictionaries. These dictionaries are used in the feature extraction step in pattern recognition systems. Short-time Fourier, Wavelets and Stockwell transforms are some of the most common dictionaries used in the PQ community, aiming to achieve a better signal representation. To the best of our knowledge, previous works about PQ disturbance classification have been restricted to the use of one among several available dictionaries. Taking advantage of the theory behind sparse linear models (SLM), we introduce a sparse method for PQ representation, starting from overcomplete dictionaries. In particular, we apply Group Lasso. We employ different types of time-frequency (or time-scale) dictionaries to characterize the PQ disturbances, and evaluate their performance under different pattern recognition algorithms. We show that the SLM reduce the PQ classification complexity promoting sparse basis selection, and improving the classification accuracy.
Training sparse natural image models with a fast Gibbs sampler of an extended state space
Theis, Lucas, Sohl-dickstein, Jascha, Bethge, Matthias
We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models.
ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Le, Quoc V., Karpenko, Alexandre, Ngiam, Jiquan, Ng, Andrew Y.
Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonoramlity constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets.