An Information-Theoretic Framework for Non-linear Canonical Correlation Analysis
Painsky, Amichai, Feder, Meir, Tishby, Naftali
Canonical Correlation Analysis (CCA) is a linear representation learning method that seeks maximally correlated variables in multi-view data. Non-linear CCA extends this notion to a broader family of transformations, which are more powerful for many real-world applications. Given the joint probability distribution, the Alternating Conditional Expectation (ACE) algorithm provides an optimal solution to the non-linear CCA problem. However, it suffers from limited performance and an increasing computational burden when only a finite number of observations is available. In this work we introduce an information-theoretic framework for the non-linear CCA problem (ITCCA), which extends the classical ACE approach. Our suggested framework seeks compressed representations of the data that attain a maximal level of correlation. In this way we control the trade-off between the flexibility and the complexity of the representation. In the finite sample size regime, our approach demonstrates favorable performance at a reduced computational burden compared with non-linear alternatives. Further, ITCCA provides theoretical bounds and optimality conditions, as we establish fundamental connections to rate-distortion theory, the information bottleneck and remote source coding. In addition, it induces a "soft" dimensionality reduction, as the compression level is measured (and governed) by the mutual information between the original noisy data and the signals that we extract.
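To make the baseline concrete, the following is a minimal sketch of the classical ACE iteration on discrete (finite-alphabet) samples, not of the ITCCA method itself; the function name, the random initialization and the discrete-alphabet restriction are illustrative assumptions.

```python
import numpy as np

def ace(x, y, n_iter=50, seed=0):
    """Sketch of Alternating Conditional Expectation on discrete data:
    alternately set f(x) <- E[g(Y) | X = x] and g(y) <- E[f(X) | Y = y],
    normalizing to zero mean and unit variance after each update."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(x.max() + 1)  # transformation of X, indexed by symbol
    g = rng.standard_normal(y.max() + 1)  # transformation of Y, indexed by symbol
    for _ in range(n_iter):
        for v in np.unique(x):            # f(x) <- E[g(Y) | X = x]
            f[v] = g[y[x == v]].mean()
        f = (f - f[x].mean()) / f[x].std()
        for v in np.unique(y):            # g(y) <- E[f(X) | Y = y]
            g[v] = f[x[y == v]].mean()
        g = (g - g[y].mean()) / g[y].std()
    return f, g, np.corrcoef(f[x], g[y])[0, 1]  # empirical maximal correlation
```

Here f and g are looked up per sample via f[x] and g[y], so every conditional expectation is a group mean over the observed sample; this empirical plug-in is exactly the regime in which, as the abstract notes, ACE becomes statistically and computationally costly.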
Linear Independent Component Analysis over Finite Fields: Algorithms and Bounds
Painsky, Amichai, Rosset, Saharon, Feder, Meir
Independent Component Analysis (ICA) is a statistical tool that decomposes an observed random vector into components that are as statistically independent as possible. ICA over finite fields is a special case of ICA in which both the observations and the decomposed components take values over a finite alphabet. This problem is also known as minimal redundancy representation or factorial coding. In this work we focus on linear methods for ICA over finite fields. We introduce a basic lower bound which provides a fundamental limit to the ability of any linear solution to solve this problem. Based on this bound, we present a greedy algorithm that outperforms all currently known methods. Importantly, we show that the overhead of our suggested algorithm (compared with the lower bound) typically decreases as the scale of the problem grows. In addition, we provide a sub-optimal variant of our suggested method that significantly reduces the computational complexity at a relatively small cost in performance. Finally, we discuss the universal ability of linear transformations to decompose random vectors, compared with existing non-linear solutions.
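As an illustration of the linear finite-field setting, here is a small brute-force sketch over GF(2): score every non-zero XOR combination of the observed bits by its empirical marginal entropy, then greedily keep the lowest-entropy combinations that remain linearly independent. The function names and the exhaustive scoring (exponential in the dimension d, so only viable for small d) are our simplifications; the paper's greedy algorithm and its reduced-complexity variant are more refined.

```python
import numpy as np
from itertools import product

def marginal_entropy(bits):
    """Empirical entropy (in bits) of a binary sample vector."""
    p = bits.mean()
    return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def gf2_rank(M):
    """Rank of a binary matrix over GF(2), via Gaussian elimination with XOR."""
    M = (M % 2).astype(np.uint8)
    rank = 0
    for col in range(M.shape[1]):
        pivot = next((r for r in range(rank, M.shape[0]) if M[r, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]          # move pivot row into place
        for r in range(M.shape[0]):
            if r != rank and M[r, col]:
                M[r] ^= M[rank]                      # eliminate the column
        rank += 1
    return rank

def greedy_gf2_ica(X):
    """Greedily build an invertible W over GF(2) whose rows yield components
    with small marginal entropies (a proxy for minimal redundancy)."""
    n, d = X.shape
    combos = [np.array(c, dtype=np.uint8)
              for c in product([0, 1], repeat=d) if any(c)]   # 2^d - 1 candidates
    combos.sort(key=lambda c: marginal_entropy((X.astype(int) @ c) % 2))
    rows = []
    for c in combos:          # keep a candidate only if it stays independent
        if gf2_rank(np.array(rows + [c])) == len(rows) + 1:
            rows.append(c)
        if len(rows) == d:
            break
    return np.array(rows, dtype=np.uint8)  # components: (X @ W.T) % 2
```

The sum of the rows' marginal entropies can then be compared against the entropy of the observed vector, which is the spirit of the lower bound the abstract describes: a linear map cannot push the total marginal entropy below the joint entropy of the data.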
Outperforming Good-Turing: Preliminary Report
Painsky, Amichai, Feder, Meir
Estimating a large alphabet probability distribution from a limited number of samples is a fundamental problem in machine learning and statistics. A variety of estimation schemes have been proposed over the years, mostly inspired by the early work of Laplace and the seminal contribution of Good and Turing. One of the basic assumptions shared by most commonly used estimators is a one-to-one correspondence between a symbol's sample frequency and its estimated probability. In this work we challenge this paradigmatic assumption, claiming that symbols with "similar" frequencies should be assigned the same estimated probability. In this way we reduce the number of free parameters and improve generalization. In this preliminary report we show that by applying an ensemble of such regularized estimators, we achieve a dramatic improvement in estimation accuracy (typically up to 50%) compared with currently known methods. An implementation of our suggested method is publicly available at the first author's web page.
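The frequency-pooling idea can be illustrated with a toy estimator: bin symbols by sample count and let all symbols in a bin share the bin's pooled empirical mass equally. The binning rule and function name below are hypothetical illustrations; the paper's method is an ensemble of such regularized estimators, not this single rule.

```python
import numpy as np
from collections import Counter

def binned_estimate(samples, alphabet_size, bin_width=2):
    """Toy 'similar frequency -> same probability' estimator: symbols whose
    sample counts fall in the same bin share the bin's total empirical mass
    equally. bin_width=1 recovers the plain maximum-likelihood estimator;
    wider bins pool more symbols and act as a regularizer."""
    n = len(samples)
    counts = Counter(samples)
    freq = np.array([counts.get(s, 0) for s in range(alphabet_size)])
    est = np.zeros(alphabet_size)
    for b in np.unique(freq // bin_width):
        members = np.flatnonzero(freq // bin_width == b)
        est[members] = freq[members].sum() / (n * len(members))
    return est  # sums to 1 by construction
```

For example, with n = 100 samples and bin_width = 2, two symbols observed 4 and 5 times fall in the same bin and are both assigned (4 + 5) / (100 * 2) = 0.045, whereas the maximum-likelihood estimator would assign them 0.04 and 0.05 respectively.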