Pathologies in information bottleneck for deterministic supervised learning
Kolchinsky, Artemy, Tracey, Brendan D., Van Kuyk, Steven
Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X; T) and high mutual information I(Y; T). The IB curve characterizes the set of bottleneck variables that achieve maximal I(Y; T) for a given I(X; T), and is typically explored by optimizing the IB Lagrangian, I(Y; T) − βI(X; T). Recently, there has been interest in applying IB to supervised learning, particularly for classification problems that use neural networks. In most classification problems, the output class Y is a deterministic function of the input X, which we refer to as "deterministic supervised learning". We demonstrate three pathologies that arise when IB is used in any scenario where Y is a deterministic function of X: (1) the IB curve cannot be recovered by optimizing the IB Lagrangian for different values of β; (2) there are "uninteresting" solutions at all points of the IB curve; and (3) for classifiers that achieve low error rates, the activity of different hidden layers will not exhibit a strict tradeoff between compression and prediction, contrary to a recent proposal. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We finish by demonstrating these issues on the MNIST dataset.
The information bottleneck (IB) method (Tishby et al., 1999) provides a principled way to extract information that is present in one variable that is relevant for predicting another variable. Given two random variables X and Y, IB posits a "bottleneck" variable T that obeys the Markov condition Y − X − T. By the data processing inequality (DPI) (Cover & Thomas, 2012), this Markov condition implies that I(X; T) ≥ I(Y; T), meaning the bottleneck variable cannot contain more information about Y than it does about X.
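The quantities above can be made concrete with a small numerical sketch. Everything in it is illustrative rather than taken from the paper: the four-valued X, the label map `f`, the encoder table `p_t_given_x`, and the helper names are hypothetical. The second half additionally assumes the known shape of the IB curve when Y is a deterministic function of X, namely I(Y; T) = min(I(X; T), H(Y)), which is not stated in the text above. Under that assumption, the sketch shows why sweeping β in the IB Lagrangian lands on only two points of the curve.

```python
import numpy as np


def mutual_information(joint):
    """Mutual information, in bits, of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    pb = joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))


# Hypothetical toy problem: X uniform on 4 values, Y = f(X) deterministic.
p_x = np.full(4, 0.25)
f = np.array([0, 0, 1, 1])                  # class label assigned to each x

# Hypothetical noisy encoder p(t|x); rows index x, columns index t.
p_t_given_x = np.array([[0.9, 0.1],
                        [0.8, 0.2],
                        [0.2, 0.8],
                        [0.1, 0.9]])

joint_xt = p_x[:, None] * p_t_given_x       # p(x, t)

# p(y, t) = sum over x with f(x) = y of p(x, t); valid because T depends
# on Y only through X (the Markov condition).
joint_yt = np.zeros((2, 2))
np.add.at(joint_yt, f, joint_xt)

i_xt = mutual_information(joint_xt)
i_yt = mutual_information(joint_yt)
print(f"I(X;T) = {i_xt:.3f} bits, I(Y;T) = {i_yt:.3f} bits")

# Assumed IB curve for deterministic Y: I(Y;T) = min(I(X;T), H(Y)).
h_y = 1.0                                   # H(Y) for the balanced binary label
r = np.linspace(0.0, 2.0, 201)              # candidate I(X;T), up to H(X) = 2 bits
curve = np.minimum(r, h_y)


def lagrangian_argmax(beta):
    """I(X;T) value on the assumed curve that maximizes I(Y;T) - beta * I(X;T)."""
    return float(r[np.argmax(curve - beta * r)])


for beta in (0.1, 0.5, 0.9, 1.5):
    print(f"beta = {beta}: optimum at I(X;T) = {lagrangian_argmax(beta):.2f}")
```

In this sketch, every β strictly between 0 and 1 selects the same corner point I(X; T) = H(Y), and every β above 1 selects the trivial point I(X; T) = 0, so the interior of the assumed curve is never reached. This is one way to read pathology (1).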
Aug-22-2018