Goto

Collaborating Authors

 information bottleneck


Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames

Neural Information Processing Systems

Behavioral cloning is a simple yet effective technique for learning sequential decision-making from demonstrations. Recently, it has gained prominence as the core of foundation models for the physical world, where achieving generalization requires countless demonstrations of a multitude of tasks. Typically, a human expert with full information on the task demonstrates a (nearly) optimal behavior. In this paper, we propose to hide some of the task's information from the demonstrator. This "blindfolded" expert is compelled to employ nontrivial exploration to solve the task. We show that cloning the blindfolded expert generalizes better to unseen tasks than its fully-informed counterpart. We conduct experiments of real-world robot peg insertion tasks with (limited) human demonstrations, alongside videogames from the Procgen benchmark. Additionally, we support our findings with theoretical analysis, which confirms that the generalization error scales with p I/m, where I measures the amount of task information available to the demonstrator, and mis the number of demonstrated tasks. Both theory and practice indicate that cloning blindfolded experts generalizes better with fewer demonstrated tasks.


MIHC: Multi-View Interpretable Hypergraph Neural Networks with Information Bottleneck for Chip Congestion Prediction

Neural Information Processing Systems

With the advancement of artificial intelligence (AI) and increasing integrated circuit (IC) design complexity, efficient chip design through electronic design automation (EDA) has become critical. Fast and accurate congestion prediction in chip layout and routing can significantly enhance automated design performance. Existing congestion modeling methods are limited by (i) ineffective processing and fusion of multi-view circuit data information, and (ii) insufficient reliability and interpretability in the prediction process. To address these challenges, we propose the Multi-view Interpretable Hypergraph for Chip (MIHC), a trustworthy multi-view hypergraph neural network framework that (i) processes both graph and image information in unified hypergraph representations, capturing topological and geometric circuit data; (ii) implements a novel subgraph Information Bottleneck mechanism, identifying critical congestion-correlated regions to guide predictions. This work is the first attempt to incorporate such interpretability into congestion prediction through informative graph reasoning. Experiments show that the MIHC method reduces NMAE by 16.67% and 8.57% in cell-based and grid-based predictions on ISPD2015, and 5.26% and 2.44% on CircuitNet-N28, respectively, compared to state-of-the-art methods. Rigorous cross-design generalization experiments further validate our method's capability to handle entirely unseen circuit designs.


Prompt-Guided Alignment with Information Bottleneck Makes Image Compression Also a Restorer

Neural Information Processing Systems

Learned Image Compression (LIC) models face critical challenges in real-world scenarios due to various environmental degradations, such as fog and rain. Due to the distribution mismatch between degraded inputs and clean training data, welltrained LIC models suffer from reduced compression efficiency, while retraining dedicated models for diverse degradation types is costly and impractical. Our method addresses the above issue by leveraging prompt learning under the information bottleneck principle, enabling compact extraction of shared components between degraded and clean images for improved latent alignment and compression efficiency. In detail, we propose an Information Bottleneck-constrained Latent Representation Unifying (IB-LRU) scheme, in which a Probabilistic Prompt Generator (PPG) is deployed to simultaneously capture the distribution of different degradations.


Interpretable Prototype-based Graph Information Bottleneck

Neural Information Processing Systems

The success of Graph Neural Networks (GNNs) has led to a need for understanding their decision-making process and providing explanations for their predictions, which has given rise to explainable AI (XAI) that offers transparent explanations for black-box models. Recently, the use of prototypes has successfully improved the explainability of models by learning prototypes to imply training graphs that affect the prediction. However, these approaches tend to provide prototypes with excessive information from the entire graph, leading to the exclusion of key substructures or the inclusion of irrelevant substructures, which can limit both the interpretability and the performance of the model in downstream tasks. In this work, we propose a novel framework of explainable GNNs, called interpretable Prototype-based Graph Information Bottleneck (PGIB), that incorporates prototype learning within the information bottleneck framework to provide prototypes with the key subgraph from the input graph that is important for the model prediction. This is the first work that incorporates prototype learning into the process of identifying the key subgraphs that have a critical impact on the prediction performance. Extensive experiments, including qualitative analysis, demonstrate that PGIB outperforms state-of-the-art methods in terms of both prediction performance and explainability.


A Sufficient-Statistic Reduction of the Information Bottleneck to a Low-Dimensional Problem

arXiv.org Machine Learning

We show that if the conditional distribution p(C | T) factors through a sufficient statistic ฯ•(T), then the Information Bottleneck (IB) problem for (T, C) is exactly equivalent to the IB problem for (ฯ•(T), C). The reduction is loss-free: it preserves the full IB curve, the Lagrangian optimum at every trade-off parameter \b{eta}, and the optimal representations up to pullback through ฯ•. As a result, the computational complexity of solving the IB problem is governed by the dimension of the sufficient statistic rather than the ambient dimension of the source. This identifies an exact structural condition under which the generic IB problem becomes tractable, and gives a formal bridge between the discrete and linear-Gaussian regimes. We then show that the classical Gaussian IB solution of Chechik, Globerson, Tishby and Weiss is an immediate corollary of this reduction, and we state a nonlinear-Gaussian generalisation. A small numerical example illustrates the practical consequence: when a low-dimensional sufficient statistic is available, the exact IB curve can be computed on the reduced problem at a cost determined by the statistic rather than by the ambient source dimension.



Drift doesn't Matter: Dynamic Decomposition with Diffusion Reconstruction for Unstable Multivariate Time Series Anomaly Detection

Neural Information Processing Systems

Many unsupervised methods have recently been proposed for multivariate time series anomaly detection. However, existing works mainly focus on stable data yet often omit the drift generated from non-stationary environments, which may lead to numerous false alarms. We propose Dynamic Decomposition with Diffusion Reconstruction (D3R), a novel anomaly detection network for real-world unstable data to fill the gap. D3R tackles the drift via decomposition and reconstruction. In the decomposition procedure, we utilize data-time mix-attention to dynamically decompose long-period multivariate time series, overcoming the limitation of the local sliding window.




Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

Neural Information Processing Systems

The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.