Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
Identifying unfamiliar inputs, also known as out-of-distribution (OOD) detection, is a crucial property of any decision-making process. A simple and empirically validated technique is based on deep ensembles, where the variance of predictions over different neural networks acts as a substitute for input uncertainty. Nevertheless, a theoretical understanding of the inductive biases behind the performance of deep ensembles' uncertainty estimation is missing. To improve our description of their behavior, we study deep ensembles with large layer widths operating in simplified linear training regimes, in which the functions trained with gradient descent can be described by the neural tangent kernel. We identify two sources of noise, each inducing a distinct inductive bias in the predictive variance at initialization. We further show theoretically and empirically that both noise sources affect the predictive variance of non-linear deep ensembles in toy models and realistic settings after training. Finally, we propose practical ways to eliminate part of these noise sources, leading to significant changes and improved OOD detection in trained deep ensembles.
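As a minimal illustration of the OOD score this abstract studies, the sketch below builds a small ensemble of independently initialized MLPs and uses the variance of member predictions as the uncertainty estimate at initialization; widths, data, and the far-away test inputs are placeholders, and training is omitted.

```python
import torch
import torch.nn as nn

# Minimal sketch: the per-input predictive variance of a deep ensemble,
# used as an OOD score. Width/depth are illustrative placeholders and the
# members are left at initialization, matching the abstract's "at
# initialization" setting; training is omitted.

def make_net(in_dim=2, width=256, out_dim=1):
    return nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(),
                         nn.Linear(width, out_dim))

ensemble = [make_net() for _ in range(10)]  # independently initialized members

def ood_score(x):
    """Variance of member predictions; higher = more unfamiliar input."""
    with torch.no_grad():
        preds = torch.stack([net(x) for net in ensemble])  # (M, N, out_dim)
    return preds.var(dim=0).mean(dim=-1)                   # (N,)

x_in = torch.randn(5, 2)          # stand-in for in-distribution inputs
x_out = 10.0 * torch.randn(5, 2)  # stand-in for far-away (OOD) inputs
print(ood_score(x_in), ood_score(x_out))
```

Inputs far from the training region tend to receive larger variance under randomly initialized ReLU networks, which is the kind of inductive bias the paper analyzes through the NTK.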
Disentangling by Subspace Diffusion
We present a novel nonparametric algorithm for symmetry-based disentangling of data manifolds, the Geometric Manifold Component Estimator (GEOMANCER). GEOMANCER provides a partial answer to the question posed by Higgins et al. (2018): is it possible to learn how to factorize a Lie group solely from observations of the orbit of an object it acts on? We show that fully unsupervised factorization of a data manifold is possible if the true metric of the manifold is known and each factor manifold has nontrivial holonomy (for example, rotation in 3D). Our algorithm works by estimating the subspaces that are invariant under random-walk diffusion, giving an approximation to the de Rham decomposition from differential geometry. We demonstrate the efficacy of GEOMANCER on several complex synthetic manifolds. Our work reduces the question of whether unsupervised disentangling is possible to the question of whether unsupervised metric learning is possible, providing a unifying insight into the geometric nature of representation learning.
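To make the setting concrete, here is a hedged toy version of the setup only: samples from a product manifold (S^2 x S^2) and the local-PCA tangent estimation that nonparametric methods of this kind start from. The subspace-diffusion factorization step of GEOMANCER itself is not reproduced here.

```python
import numpy as np

# Toy setup for the problem GEOMANCER addresses: samples from a product
# manifold (here S^2 x S^2, embedded in R^6) whose factors we would like
# to recover. Local PCA on nearest neighbours estimates the 4-dimensional
# tangent space at each point -- the raw input that subspace-diffusion-style
# methods build on. The factorization step (finding diffusion-invariant
# subspaces) is omitted.

rng = np.random.default_rng(0)

def sphere(n):
    x = rng.normal(size=(n, 3))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

n = 2000
data = np.concatenate([sphere(n), sphere(n)], axis=1)  # points on S^2 x S^2

def tangent_basis(i, k=50):
    """Estimate the tangent space at data[i] from its k nearest neighbours."""
    d = np.linalg.norm(data - data[i], axis=1)
    nbrs = data[np.argsort(d)[1:k + 1]] - data[i]
    # Top right-singular vectors span the (4-dimensional) tangent space.
    _, _, vt = np.linalg.svd(nbrs, full_matrices=False)
    return vt[:4]  # (4, 6) orthonormal rows

print(tangent_basis(0).shape)
```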
Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect
The "cold posterior effect" (CPE) in Bayesian deep learning describes the disturbing observation that the predictive performance of Bayesian neural networks can be significantly improved if the Bayes posterior is artificially sharpened using a temperature parameter T <1. The CPE is problematic in theory and practice and since the effect was identified many researchers have proposed hypotheses to explain the phenomenon. However, despite this intensive research effort the effect remains poorly understood. In this work we provide novel and nuanced evidence relevant to existing explanations for the cold posterior effect, disentangling three hypotheses: 1. The dataset curation hypothesis of Aitchison (2020): we show empirically that the CPE does not arise in a real curated data set but can be produced in a controlled experiment with varying curation strength.
Disentangling the Roles of Distinct Cell Classes with Cell-Type Dynamical Systems
Latent dynamical systems have been widely used to characterize the dynamics of neural population activity in the brain. However, these models typically ignore the fact that the brain contains multiple cell types. This limits their ability to capture the functional roles of distinct cell classes, and to predict the effects of cell-specific perturbations on neural activity or behavior. To overcome these limitations, we introduce the "cell-type dynamical systems" (CTDS) model. This model extends latent linear dynamical systems to contain distinct latent variables for each cell class, with biologically inspired constraints on both dynamics and emissions.
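A hedged toy simulation of that idea is sketched below (not the paper's exact parameterization): one block of latents per cell class, a Dale's-law-style sign constraint on the dynamics, and block-diagonal emissions so each neuron is read out from its own class's latents.

```python
import numpy as np

# Illustrative simulation of a cell-type latent LDS: separate latents for
# excitatory (E) and inhibitory (I) cells, a Dale's-law-style sign
# constraint on the dynamics (E latents have nonnegative outgoing weights,
# I latents nonpositive), and a block-diagonal emission matrix so each
# neuron is driven by its own class's latents. A toy version of the CTDS
# idea, not the paper's exact parameterization.

rng = np.random.default_rng(0)
d_e, d_i = 2, 2                        # latent dimensions per cell class
n_e, n_i = 30, 10                      # neurons per class

A = rng.normal(scale=0.3, size=(d_e + d_i, d_e + d_i))
A[:, :d_e] = np.abs(A[:, :d_e])        # influence of E latents: nonnegative
A[:, d_e:] = -np.abs(A[:, d_e:])       # influence of I latents: nonpositive
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # keep the dynamics stable

C = np.zeros((n_e + n_i, d_e + d_i))   # block-diagonal emissions
C[:n_e, :d_e] = rng.normal(size=(n_e, d_e))
C[n_e:, d_e:] = rng.normal(size=(n_i, d_i))

x = np.zeros(d_e + d_i)
rates = []
for _ in range(200):
    x = A @ x + rng.normal(scale=0.1, size=x.shape)
    rates.append(C @ x)
print(np.array(rates).shape)           # (T, n_e + n_i)
```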
Review for NeurIPS paper: Disentangling by Subspace Diffusion
Strengths: This paper provides new insights into the problem of disentangling independent latent factors, viewed here through the lens of factorizing groups of transformations on a data manifold. The authors base their construction on the de Rham decomposition, which itself is based on the holonomy group that considers parallel transport over loops on a manifold. Essentially, the authors seek to extract multiple representations of input data, such that each of them encodes a submanifold whose holonomy group is independent of all other submanifolds. This provides an important formalism for an important problem that is often ill-defined, with mostly heuristic, qualitative goals that depend on specific applications rather than being studied with rigor. The construction itself is based on extending the work of Singer and Wu on vector diffusion maps, which enriches more traditional manifold learning by encoding information about tangent spaces and the operation of the connection Laplacian on tangent vector fields.
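The nontrivial-holonomy condition mentioned above can be checked numerically. The sketch below parallel-transports a tangent vector around a geodesic triangle on S^2 (an octant, enclosing solid angle pi/2) by repeated projection onto successive tangent planes, and recovers the expected 90-degree rotation.

```python
import numpy as np

# Numerical check of nontrivial holonomy on the sphere: parallel transport
# of a tangent vector around a geodesic triangle (the octant, solid angle
# pi/2) rotates it by 90 degrees. Transport is approximated by projecting
# the vector onto each successive tangent plane and renormalizing, which
# converges to true parallel transport as the step size shrinks.

def slerp(a, b, t):
    """Point at fraction t along the great-circle arc from a to b."""
    w = np.arccos(np.clip(a @ b, -1.0, 1.0))
    return (np.sin((1 - t) * w) * a + np.sin(t * w) * b) / np.sin(w)

def transport(v, a, b, steps=1000):
    for t in np.linspace(0.0, 1.0, steps)[1:]:
        q = slerp(a, b, t)
        v = v - (v @ q) * q        # project onto the tangent plane at q
        v /= np.linalg.norm(v)
    return v

N = np.array([0.0, 0.0, 1.0])      # north pole
X = np.array([1.0, 0.0, 0.0])
Y = np.array([0.0, 1.0, 0.0])

v = np.array([1.0, 0.0, 0.0])      # tangent vector at the north pole
for a, b in [(N, X), (X, Y), (Y, N)]:
    v = transport(v, a, b)
print(v)  # ~ (0, 1, 0): rotated by 90 degrees, the enclosed solid angle
```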
Review for NeurIPS paper: Disentangling by Subspace Diffusion
The paper reduces the question of whether unsupervised disentangling is possible to the question of whether unsupervised metric learning is possible, providing a unifying insight into the geometric nature of representation learning. All reviewers think the theory and algorithm developed for decomposing Lie groups are novel. The paper is missing citations of previous work related to fibre bundles and manifold learning, which the authors should remedy in the revised version.
Disentangling Likes and Dislikes in Personalized Generative Explainable Recommendation
Shimizu, Ryotaro, Wada, Takashi, Wang, Yu, Kruse, Johannes, O'Brien, Sean, HtaungKham, Sai, Song, Linxin, Yoshikawa, Yuya, Saito, Yuki, Tsung, Fugee, Goto, Masayuki, McAuley, Julian
Recent research on explainable recommendation generally frames the task as a standard text generation problem, and evaluates models simply based on the textual similarity between the predicted and ground-truth explanations. However, this approach fails to consider one crucial aspect of the systems: whether their outputs accurately reflect the users' (post-purchase) sentiments, i.e., whether and why they would like and/or dislike the recommended items. To shed light on this issue, we introduce new datasets and evaluation methods that focus on the users' sentiments. Specifically, we construct the datasets by explicitly extracting users' positive and negative opinions from their post-purchase reviews using an LLM, and propose to evaluate systems based on whether the generated explanations 1) align well with the users' sentiments, and 2) accurately identify both positive and negative opinions of users on the target items. We benchmark several recent models on our datasets and demonstrate that achieving strong performance on existing metrics does not ensure that the generated explanations align well with the users' sentiments. Lastly, we find that existing models can provide more sentiment-aware explanations when the users' (predicted) ratings for the target items are directly fed into the models as input. We will release our code and datasets upon acceptance.
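As a hedged sketch of what such a polarity-aware evaluation could look like: the token-overlap matcher and field names below are illustrative stand-ins, not the paper's protocol or dataset schema; a real evaluator would use LLM- or embedding-based matching instead.

```python
# Hedged sketch of a polarity-aware evaluation: given gold positive and
# negative opinion phrases (extracted from a user's post-purchase review)
# and a generated explanation, score how many opinions of each polarity
# the explanation covers. The token-overlap matcher is a crude stand-in
# for LLM/embedding-based matching.

def covered(opinion: str, explanation: str, thresh: float = 0.5) -> bool:
    """True if enough of the opinion's tokens appear in the explanation."""
    op = set(opinion.lower().split())
    ex = set(explanation.lower().split())
    return len(op & ex) / len(op) >= thresh

def polarity_recall(gold_pos, gold_neg, explanation):
    pos = sum(covered(o, explanation) for o in gold_pos) / max(len(gold_pos), 1)
    neg = sum(covered(o, explanation) for o in gold_neg) / max(len(gold_neg), 1)
    return {"pos_recall": pos, "neg_recall": neg}

print(polarity_recall(
    gold_pos=["great battery life"],
    gold_neg=["screen scratches easily"],
    explanation="You may like its great battery life, "
                "though the screen scratches easily",
))
```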
Disentangling, Amplifying, and Debiasing: Learning Disentangled Representations for Fair Graph Neural Networks
Lee, Yeon-Chang, Shin, Hojung, Kim, Sang-Wook
Graph Neural Networks (GNNs) have become essential tools for graph representation learning in various domains, such as social media and healthcare. However, they often suffer from fairness issues due to inherent biases in node attributes and graph structure, leading to unfair predictions. To address these challenges, we propose a novel GNN framework, DAB-GNN, that Disentangles, Amplifies, and deBiases attribute, structure, and potential biases in the GNN mechanism. DAB-GNN employs a disentanglement and amplification module that isolates and amplifies each type of bias through specialized disentanglers, followed by a debiasing module that minimizes the distance between subgroup distributions to ensure fairness. Extensive experiments on five datasets demonstrate that DAB-GNN significantly outperforms ten state-of-the-art competitors in balancing accuracy and fairness.
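A minimal sketch of the debiasing objective described above, with a first-moment discrepancy standing in for the paper's subgroup-distribution distance; the disentangler and amplifier modules, and the GNN itself, are omitted.

```python
import torch

# Minimal sketch of the debiasing idea: given node embeddings produced by
# a GNN, penalize the distance between the embedding distributions of two
# sensitive subgroups. The mean discrepancy below is an illustrative
# stand-in for the paper's distance measure.

def subgroup_distance(z: torch.Tensor, sensitive: torch.Tensor) -> torch.Tensor:
    """z: (N, d) node embeddings; sensitive: (N,) binary group labels."""
    z0, z1 = z[sensitive == 0], z[sensitive == 1]
    return torch.norm(z0.mean(dim=0) - z1.mean(dim=0))

z = torch.randn(100, 16, requires_grad=True)      # stand-in for GNN embeddings
s = torch.randint(0, 2, (100,))                   # sensitive attribute
task_loss = torch.zeros(())                       # placeholder for the accuracy term
loss = task_loss + 0.5 * subgroup_distance(z, s)  # fairness-regularized objective
loss.backward()
```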