covariance matrix
Variational Inference with Mixtures of Isotropic Gaussians
Variational inference (VI) is a popular approach in Bayesian inference, that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. In this paper, we focus on the following parametric family: mixtures of isotropic Gaussians (i.e., with diagonal covariance matrices proportional to the identity) and uniform weights. We develop a variational framework and provide efficient algorithms suited for this family. In contrast with mixtures of Gaussian with generic covariance matrices, this choice presents a balance between accurate approximations of multimodal Bayesian posteriors, while being memory and computationally efficient. Our algorithms implement gradient descent on the location of the mixture components (the modes of the Gaussians), and either (an entropic) Mirror or Bures descent on their variance parameters. We illustrate the performance of our algorithms on numerical experiments.
Seeds of Structure: Patch PCAReveals Universal Compositional Cues in Diffusion Models
Diffusion models transform random noise into images of remarkable fidelity, yet the structure of this noise-to-image map remains largely unexplored. We investigate this relationship using patch-wise Principal Component Analysis (PCA) and empirically demonstrate that low-frequency components of the initial noise predominantly influence the compositional structure of generated images. Our analyses reveal that noise seeds inherently contain universal compositional cues, evident when identical seeds produce images with similar structural attributes across different datasets and model architectures. Leveraging these insights, we develop and theoretically justify a simple yet effective Patch PCA denoiser that extracts underlying structure from noise using only generic natural image statistics. The robustness of these structural cues is observed to persist across both pixel-space models and latent diffusion models, highlighting their fundamental nature. Finally, we introduce a zero-shot editing method that enables injecting compositional control over generated images, providing an intuitive approach to guided generation without requiring model fine-tuning or additional training.
Locality in Image Diffusion Models Emerges from Data Statistics
Recent work has shown that the generalization ability of image diffusion models arises from the locality properties of the trained neural network. In particular, when denoising a particular pixel, the model relies on a limited neighborhood of the input image around that pixel, which, according to the previous work, is tightly related to the ability of these models to produce novel images. Since locality is central to generalization, it is crucial to understand why diffusion models learn local behavior in the first place, as well as the factors that govern the properties of locality patterns. In this work, we present evidence that the locality in deep diffusion models emerges as a statistical property of the image dataset and is not due to the inductive bias of convolutional neural networks, as suggested in previous work. Specifically, we demonstrate that an optimal parametric linear denoiser exhibits similar locality properties to deep neural denoisers. We show, both theoretically and experimentally, that this locality arises directly from pixel correlations present in the image datasets. Moreover, locality patterns are drastically different on specialized datasets, approximating principal components of the data's covariance. We use these insights to craft an analytical denoiser that better matches scores predicted by a deep diffusion model than prior expert-crafted alternatives. Our key takeaway is that while neural network architectures influence generation quality, their primary role is to capture locality patterns inherent in the data.
Topology-Aware Conformal Prediction for Stream Networks
Existing approaches either neglect dependencies, leading to overly conservative predictions, or rely solely on data-driven estimations, failing to capture the rich topological structure of the network. To address these challenges, we propose Spatio-Temporal Adaptive Conformal Inference (STACI), a novel framework that integrates network topology and temporal dynamics into the conformal prediction framework. STACIintroduces a topology-aware nonconformity score that respects directional flow constraints and dynamically adjusts prediction sets to account for temporal distributional shifts. We provide theoretical guarantees on the validity of our approach and demonstrate its superior performance on both synthetic and real-world datasets. Our results show that STACIeffectively balances prediction efficiency and coverage, outperforming existing conformal prediction methods for stream networks.
Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment
Test-time adaptation (TTA) enhances the zero-shot robustness under distribution shifts by leveraging unlabeled test data during inference. Despite notable advances, several challenges still limit its broader applicability. First, most methods rely on backpropagation or iterative optimization, which limits scalability and hinders real-time deployment. Second, they lack explicit modeling of class-conditional feature distributions. This modeling is crucial for producing reliable decision boundaries and calibrated predictions, but it remains underexplored due to the lack of both source data and supervision at test time.
Spectral Learning for Infinite-Horizon Average-Reward POMDPs
We address the learning problem in the context of infinite-horizon average-reward POMDPs. Traditionally, this problem has been approached using Spectral Decomposition (SD) methods applied to samples collected under non-adaptive policies, such as uniform or round-robin policies. Recently, SD techniques have been extended to accommodate a restricted class of adaptive policies such as memoryless policies. However, the use of adaptive policies has introduced challenges related to data inefficiency, as SD methods typically require all samples to be drawn from a single policy. In this work, we propose Mixed Spectral Estimation, which generalizes spectral estimation techniques to support a broader class of belief-based policies.
Non-Asymptotic Analysis Of Data Augmentation For Precision Matrix Estimation
This paper addresses the problem of inverse covariance (also known as precision matrix) estimation in high-dimensional settings. Specifically, we focus on two classes of estimators: linear shrinkage estimators with a target proportional to the identity matrix, and estimators derived from data augmentation (DA). Here, DA refers to the common practice of enriching a dataset with artificial samples--typically generated via a generative model or through random transformations of the original data--prior to model fitting. For both classes of estimators, we derive estimators and provide concentration bounds for their quadratic error. This allows for both method comparison and hyperparameter tuning, such as selecting the optimal proportion of artificial samples. On the technical side, our analysis relies on tools from random matrix theory. We introduce a novel deterministic equivalent for generalized resolvent matrices, accommodating dependent samples with specific structure. We support our theoretical results with numerical experiments.
T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning
Self-supervised learning (SSL) has emerged as a powerful paradigm for learning representations without labeled data, often by enforcing invariance to input transformations such as rotations or blurring. Recent studies have highlighted two pivotal properties for effective representations: (i) avoiding dimensional collapse-where the learned features occupy only a low-dimensional subspace, and (ii) enhancing uniformity of the induced distribution. In this work, we introduce T-REGS, a simple regularization framework for SSL based on the length of the Minimum Spanning Tree (MST) over the learned representation. We provide theoretical analysis demonstrating that T-REGS simultaneously mitigates dimensional collapse and promotes distribution uniformity on arbitrary compact Riemannian manifolds.
Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy Phuc Tran VinUniversity Nisheeth K. Vishnoi Yale University Van H. Vu Yale University
A central challenge in machine learning is to understand how noise or measurement errors affect low-rank approximations--particularly in the spectral norm. This question is especially important in differentially private low-rank approximation, where one aims to preserve the top-pstructure of a data-derived matrix while ensuring privacy.