Gaussianity


FMPlug: Plug-In Foundation Flow-Matching Priors for Inverse Problems

Wan, Yuxiang, Devera, Ryan, Zhang, Wenjie, Sun, Ju

arXiv.org Artificial Intelligence

We present FMPlug, a novel plug-in framework that enhances foundation flow-matching (FM) priors for solving ill-posed inverse problems. Unlike traditional approaches that rely on domain-specific or untrained priors, FMPlug leverages two simple but powerful insights: the similarity between observed and desired objects, and the Gaussianity of generative flows. By introducing a time-adaptive warm-up strategy and sharp Gaussianity regularization, FMPlug unlocks the true potential of domain-agnostic foundation models. Our method outperforms state-of-the-art methods that use foundation FM priors by significant margins on image super-resolution and Gaussian deblurring.


Saving Foundation Flow-Matching Priors for Inverse Problems

Wan, Yuxiang, Devera, Ryan, Zhang, Wenjie, Sun, Ju

arXiv.org Artificial Intelligence

Foundation flow-matching (FM) models promise a universal prior for solving inverse problems (IPs), yet today they trail behind domain-specific or even untrained priors. How can we unlock their potential? We introduce FMPlug, a plug-in framework that redefines how foundation FMs are used in IPs. FMPlug combines an instance-guided, time-dependent warm-start strategy with a sharp Gaussianity regularization, adding problem-specific guidance while preserving the Gaussian structures. This leads to a significant performance boost across image restoration and scientific IPs. Our results point to a path for making foundation FM models practical, reusable priors for IP solving.
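Stripped to its essentials, the plug-in recipe these two abstracts describe is latent optimization against a frozen generator: a data-fit term, a Gaussianity penalty, and a warm start. Below is a minimal numerical sketch of that structure, with a random linear map standing in for the frozen FM generator and a least-squares initialization as a crude analogue of the time-dependent warm start; all names, constants, and the toy norm-based Gaussianity penalty are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
G = rng.standard_normal((n, n)) / np.sqrt(n)       # stand-in for a frozen generator
A = rng.standard_normal((n // 4, n)) / np.sqrt(n)  # forward operator of the inverse problem
y = A @ G @ rng.standard_normal(n)                 # observation (noiseless for simplicity)

# Warm start: a data-informed initialization rather than pure noise.
z = np.linalg.lstsq(A @ G, y, rcond=None)[0]

# Minimize ||A G z - y||^2 + lam * (||z||^2 / n - 1)^2 by gradient descent;
# the second term is a toy Gaussianity penalty keeping z near the prior shell.
lam, lr = 1e-3, 0.2
for _ in range(300):
    data_grad = (A @ G).T @ (A @ G @ z - y)
    gauss_grad = lam * 4.0 * (z @ z / n - 1.0) * z / n
    z -= lr * (data_grad + gauss_grad)

residual = np.linalg.norm(A @ G @ z - y)
```

The warm start matters because the data-fit landscape alone is severely underdetermined (here, 16 observations constrain 64 latents); the Gaussianity term is what selects among consistent solutions.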


d33174c464c877fb03e77efdab4ae804-AuthorFeedback.pdf

Neural Information Processing Systems

Our work "establishes interpretations of SGD and Adam-family optimizers from a Bayesian filtering perspective" (R3). It is "the first to demonstrate how viewing optimization as Bayesian inference requires modeling temporal dynamics" (R4), and therefore explains the excellent performance of these SOTA methods (e.g., AdamW). In the ideal case you shouldn't use a factorised model, and lines 77-81 aren't trying to motivate a factorised model. Also, see "Conclusions" above for non-factorised future work (Khan et al. 2018); we agree that its improvement is an important avenue for future research. Minor 1. Agreed, but a few people get very confused on this point.


Approximate Gaussianity Beyond Initialisation in Neural Networks

Hirst, Edward, Ramgoolam, Sanjaye

arXiv.org Artificial Intelligence

Ensembles of neural network weight matrices are studied through the training process for the MNIST classification problem, testing the efficacy of matrix models for representing their distributions under assumptions of Gaussianity and permutation symmetry. The general 13-parameter permutation-invariant Gaussian matrix models are found to be effective models for the correlated Gaussianity in the weight matrices, beyond the range of applicability of the simple Gaussian with independent identically distributed matrix variables, and notably well beyond the initialisation step. The representation-theoretic model parameters and the graph-theoretic characterisation of the permutation-invariant matrix observables give an interpretable framework for the best-fit model and for small departures from Gaussianity. Additionally, the Wasserstein distance is calculated for this class of models and used to quantify the movement of the distributions over training. Throughout the work, the effects of varied initialisation regimes, regularisation, layer depth, and layer width are tested for this formalism, identifying limits where particular departures from Gaussianity are enhanced and showing how more general, yet still highly interpretable, models can be developed.
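The basic measurement underlying this program can be illustrated with a much smaller toy than the paper's 13-parameter model: evaluate a few permutation-invariant observables over an ensemble of weight matrices and test how Gaussian their distributions look. The observables and the excess-kurtosis check below are illustrative choices, not the paper's graph-indexed set.

```python
import numpy as np

def invariant_observables(W):
    # Three simple observables invariant under independent row and column
    # permutations of W (the paper uses a larger, graph-indexed family).
    return np.array([W.sum(), (W ** 2).sum(), (W.sum(axis=1) ** 2).sum()])

def excess_kurtosis(x):
    # Zero for an exactly Gaussian variable; a cheap departure measure.
    x = (x - x.mean()) / x.std()
    return (x ** 4).mean() - 3.0

rng = np.random.default_rng(0)
# Ensemble of i.i.d.-Gaussian "weight matrices", as at initialisation.
ensemble = np.array([invariant_observables(rng.standard_normal((8, 8)))
                     for _ in range(2000)])

# The linear observable W.sum() is exactly Gaussian at initialisation, so
# its excess kurtosis should be near zero up to sampling noise.
k_linear = excess_kurtosis(ensemble[:, 0])
```

Tracking such statistics along training, rather than only at initialisation, is what distinguishes this line of work from the usual Gaussian-initialisation analyses.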


Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models

Hwang, Jisung, Kim, Jaihoon, Sung, Minhyuk

arXiv.org Artificial Intelligence

We propose a novel regularization loss that enforces standard Gaussianity, encouraging samples to align with a standard Gaussian distribution. This facilitates a range of downstream tasks involving optimization in the latent space of text-to-image models. We treat elements of a high-dimensional sample as one-dimensional standard Gaussian variables and define a composite loss that combines moment-based regularization in the spatial domain with power-spectrum-based regularization in the spectral domain. Since the expected values of moments and power spectrum distributions are analytically known, the loss promotes conformity to these properties. To ensure permutation invariance, the losses are applied to randomly permuted inputs. Notably, existing Gaussianity-based regularizations fall within our unified framework: some correspond to moment losses of specific orders, while the previous covariance-matching loss is equivalent to our spectral loss but incurs higher time complexity due to its spatial-domain computation. We showcase the application of our regularization in generative modeling for test-time reward alignment with a text-to-image model, specifically to enhance aesthetics and text alignment. Our regularization outperforms previous Gaussianity regularizations, effectively prevents reward hacking, and accelerates convergence.
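A toy version of this composite loss can be written down directly, since the target statistics are known in closed form: the standard-normal moments E[z^k] are 0, 1, 0, 3 for k = 1..4, and i.i.d. Gaussian noise has a flat expected power spectrum. The sketch below is illustrative NumPy, not the authors' code; the function name, the choice of four moments, and the simple squared-error penalties are assumptions.

```python
import numpy as np

def gaussianity_loss(z, rng=None, num_moments=4):
    """Penalize deviation of z from i.i.d. standard-normal statistics via
    (a) low-order moments in the spatial domain and (b) the power spectrum
    in the spectral domain."""
    z = np.asarray(z, dtype=float).ravel()
    # Random permutation: moments are unaffected, but the spectral term
    # becomes permutation-invariant in expectation.
    if rng is not None:
        z = rng.permutation(z)

    # (a) Moments of a standard Gaussian: E[z] = 0, E[z^2] = 1, E[z^3] = 0, E[z^4] = 3.
    target = {1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}
    moment_loss = sum((np.mean(z ** k) - target[k]) ** 2
                      for k in range(1, num_moments + 1))

    # (b) White Gaussian noise has flat expected power at every frequency.
    spectrum = np.abs(np.fft.rfft(z)) ** 2 / z.size
    spectral_loss = np.mean((spectrum - 1.0) ** 2)

    return moment_loss + spectral_loss
```

On a genuinely Gaussian sample the loss is small and dominated by sampling fluctuations; on a degenerate sample (e.g., a constant vector) both terms blow up, which is the behavior a latent-space regularizer needs.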


Shortening the Trajectories: Identity-Aware Gaussian Approximation for Efficient 3D Molecular Generation

Qu, Jingxiang, Gao, Wenhan, Liu, Yi

arXiv.org Machine Learning

Gaussian-based Probabilistic Generative Models (GPGMs) generate data by reversing a stochastic process that progressively corrupts samples with Gaussian noise. While these models have achieved state-of-the-art performance across diverse domains, their practical deployment remains constrained by the high computational cost of long generative trajectories, which often involve hundreds to thousands of steps during training and sampling. In this work, we introduce a theoretically grounded and empirically validated framework that improves generation efficiency without sacrificing training granularity or inference fidelity. Our key insight is that for certain data modalities, the noising process causes data to rapidly lose its identity and converge toward a Gaussian distribution. We analytically identify a characteristic step at which the data has acquired sufficient Gaussianity, and then replace the remaining generation trajectory with a closed-form Gaussian approximation. Unlike existing acceleration techniques that coarsen the trajectories by skipping steps, our method preserves the full resolution of learning dynamics while avoiding redundant stochastic perturbations between `Gaussian-like' distributions. Empirical results across multiple data modalities demonstrate substantial improvements in both sample quality and computational efficiency.
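The "characteristic step" idea can be illustrated on a standard variance-preserving noise schedule: find the first step at which the signal-to-noise ratio is negligible, and hand the rest of the trajectory to a closed-form Gaussian. The sketch below uses assumed notation; the linear-beta schedule and the 1e-3 threshold are illustrative choices, not the paper's criterion.

```python
import numpy as np

def characteristic_step(alpha_bar, snr_threshold=1e-3):
    """In a variance-preserving process x_t = sqrt(abar_t) x_0
    + sqrt(1 - abar_t) eps, the SNR abar_t / (1 - abar_t) decays toward 0.
    Return the first step where it drops below the threshold, after which
    x_t can be treated as approximately N(0, I) and the remaining
    trajectory truncated."""
    snr = alpha_bar / (1.0 - alpha_bar)
    below = np.nonzero(snr < snr_threshold)[0]
    return int(below[0]) if below.size else len(alpha_bar)

# A common linear-beta schedule, assumed here for illustration.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
t_star = characteristic_step(alpha_bar)
```

Everything past `t_star` contributes only perturbations between nearly identical Gaussian-like distributions, which is exactly the redundancy the abstract's closed-form replacement removes.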


Review for NeurIPS paper: Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods

Neural Information Processing Systems

Summary and Contributions: Post-rebuttal: Dear authors, thank you for your detailed response and for offering to fix many of the points we raised. I would like to sum up my thoughts after having read the other reviews and your rebuttal: At a high level, the following aspects were most significant in how I arrived at my final score: 1) The perspective is novel, and has interesting potential. Re 1: I think we all agree that this is a pro for the paper and should be considered its main strength. Re 2: Questioning the approximations is a valid point. However, as you argue, you provided sufficient empirical evidence for the mini-batch Gaussianity, and I think that Gaussianity is often assumed without further justification in other Bayesian inference applications as well, simply to keep the computations tractable. Even if the assumptions are not fully realistic, they seem to be "less concerning than those in past work" (rebuttal, line 19).


Review for NeurIPS paper: Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods

Neural Information Processing Systems

After a discussion with the reviewers, I converged on recommending acceptance of this submission. The reviewers raised the following aspects: 1) The perspective is novel, and has interesting potential. Re 1: all reviewers agree that this is a pro for the paper and should be considered its main strength. The authors agree (rebuttal, lines 23-25). Re 2: R3 believes that questioning the approximations is a valid point. However, as the authors argue, they have provided sufficient empirical evidence for mini-batch Gaussianity in appendix B, and Gaussianity is sometimes assumed without further justification in other Bayesian inference applications as well, simply to keep the computations tractable.


Explainable Learning with Gaussian Processes

Butler, Kurt, Feng, Guanchao, Djuric, Petar M.

arXiv.org Artificial Intelligence

The field of explainable artificial intelligence (XAI) attempts to develop methods that provide insight into how complicated machine learning methods make predictions. Many methods of explanation have focused on the concept of feature attribution, a decomposition of the model's prediction into individual contributions corresponding to each input feature. In this work, we explore the problem of feature attribution in the context of Gaussian process regression (GPR). We take a principled approach to defining attributions under model uncertainty, extending the existing literature. We show that although GPR is a highly flexible and non-parametric approach, we can derive interpretable, closed-form expressions for the feature attributions. When using integrated gradients as an attribution method, we show that the attributions of a GPR model also follow a Gaussian process distribution, which quantifies the uncertainty in attribution arising from uncertainty in the model. We demonstrate, both through theory and experimentation, the versatility and robustness of this approach. We also show that, when applicable, the exact expressions for GPR attributions are both more accurate and less computationally expensive than the approximations currently used in practice.
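To make the integrated-gradients connection concrete, here is a small numerical sketch: integrated gradients of a GPR posterior mean (RBF kernel), approximated with a Riemann sum and central finite differences rather than the paper's closed-form expressions. Function names, the kernel length-scale, and the step counts are illustrative assumptions. The completeness property of integrated gradients, attributions summing to f(x) - f(baseline), can then be checked numerically.

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    # Squared-exponential kernel between two sets of points.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior_mean(X_train, y, noise=1e-6):
    # Precompute alpha = (K + noise I)^{-1} y; the posterior mean at x is
    # then k(x, X_train) @ alpha.
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y)
    return lambda x: float(rbf(x[None, :], X_train) @ alpha)

def integrated_gradients(f, x, baseline, steps=500):
    # Riemann-sum approximation of IG along the straight path from
    # baseline to x, with central finite differences for the gradient.
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    path = baseline + ts * (x - baseline)
    eps = 1e-5
    attr = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        g = np.array([(f(p + e) - f(p - e)) / (2 * eps) for p in path])
        attr[j] = (x[j] - baseline[j]) * g.mean()
    return attr
```

A quick usage example: fit a tiny GP, attribute a prediction, and verify completeness.

```python
rng = np.random.default_rng(0)
X_train = rng.standard_normal((5, 2))
y = rng.standard_normal(5)
f = gp_posterior_mean(X_train, y)
x, base = np.array([0.5, -0.3]), np.zeros(2)
attr = integrated_gradients(f, x, base)
# attr.sum() should be close to f(x) - f(base) by completeness.
```

The abstract's point is that for GPR none of this numerical machinery is needed: both the attributions and their uncertainty admit exact closed forms, which the finite-difference sketch above only approximates.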