Universality






Re: Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators (ID=1064)

Neural Information Processing Systems

Re: Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators (ID=1064). We thank the reviewers for their careful reading of our work, and we will update the paper based on the suggestions. One reviewer asked on what occasions the diffeomorphism universality results would be useful other than for distribution approximation. Thank you also for pointing out the missing references.


A Sharp Universality Dichotomy for the Free Energy of Spherical Spin Glasses

Kim, Taegyun

arXiv.org Machine Learning

We study the free energy of pure and mixed spherical $p$-spin models with i.i.d. disorder. In the mixed case, each $p$-interaction layer is assumed either to have regularly varying tails with exponent $\alpha_p$ or to satisfy a finite $2p$-th moment condition. For the pure spherical $p$-spin model with regularly varying disorder of tail index $\alpha$, we introduce a tail-adapted normalization that interpolates between the classical Gaussian scaling and the extreme-value scale, and we prove a sharp universality dichotomy for the quenched free energy. In the subcritical regime $\alpha<2p$, the thermodynamics is driven by finitely many extremal couplings, and the free energy converges to a non-degenerate random limit described by the NIM (non-intersecting monomial) model, depending only on extreme-order statistics. At the critical exponent $\alpha=2p$, we obtain a random one-dimensional TAP-type variational formula capturing the coexistence of an extremal spike and a universal Gaussian bulk on spherical slices. In the supercritical regime $\alpha>2p$ (more generally, under a finite $2p$-th moment assumption), the free energy is universal and agrees with the deterministic Crisanti--Sommers/Parisi value of the corresponding Gaussian model, as established in [Sawhney-Sellke'24]. We then extend the subcritical and critical results to mixed spherical models in which each $p$-layer is either heavy-tailed with $\alpha_p\le 2p$ or has a finite $2p$-th moment. In particular, we derive a TAP-type variational representation for the mixed model, yielding a unified universality classification of the quenched free energy across tail exponents and mixtures.
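
For orientation, the standard pure spherical $p$-spin Hamiltonian under the conventional Gaussian-scale normalization is recalled below; this is a sketch of the classical setup only, and the paper's tail-adapted normalization replaces the $N^{-(p-1)/2}$ prefactor when the disorder is heavy-tailed.

\[
H_N(\sigma) \;=\; \frac{1}{N^{(p-1)/2}} \sum_{i_1,\dots,i_p=1}^{N} J_{i_1\dots i_p}\, \sigma_{i_1}\cdots\sigma_{i_p},
\qquad \sigma \in \mathbb{S}^{N-1}(\sqrt{N}),
\]

with i.i.d. couplings $J_{i_1\dots i_p}$. The trichotomy above is then indexed by the tail exponent $\alpha$ of the couplings relative to $2p$: extremal couplings dominate for $\alpha<2p$, an extremal spike coexists with a Gaussian bulk at $\alpha=2p$, and the Gaussian value is recovered for $\alpha>2p$.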


Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks

Neural Information Processing Systems

In this paper, we provide a theoretical analysis of the inductive biases in convolutional neural networks (CNNs). We start by examining the universality of CNNs, i.e., their ability to approximate any continuous function. We prove that a depth of $\mathcal{O}(\log d)$ suffices for deep CNNs to achieve this universality, where $d$ is the input dimension. Additionally, we establish that learning sparse functions with CNNs requires only $\widetilde{\mathcal{O}}(\log^2 d)$ samples, indicating that deep CNNs can efficiently capture {\em long-range} sparse correlations. These results are made possible through a novel combination of multichanneling and downsampling as the network depth increases.
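
As a rough illustration of why $\mathcal{O}(\log d)$ depth is natural when each layer halves the spatial size, here is a minimal PyTorch sketch; the kernel sizes, channel widths, and overall architecture are illustrative assumptions, not the paper's construction.

import math
import torch
import torch.nn as nn

def log_depth_cnn(d: int, base_channels: int = 4) -> nn.Sequential:
    """Stack stride-2 convolutions (with channel doubling) until the
    spatial length collapses to 1, so depth grows like log2(d)."""
    layers, length, c_in = [], d, 1
    while length > 1:
        c_out = c_in * 2 if c_in > 1 else base_channels
        layers += [nn.Conv1d(c_in, c_out, kernel_size=2, stride=2), nn.ReLU()]
        length, c_in = length // 2, c_out
    return nn.Sequential(*layers)

d = 64
net = log_depth_cnn(d)
y = net(torch.randn(1, 1, d))                 # input shape (batch, channels, length)
n_convs = sum(isinstance(m, nn.Conv1d) for m in net)
print(y.shape, n_convs, math.log2(d))         # 6 conv layers for d = 64

After the final layer the receptive field covers the entire input, which is in the spirit of how downsampling lets logarithmic depth capture long-range correlations.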


Universality of Group Convolutional Neural Networks Based on Ridgelet Analysis on Groups

Neural Information Processing Systems

We show the universality of depth-2 group convolutional neural networks (GCNNs) in a unified and constructive manner based on ridgelet theory. Despite their widespread use in applications, the approximation properties of (G)CNNs have not been well investigated. The universality of (G)CNNs has been shown since the late 2010s. Yet our understanding of how (G)CNNs represent functions remains incomplete, because past universality theorems have been established case by case, by manually and carefully assigning the network parameters depending on the variety of convolution layers, and indirectly, by converting or modifying the (G)CNNs into other universal approximators such as invariant polynomials and fully-connected networks. In this study, we formulate a versatile depth-2 continuous GCNN $S[\gamma]$ as a nonlinear mapping between group representations, and directly obtain an analysis operator, called the ridgelet transform, that maps a given function $f$ to the network parameters $\gamma$ so that $S[\gamma]=f$.
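
For context, the classical fully-connected ridgelet pair is sketched below; the paper's contribution is the group-equivariant analogue, in which the affine feature $a \cdot x - b$ is replaced by a group action, and that form is not reproduced here.

\[
S[\gamma](x) = \int \gamma(a,b)\,\sigma(a\cdot x - b)\,\mathrm{d}a\,\mathrm{d}b,
\qquad
R[f;\psi](a,b) = \int f(x)\,\psi(a\cdot x - b)\,\mathrm{d}x,
\]

and for an admissible pair $(\sigma,\psi)$ the reconstruction $S[R[f;\psi]] = f$ holds up to a constant factor, which is the sense in which the analysis operator directly assigns network parameters.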


Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators

Neural Information Processing Systems

Invertible neural networks based on coupling flows (CF-INNs) have various machine learning applications such as image synthesis and representation learning. However, their desirable characteristics, such as analytic invertibility, come at the cost of restricting their functional forms. This poses a question about their representation power: are CF-INNs universal approximators for invertible functions? Without universality, there could be a well-behaved invertible transformation that a CF-INN can never approximate, which would render the model class unreliable. We answer this question by showing a convenient criterion: a CF-INN is universal if its layers contain affine coupling and invertible linear functions as special cases. As a corollary, we affirmatively resolve a previously unsolved problem: whether normalizing flow models based on affine coupling can be universal distributional approximators. In the course of proving the universality, we prove a general theorem showing the equivalence of universality for certain diffeomorphism classes, a theoretical insight that is of interest in its own right.
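
To make the building block concrete, here is a minimal NumPy sketch of one affine coupling layer (RealNVP-style), the layer type whose universality the paper studies; the scale and shift functions s and t below are placeholder maps, not the paper's parameterization.

import numpy as np

def s(x):                      # placeholder scale network
    return np.tanh(x)

def t(x):                      # placeholder shift network
    return 0.5 * x

def coupling_forward(x):
    x1, x2 = np.split(x, 2)
    y2 = x2 * np.exp(s(x1)) + t(x1)       # only the second half is transformed
    return np.concatenate([x1, y2])

def coupling_inverse(y):
    y1, y2 = np.split(y, 2)
    x2 = (y2 - t(y1)) * np.exp(-s(y1))    # analytic inverse, no iteration needed
    return np.concatenate([y1, x2])

x = np.random.randn(4)
assert np.allclose(coupling_inverse(coupling_forward(x)), x)

The analytic inverse is exactly the desirable characteristic the abstract mentions: invertibility comes for free, at the price of the restricted functional form whose universality the paper establishes.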


From Tail Universality to Bernstein-von Mises: A Unified Statistical Theory of Semi-Implicit Variational Inference

Plummer, Sean

arXiv.org Machine Learning

Semi-implicit variational inference (SIVI) constructs approximate posteriors of the form $q(\theta) = \int k(\theta \mid z)\, r(dz)$, where the conditional kernel is parameterized and the mixing base is fixed and tractable. This paper develops a unified "approximation-optimization-statistics" theory for such families. On the approximation side, we show that under compact $L^1$-universality and a mild tail-dominance condition, semi-implicit families are dense in $L^1$ and can achieve arbitrarily small forward Kullback-Leibler (KL) error. We also identify two sharp obstructions to global approximation: (i) an Orlicz tail-mismatch condition that induces a strictly positive forward-KL gap, and (ii) structural restrictions, such as non-autoregressive Gaussian kernels, that force "branch collapse" in conditional distributions. For each obstruction we give a minimal structural modification that restores approximability. On the optimization side, we establish finite-sample oracle inequalities and prove that the empirical SIVI objectives $L(K,n)$ $\Gamma$-converge to their population limit as $n$ and $K$ tend to infinity. These results give consistency of empirical maximizers, quantitative control of finite-$K$ surrogate bias, and stability of the resulting variational posteriors. Combining the approximation and optimization analyses yields the first general end-to-end statistical theory for SIVI: we characterize precisely when SIVI can recover the target distribution, when it cannot, and how architectural and algorithmic choices govern the attainable asymptotic behavior.
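
As a concrete reading of the defining formula, here is a minimal NumPy sketch of sampling from a semi-implicit family: $z$ is drawn from the fixed tractable base $r$, then $\theta$ from a parameterized conditional Gaussian kernel; mu_net below is a stand-in for the learned conditional mean, not the paper's parameterization.

import numpy as np

rng = np.random.default_rng(0)

def mu_net(z):                 # placeholder for a learned map z -> kernel mean
    return np.sin(z) + 0.1 * z

def sample_sivi(n, kernel_std=0.3):
    z = rng.standard_normal(n)                               # mixing base r = N(0, 1)
    return mu_net(z) + kernel_std * rng.standard_normal(n)   # kernel k(theta | z)

samples = sample_sivi(10_000)
print(samples.mean(), samples.std())   # draws from the marginal mixture q(theta)

Even with a simple Gaussian kernel, the marginal $q(\theta)$ is a flexible continuous mixture; it is this flexibility, and its tail limitations, that the paper's approximation results quantify.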