
A Another universality result for neural oscillators

Neural Information Processing Systems

The universal approximation Theorem 3.1 immediately implies another universal approximation … Thus $y(t)$ solves the ODE (2.6), with initial condition $y(0) = \dot{y}(0) = 0$. … Reconstruction of a continuous signal from its sine transform. … Step 0 (Equicontinuity): We recall the following fact from topology. … $F(\tau) := f(\tau)$ for $\tau \ge 0$, and $F(\tau) := -f(-\tau)$ for $\tau < 0$. Since $F$ is odd, the Fourier transform of $F$ is given by … We provide the details below. The next step in the proof of the Fundamental Lemma 3.5 needs the following preliminary result in … By (B.3), this implies that … It follows from Lemma 3.4 that for any input … By the sine transform reconstruction Lemma B.1, there exists … It follows from Lemma 3.6 that there exists … Indeed, Lemma 3.7 shows that time-delays of any given input signal can be approximated with any … Step 1: By the Fundamental Lemma 3.5, there exist … It follows from Lemma 3.6 that there exists an oscillator … Step 3: Finally, by Lemma 3.8, there exists an oscillator network, …
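The oddness of $F$ is what links it to the sine transform. Assume the convention $\hat F(\omega) = \int_{\mathbb{R}} F(\tau)\, e^{-i\omega\tau}\, d\tau$ and that $f$ is integrable on $[0,\infty)$. Both are our assumptions, since the excerpt cuts off before the supplement's own formula. Under them, a short calculation gives:

```latex
\begin{aligned}
\hat F(\omega)
&= \int_0^\infty f(\tau)\, e^{-i\omega\tau}\, d\tau
 + \int_{-\infty}^0 \bigl(-f(-\tau)\bigr)\, e^{-i\omega\tau}\, d\tau \\
&= \int_0^\infty f(\tau)\, e^{-i\omega\tau}\, d\tau
 - \int_0^\infty f(s)\, e^{i\omega s}\, ds
 \qquad (s = -\tau) \\
&= \int_0^\infty f(\tau)\, \bigl(e^{-i\omega\tau} - e^{i\omega\tau}\bigr)\, d\tau
 = -2i \int_0^\infty f(\tau)\, \sin(\omega\tau)\, d\tau .
\end{aligned}
```

Under this convention, $\hat F$ is $-2i$ times the sine transform of $f$. This suggests why inverting the Fourier transform of $F$ on $\tau \ge 0$ can recover $f$ from its sine transform, provided the usual conditions for Fourier inversion hold.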


b9523d484af624986c2e0c630ac44ecb-Supplemental-Conference.pdf

Neural Information Processing Systems

Lemma B.4 (Lemma 2.1.8 in [4]). For any diffeomorphism $f \in \mathrm{Diff}^k_c(\mathbb{R}^d)$ and any $\delta > 0$, there exists a finite sequence of $(\delta, k)$-near-identity diffeomorphisms $g_1, \dots, g_s$ such that $f = g_s \circ g_{s-1} \circ \cdots \circ g_1$. … Let $\pi_i : \mathbb{R}^d \to \mathbb{R}$ denote the projection onto the $i$th coordinate. Suppose $f : \mathbb{R}^d \to \mathbb{R}^d$ is compactly supported and sufficiently $C^k$-close to the identity. … In this section, we analyze how to make the affine coupling flow with dimension augmentation invertible. To handle this problem, we need to make sure that $\mathrm{Range}(F)$ is tractable for easy sampling.
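Affine coupling layers can be inverted in closed form, and the section relies on that property. The sketch below is illustrative only, and several of its choices are ours:

- The `scale` and `shift` functions are hypothetical stand-ins for the neural networks of a real coupling flow.
- The dimension-augmentation step is not modeled.
- Alternating layers with a cyclic coordinate shift is our own design.

Keeping `scale` and `shift` small makes each layer close to the identity. This loosely mirrors the near-identity factors $g_1, \dots, g_s$ of Lemma B.4.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Map = Callable[[Vector], Vector]


def make_coupling(scale: Map, shift: Map, split: int) -> Tuple[Map, Map]:
    """Return (forward, inverse) of an affine coupling layer on R^d.

    Coordinates x[:split] pass through unchanged; they parameterize an
    elementwise affine map x2 -> x2 * exp(scale(x1)) + shift(x1).
    """
    def forward(x: Vector) -> Vector:
        x1, x2 = x[:split], x[split:]
        s, t = scale(x1), shift(x1)
        return x1 + [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]

    def inverse(y: Vector) -> Vector:
        y1, y2 = y[:split], y[split:]
        # y1 equals x1, so scale and shift can be recomputed exactly.
        s, t = scale(y1), shift(y1)
        return y1 + [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]

    return forward, inverse


def roll(x: Vector) -> Vector:
    """Cyclic coordinate shift so every coordinate is eventually transformed."""
    return x[1:] + x[:1]


def unroll(y: Vector) -> Vector:
    """Inverse of roll."""
    return y[-1:] + y[:-1]


# Hypothetical small scale/shift functions (stand-ins for networks), so each
# layer stays close to the identity. Here d = 3 and split = 1.
fwd, inv = make_coupling(
    scale=lambda z: [0.1 * math.tanh(z[0]), -0.05 * math.sin(z[0])],
    shift=lambda z: [0.1 * math.sin(z[0]), 0.05 * z[0]],
    split=1,
)


def flow(x: Vector, layers: int = 3) -> Vector:
    """Composition (roll o fwd)^layers of near-identity invertible maps."""
    for _ in range(layers):
        x = roll(fwd(x))
    return x


def flow_inverse(y: Vector, layers: int = 3) -> Vector:
    """Undo flow by applying (inv o unroll) the same number of times."""
    for _ in range(layers):
        y = inv(unroll(y))
    return y


x = [0.3, -1.2, 2.0]
print(flow_inverse(flow(x)))  # recovers x up to floating-point round-off
```

The inverse never needs to invert `scale` or `shift` themselves. This is why coupling layers stay invertible even when those functions are arbitrary networks.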





gp2Scale: A Class of Compactly-Supported Non-Stationary Kernels and Distributed Computing for Exact Gaussian Processes on 10 Million Data Points

arXiv.org Artificial Intelligence

Despite a large corpus of recent work on scaling up Gaussian processes, a stubborn trade-off between computational speed, prediction and uncertainty quantification accuracy, and customizability persists. This is because the vast majority of existing methodologies exploit various levels of approximation that lower accuracy and limit the flexibility of kernel and noise-model designs, an unacceptable drawback at a time when expressive non-stationary kernels are on the rise in many fields. Here, we propose a methodology we term gp2Scale that scales exact Gaussian processes to more than 10 million data points without relying on inducing points, kernel interpolation, or neighborhood-based approximations, instead leveraging an existing capability of a GP: its kernel design. Highly flexible, compactly supported, and non-stationary kernels lead to the identification of naturally occurring sparse structure in the covariance matrix, which is then exploited to compute the linear-system solution and the log-determinant for training. We demonstrate our method's functionality on several real-world datasets and compare it with state-of-the-art approximation algorithms. Although we show superior approximation performance in many cases, the method's real power lies in its agnosticism toward arbitrary GP customizations (core kernel design, noise, and mean functions) and the type of input space, making it optimally suited for modern Gaussian process applications.
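The sparsity that gp2Scale exploits comes from kernels that vanish beyond a finite range. The sketch below illustrates only that mechanism, under two assumptions of ours:

- It uses a stationary Wendland kernel $(1 - r)_+^4 (4r + 1)$, which is positive definite in up to three dimensions.
- It runs on hypothetical random 1D inputs.

gp2Scale itself uses non-stationary compactly supported kernels and distributed linear algebra, neither of which is reproduced here. Sorting the points means each row can be filled by scanning only until the first point outside the support.

```python
import random
from typing import Dict, List, Tuple


def wendland(r: float) -> float:
    """Wendland function (1 - r)_+^4 (4 r + 1); exactly zero for r >= 1."""
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def sparse_covariance(x: List[float], radius: float) -> Dict[Tuple[int, int], float]:
    """Store only the nonzero entries K[i, j] = wendland(|x_i - x_j| / radius)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    entries: Dict[Tuple[int, int], float] = {}
    for a, i in enumerate(order):
        # Points are sorted, so the scan stops at the first point beyond the support.
        for j in order[a:]:
            d = x[j] - x[i]
            if d >= radius:
                break
            k = wendland(d / radius)
            entries[(i, j)] = k
            entries[(j, i)] = k
    return entries


random.seed(0)
points = [random.uniform(0.0, 100.0) for _ in range(2000)]  # hypothetical inputs
K = sparse_covariance(points, radius=1.0)
density = len(K) / len(points) ** 2
print(f"stored {len(K)} of {len(points) ** 2} entries ({density:.2%})")
```

With a support radius that is small relative to the domain, only a few percent of the entries are stored. A sparse solver could then handle the linear system and log-determinant, though that step is not shown.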


Diffusion annealed Langevin dynamics: a theoretical study

arXiv.org Machine Learning

The aim of this paper is to give a rigorous presentation of the recently introduced diffusion annealed Langevin dynamics [39]. This stochastic process is a score-based generative model. It provides an alternative to the well-known overdamped Langevin process and its time-reversed version, which are commonly used for sampling. In particular, we fill some gaps in the main arguments used to build the annealed Langevin dynamics discussed in [39, 30, 24]. We do not discuss its practical efficiency or its numerical counterparts. That is, we neither introduce nor discuss the corresponding discrete algorithms, presented by the second author in [24] and in the references therein. However, some quantitative aspects, useful for discretization schemes or important from the statistical point of view, are discussed in detail. Also, for distributions such as the Gaussian, an important idea introduced in the papers on diffusion annealed Langevin dynamics is to use a functional inequality (namely the Poincaré inequality) to control certain covariances. This inequality is crucial in [24] for proving that the score of the intermediate distributions is Lipschitz continuous. As we explain in Section 2, this Lipschitz property ensures the existence and uniqueness of strong solutions for the annealed Langevin diffusion. Moreover, heavy-tailed base distributions are also particularly well suited to the model, as we will see in an example.
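For orientation, the classical overdamped Langevin process $dX_t = \nabla \log \pi(X_t)\,dt + \sqrt{2}\,dB_t$ samples from $\pi$. The abstract compares the diffusion annealed dynamics against this process. The sketch below simulates that baseline only, not the diffusion annealed dynamics of [39].

The Euler-Maruyama step is our own choice, since the paper deliberately leaves discretizations aside. We picked a Gaussian target because its score $-(x - m)/s^2$ is globally Lipschitz, the property the abstract ties to existence and uniqueness of strong solutions.

```python
import math
import random
from typing import Callable, List


def langevin_chains(score: Callable[[float], float], x0: float, step: float,
                    n_steps: int, n_chains: int, seed: int = 0) -> List[float]:
    """Euler-Maruyama discretization of dX = score(X) dt + sqrt(2) dB.

    Runs n_chains independent chains from x0 and returns their final states.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    samples = []
    for _ in range(n_chains):
        x = x0
        for _ in range(n_steps):
            x += step * score(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


m, s = 1.0, 0.5  # target N(m, s^2), chosen for illustration


def gaussian_score(x: float) -> float:
    """Score of N(m, s^2): -(x - m) / s^2, Lipschitz with constant 1 / s^2."""
    return -(x - m) / s**2


xs = langevin_chains(gaussian_score, x0=-3.0, step=0.005, n_steps=600, n_chains=1000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(f"mean ~ {mean:.3f} (target {m}), variance ~ {var:.3f} (target {s**2})")
```

Because the step is finite, the discretized chain's stationary variance is $2/(L(2 - hL))$ with $L = 1/s^2$ and $h = 0.005$. That is about 1% above $s^2$, a small bias of the kind discretization analyses quantify.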


Incremental Generation is Necessary and Sufficient for Universality in Flow-Based Modelling

arXiv.org Machine Learning

Incremental flow-based denoising models have reshaped generative modelling, but their empirical advantage still lacks a rigorous approximation-theoretic foundation. We show that incremental generation is necessary and sufficient for universal flow-based generation on the largest natural class of self-maps of $[0,1]^d$ compatible with denoising pipelines, namely the orientation-preserving homeomorphisms of $[0,1]^d$. All our guarantees are uniform on the underlying maps and hence imply approximation both samplewise and in distribution. Using a new topological-dynamical argument, we first prove an impossibility theorem: the class of all single-step autonomous flows, independently of the architecture, width, depth, or Lipschitz activation of the underlying neural network, is meagre and therefore not universal in the space of orientation-preserving homeomorphisms of $[0,1]^d$. By exploiting algebraic properties of autonomous flows, we conversely show that every orientation-preserving Lipschitz homeomorphism on $[0,1]^d$ can be approximated at rate $\mathcal{O}(n^{-1/d})$ by a composition of at most $K_d$ such flows, where $K_d$ depends only on the dimension. Under additional smoothness assumptions, the approximation rate can be made dimension-free, and $K_d$ can be chosen uniformly over the class being approximated. Finally, by linearly lifting the domain into one higher dimension, we obtain structured universal approximation results for continuous functions and for probability measures on $[0,1]^d$, the latter realized as pushforwards of empirical measures with vanishing $1$-Wasserstein error.
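Here is a concrete picture of "compositions of autonomous flows" in dimension one. A velocity field on $[0,1]$ that vanishes at $0$ and $1$ generates time-1 maps that are orientation-preserving homeomorphisms of $[0,1]$ fixing both endpoints.

The sketch below approximates such maps with classical RK4. It then composes two of them built from different hypothetical fields, $a\,x(1-x)$ and $b\,x^2(1-x)$. It illustrates the objects named in the abstract only and does not implement the paper's approximation construction or its rate.

```python
import math
from typing import Callable

Field = Callable[[float], float]


def flow_map(v: Field, x: float, t: float = 1.0, n_steps: int = 200) -> float:
    """Approximate the time-t map of the autonomous ODE x' = v(x) with classical RK4."""
    h = t / n_steps
    for _ in range(n_steps):
        k1 = v(x)
        k2 = v(x + 0.5 * h * k1)
        k3 = v(x + 0.5 * h * k2)
        k4 = v(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


def logistic_field(a: float) -> Field:
    """Hypothetical field vanishing at 0 and 1, so both endpoints are fixed."""
    return lambda x: a * x * (1.0 - x)


def cubic_field(b: float) -> Field:
    """A second hypothetical field, also vanishing at 0 and 1."""
    return lambda x: b * x * x * (1.0 - x)


def logistic_exact(a: float, x: float, t: float = 1.0) -> float:
    """Closed-form time-t map of x' = a x (1 - x), used to check the integrator."""
    e = math.exp(a * t)
    return x * e / (1.0 - x + x * e)


def composed(x: float) -> float:
    """Composition of two autonomous flows with different velocity fields."""
    return flow_map(cubic_field(-2.0), flow_map(logistic_field(1.5), x))


grid = [i / 20 for i in range(21)]
values = [composed(x) for x in grid]
print(values[0], values[-1])  # endpoints stay fixed
print(all(a < b for a, b in zip(values, values[1:])))  # strictly increasing
```

The two fields come from different families on purpose. Composing flows of $a\,x(1-x)$ for two values of $a$ would just give another flow of the same field, since those maps form a one-parameter group.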



Robust Density Estimation under Besov IPM Losses

Neural Information Processing Systems

As shown in several recent papers [Liu et al., 2017, Liang, 2018, Singh et al., 2018, Uppal … Namely, the estimators used in past work rely on the unrealistic assumption that the practitioner knows the Besov space in which the true density lies. … The rest of this paper is organized as follows.
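For reference, an integral probability metric (IPM) over a class $\mathcal{F}$ of test functions is $d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} |\mathbb{E}_P f - \mathbb{E}_Q f|$. Besov IPMs take $\mathcal{F}$ to be a Besov ball. The sketch below evaluates the empirical version of this supremum over a small finite class. The cosine class is a hypothetical, crude stand-in for a smoothness-constrained ball, not an actual Besov ball, and the two sample distributions are arbitrary choices.

```python
import math
import random
from typing import Callable, List, Sequence

TestFn = Callable[[float], float]


def empirical_ipm(p: Sequence[float], q: Sequence[float], fns: List[TestFn]) -> float:
    """Supremum over a finite class of |mean_p f - mean_q f|."""
    def mean(xs: Sequence[float], f: TestFn) -> float:
        return sum(f(x) for x in xs) / len(xs)
    return max(abs(mean(p, f) - mean(q, f)) for f in fns)


# Hypothetical test class: cosines on [0, 1] damped by 1/k, a crude stand-in
# for a smoothness constraint (not a true Besov ball).
fns = [lambda x, k=k: math.cos(math.pi * k * x) / k for k in range(1, 9)]

rng = random.Random(1)
p = [rng.random() for _ in range(5000)]                # samples from Uniform[0, 1]
q = [rng.betavariate(2.0, 2.0) for _ in range(5000)]   # samples from Beta(2, 2)
print(empirical_ipm(p, q, fns), empirical_ipm(p, p, fns))
```

Enlarging or shrinking the test class changes which discrepancies the loss can detect. Indexing the class by a Besov ball tunes exactly this.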