 translation-invariant kernel


The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels

Neural Information Processing Systems

Such embeddings induce the so-called maximum mean discrepancy (MMD; [Smola et al., 2007, Gretton et al., 2012]), which quantifies the discrepancy ... Many estimators for HSIC exist. The classical ones rely on U-statistics or V-statistics [Gretton et al., 2005, Quadrianto et al., 2009, Pfister et al., 2018] and are known to converge at a rate of ... Lower bounds for the related MMD are known [Tolstikhin et al., 2016], but the existing analysis considers radial kernels and relies on independent Gaussian distributions.


The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels

Neural Information Processing Systems

Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of $M\ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite various existing HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb{R}^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal{O}\left(n^{-1/2}\right)$.


Approximation of RKHS Functionals by Neural Networks

arXiv.org Machine Learning

This paper studies the approximation of smooth functionals defined over a reproducing kernel Hilbert space (RKHS) using tanh neural networks. A functional maps an infinite-dimensional space of functions to $\mathbb{R}$. In recent years, neural networks have been widely employed in operator learning tasks. We are interested in investigating their capability to approximate nonlinear functionals, a special type of operator. Neural networks have been known to be universal approximators since [Cybenko, 1989], i.e., they can approximate any continuous function mapping a finite-dimensional input space into another finite-dimensional output space to arbitrary accuracy. These days, many interesting tasks entail learning operators, i.e., mappings between an infinite-dimensional input Banach space and (possibly) an infinite-dimensional output space. A prototypical example in scientific computing is to map the initial datum into the (time series of) solution of a nonlinear time-dependent partial differential equation (PDE). A priori, it is unclear if neural networks can be successfully employed to learn such operators from data, given that their universality only pertains to finite-dimensional functions. One of the first successful uses of neural networks in the context of operator learning was provided by [Chen and Chen, 1995].


The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels

arXiv.org Machine Learning

Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of $M\ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite various existing HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb R^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal O\!\left(n^{-1/2}\right)$. Specifically, our result implies the optimality in the minimax sense of many of the most frequently used estimators (including the U-statistic, the V-statistic, and the Nyström-based one) on $\mathbb R^d$.
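To make the V-statistic estimator mentioned in the abstract concrete, the following is a minimal sketch for the case of $M = 2$ scalar variables. It uses the common closed form $\operatorname{trace}(KHLH)/n^2$ for the (biased) estimate of the squared HSIC, where $K$ and $L$ are Gram matrices and $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$. The Gaussian kernel, the unit bandwidth, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel exp(-(a - b)^2 / (2 * bandwidth^2))."""
    scale = 2.0 * bandwidth ** 2
    return [[math.exp(-((a - b) ** 2) / scale) for b in xs] for a in xs]


def double_center(gram):
    """Return H G H, where H = I - (1/n) 1 1^T is the centering matrix."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_v_statistic(xs, ys, bandwidth=1.0):
    """Biased V-statistic estimate of squared HSIC: trace(K H L H) / n^2."""
    n = len(xs)
    k_centered = double_center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) = trace((H K H) L) = sum_ij (HKH)_ij * L_ij, since L is symmetric.
    total = sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n))
    return total / n ** 2


# Illustrative data (assumed): one dependent pair and one independent pair.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)]
ys_dependent = [x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
ys_independent = [rng.gauss(0.0, 1.0) for _ in range(100)]
print(hsic_v_statistic(xs, ys_dependent), hsic_v_statistic(xs, ys_independent))
```

The estimate for the dependent pair should come out clearly larger than for the independent pair. This pure-Python version costs $O(n^2)$ time and memory and is meant only to show the structure of the estimator.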


Orthonormal Expansions for Translation-Invariant Kernels

arXiv.org Artificial Intelligence

We present a general Fourier analytic technique for constructing orthonormal basis expansions of translation-invariant kernels from orthonormal bases of $\mathscr{L}_2(\mathbb{R})$. This allows us to derive explicit expansions on the real line for (i) Matérn kernels of all half-integer orders in terms of associated Laguerre functions, (ii) the Cauchy kernel in terms of rational functions, and (iii) the Gaussian kernel in terms of Hermite functions.


On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions

arXiv.org Artificial Intelligence

"Benign overfitting", the ability of certain algorithms to interpolate noisy training data and yet perform well out-of-sample, has been a topic of considerable recent interest. We show, using a fixed design setup, that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions. In particular, the estimated predictor does not converge to the ground truth with increasing sample size, for any non-zero regression function and any (even adaptive) bandwidth selection. To prove these results, we give exact expressions for the generalization error, and its decomposition in terms of an approximation error and an estimation error that elicits a trade-off based on the selection of the kernel bandwidth. Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy.
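As a rough illustration of the kind of predictor the abstract refers to, the sketch below fits a kernel ridgeless (interpolating) regressor $f(x) = \sum_i \alpha_i k(x, x_i)$ with $K\alpha = y$ on a fixed one-dimensional design. It only demonstrates the interpolation of noisy labels; it does not reproduce the paper's inconsistency analysis. The Laplace kernel, the bandwidth of 0.2, the sine target, the noise level, and all function names are assumptions chosen for the example.

```python
import math
import random


def laplace_kernel(a, b, bandwidth):
    """Translation-invariant Laplace kernel exp(-|a - b| / bandwidth)."""
    return math.exp(-abs(a - b) / bandwidth)


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ridgeless(xs, ys, bandwidth):
    """Return the interpolant f(x) = sum_i alpha_i k(x, x_i), where K alpha = y."""
    gram = [[laplace_kernel(a, b, bandwidth) for b in xs] for a in xs]
    alpha = solve_linear_system(gram, ys)
    return lambda x: sum(a * laplace_kernel(x, xi, bandwidth) for a, xi in zip(alpha, xs))


# Fixed design on [0, 1] with noisy labels (target and noise level are assumed).
rng = random.Random(0)
design = [i / 10 for i in range(11)]
noisy = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in design]
predictor = fit_ridgeless(design, noisy, bandwidth=0.2)
max_train_error = max(abs(predictor(x) - y) for x, y in zip(design, noisy))
print(max_train_error)  # numerically zero: the noisy labels are interpolated
```

Because no ridge penalty is used, the fitted function passes through every noisy label. The abstract's claim concerns how such interpolants behave away from the design points as the sample size grows.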


Strictly proper kernel scores and characteristic kernels on compact spaces

arXiv.org Machine Learning

Strictly proper kernel scores are a well-known tool in probabilistic forecasting, while characteristic kernels have been extensively investigated in the machine learning literature. We first show that both notions coincide, so that insights from one part of the literature can be used in the other. We then show that the metric induced by a characteristic kernel cannot reliably distinguish between distributions that are far apart in the total variation norm as soon as the underlying space of measures is infinite dimensional. In addition, we provide a characterization of characteristic kernels in terms of eigenvalues and eigenfunctions and apply this characterization to the case of continuous kernels on (locally) compact spaces. In the compact case we further show that characteristic kernels exist if and only if the space is metrizable. As special cases of our general theory we investigate translation-invariant kernels on compact Abelian groups and isotropic kernels on spheres. The latter are of particular interest for forecast evaluation of probabilistic predictions on spherical domains as frequently encountered in meteorology and climatology.


Universalities of Reproducing Kernels Revisited

arXiv.org Machine Learning

Kernel methods have been widely applied to machine learning and other questions of approximating an unknown function from its finite sample data. To ensure arbitrary accuracy of such approximation, various denseness conditions are imposed on the selected kernel. This note contributes to the study of universal, characteristic, and $C_0$-universal kernels. We first give a simple and direct description of the differences and relations among these three kinds of universalities of kernels. We then focus on translation-invariant and weighted polynomial kernels. A simpler and shorter proof of the known characterization of characteristic translation-invariant kernels will be presented. The main purpose of the note is to give a delicate discussion on the universalities of weighted polynomial kernels.