Goto

Collaborating Authors

 eigenvalue decay


Scalable Gaussian process inference via neural feature maps

arXiv.org Machine Learning

We present a theoretically grounded Gaussian process framework that leverages neural feature maps to construct expressive kernels. We show that the learned feature map can be interpreted as an optimal low-rank approximation to a Gram matrix derived from an implied RKHS, from which we establish consistency of the GP posterior. We further analyse the spectral properties of the induced kernels and introduce product feature-map kernels to address oversmoothing. This simple yet powerful approach enables fast, scalable, and accurate exact GP inference with minimal upfront work. The flexibility of kernel design supports seamless application to both regression and classification tasks across diverse data modalities, including tabular inputs and structured domains such as images.


Optimal Learning Rates for Regularized Conditional Mean Embedding

Neural Information Processing Systems

We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of Y given X into a target reproducing kernel Hilbert space HY . The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between HX and L2, to HY . This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal O(logn/n) rates without assuming HY to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.


Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

Neural Information Processing Systems

We consider the problem of learning function classes computed by neural networks with various activations (e.g. ReLU or Sigmoid), a task believed to be computationally intractable in the worst-case. A major open problem is to understand the minimal assumptions under which these classes admit provably efficient algorithms. In this work we show that a natural distributional assumption corresponding to {\em eigenvalue decay} of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g.


Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

Neural Information Processing Systems

We consider the problem of learning function classes computed by neural networks with various activations (e.g. ReLU or Sigmoid), a task believed to be computationally intractable in the worst-case. A major open problem is to understand the minimal assumptions under which these classes admit provably efficient algorithms. In this work we show that a natural distributional assumption corresponding to {\em eigenvalue decay} of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g.


Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

Neural Information Processing Systems

We consider the problem of learning function classes computed by neural networks with various activations (e.g. ReLU or Sigmoid), a task believed to be computationally intractable in the worst-case. A major open problem is to understand the minimal assumptions under which these classes admit provably efficient algorithms. In this work we show that a natural distributional assumption corresponding to eigenvalue decay of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g.






Fast Spectrum Estimation of Some Kernel Matrices

arXiv.org Machine Learning

In data science, individual observations are often assumed to come independently from an underlying probability space. Kernel matrices formed from large sets of such observations arise frequently, for example during classification tasks. It is desirable to know the eigenvalue decay properties of these matrices without explicitly forming them, such as when determining if a low-rank approximation is feasible. In this work, we introduce a new eigenvalue quantile estimation framework for some kernel matrices. This framework gives meaningful bounds for all the eigenvalues of a kernel matrix while avoiding the cost of constructing the full matrix. The kernel matrices under consideration come from a kernel with quick decay away from the diagonal applied to uniformly-distributed sets of points in Euclidean space of any dimension. We prove the efficacy of this framework given certain bounds on the kernel function, and we provide empirical evidence for its accuracy. In the process, we also prove a very general interlacing-type theorem for finite sets of numbers. Additionally, we indicate an application of this framework to the study of the intrinsic dimension of data, as well as several other directions in which to generalize this work.