radial kernel
Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels
Maximum Mean Discrepancy (MMD) is a distance on the space of probability measures which has found numerous applications in machine learning and nonparametric testing. This distance is based on the notion of embedding probabilities in a reproducing kernel Hilbert space. In this paper, we present the first known lower bounds for the estimation of MMD based on finite samples. Our lower bounds hold for any radial universal kernel on $\mathbb{R}^d$ and match the existing upper bounds up to constants that depend only on the properties of the kernel. Using these lower bounds, we establish the minimax rate optimality of the empirical estimator and its $U$-statistic variant, which are usually employed in applications.
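The two estimators the abstract refers to are easy to state concretely. Below is a minimal sketch, not the authors' code, of the plug-in empirical estimator of squared MMD and its U-statistic variant with a Gaussian (radial) kernel; the bandwidth, sample sizes, and the Gaussian choice are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the plug-in and U-statistic
# estimators of MMD^2 with a Gaussian radial kernel; bandwidth and data
# are illustrative assumptions.
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_estimates(X, Y, bandwidth=1.0):
    """Return (plug-in estimate, unbiased U-statistic estimate) of MMD^2."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = (gaussian_kernel(A, B, bandwidth)
                     for A, B in [(X, X), (Y, Y), (X, Y)])
    plug_in = Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()        # keeps diagonal terms
    u_stat = ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))     # drops i = j terms
              + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
              - 2 * Kxy.mean())
    return plug_in, u_stat

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 5))   # sample from P
Y = rng.normal(0.5, 1.0, size=(500, 5))   # sample from Q
print(mmd2_estimates(X, Y))
```

The plug-in estimate is a squared RKHS norm and hence nonnegative, so its square root gives the empirical MMD itself.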
Reviews: Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels
The paper is very interesting as it provides the first known lower bounds for the minimax rate of estimation of the MMD distance between two distributions based on finite random samples, considering general radial kernels, which match the known upper bounds up to constants. These bounds lead to the classical parametric rate of estimation, independent of the dimension of the data. The text could, however, be more fluent and easier to read for non-specialists in minimax estimation theory. For instance, the results are all expressed in terms of minimax probabilities. Could the link with the more classical minimax risk be stated explicitly?
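On the reviewer's question, a hedged sketch of the link, written with illustrative notation rather than the paper's own:

```latex
% Minimax probability of exceeding a threshold (the form used in the paper)
% versus the classical minimax risk; \hat F_n ranges over all estimators
% built from n samples of each distribution.
\[
  p_n = \inf_{\hat F_n}\,\sup_{P,Q}\;
        \Pr\!\bigl(\,\lvert \hat F_n - \mathrm{MMD}(P,Q)\rvert \ge c\,n^{-1/2}\bigr),
  \qquad
  R_n = \inf_{\hat F_n}\,\sup_{P,Q}\;
        \mathbb{E}\,\lvert \hat F_n - \mathrm{MMD}(P,Q)\rvert .
\]
% Markov's inequality, applied to any fixed estimator, turns a lower bound on
% the probability into a lower bound on the risk of the same order:
\[
  p_n \ge \delta > 0
  \quad\Longrightarrow\quad
  R_n \ge c\,\delta\, n^{-1/2}.
\]
```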
Fast Kernel Summation in High Dimensions via Slicing and Fourier Transforms
Kernel-based methods are heavily used in machine learning. However, they suffer from $O(N^2)$ complexity in the number $N$ of considered data points. In this paper, we propose an approximation procedure, which reduces this complexity to $O(N)$. Our approach is based on two ideas. First, we prove that any radial kernel with analytic basis function can be represented as a sliced version of some one-dimensional kernel and derive an analytic formula for the one-dimensional counterpart. It turns out that the relation between one- and $d$-dimensional kernels is given by a generalized Riemann-Liouville fractional integral. Hence, we can reduce the $d$-dimensional kernel summation to a one-dimensional setting. Second, to solve these one-dimensional problems efficiently, we apply fast Fourier summations on non-equispaced data, a sorting algorithm, or a combination of both. Due to its practical importance, we pay special attention to the Gaussian kernel, where we show a dimension-independent error bound and represent its one-dimensional counterpart via a closed-form Fourier transform. We provide a run-time comparison and error estimate of our fast kernel summations.
- Research Report (0.64)
- Instructional Material (0.46)
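The slicing idea described in the abstract above can be sketched in a few lines. This is a structural illustration only, not the paper's implementation: the true one-dimensional counterpart kernel is kernel-specific (the paper derives it via a Riemann-Liouville fractional integral), so it is passed in here as an assumed `kernel_1d` argument, and the inner one-dimensional sums are done naively where the paper would use fast Fourier summation on non-equispaced data or sorting.

```python
# Structural sketch of sliced kernel summation (not the paper's code):
# approximate s_j = sum_i w_i * K(x_i, y_j), K radial, by averaging
# one-dimensional kernel sums over random projection directions.
import numpy as np

def sliced_kernel_sum(x, y, w, kernel_1d, num_slices=64, seed=None):
    """Approximate the kernel sums at the points y by slicing."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    s = np.zeros(len(y))
    for _ in range(num_slices):
        xi = rng.normal(size=d)
        xi /= np.linalg.norm(xi)              # random direction on the sphere
        px, py = x @ xi, y @ xi               # project both point sets to 1D
        # Naive O(N^2) one-dimensional summation, used here only for clarity;
        # this is the step the paper replaces with NFFT- or sorting-based sums.
        s += kernel_1d(py[:, None] - px[None, :]) @ w
    return s / num_slices

rng = np.random.default_rng(0)
x, y, w = rng.normal(size=(200, 10)), rng.normal(size=(50, 10)), rng.random(200)
# `kernel_1d` below is a hypothetical stand-in, not the derived 1D counterpart.
print(sliced_kernel_sum(x, y, w, kernel_1d=lambda t: np.exp(-t**2), seed=1))
```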
A Machine Learning Approach for Predicting Deterioration in Alzheimer's Disease
Musto, Henry, Stamate, Daniel, Pu, Ida, Stahl, Daniel
- This paper explores deterioration in Alzheimer's Disease using Machine Learning. Subjects were split into two datasets based on baseline diagnosis (Cognitively Normal, Mild Cognitive Impairment), with outcome of deterioration at final visit (a binomial, essentially yes/no, categorisation), using data from the Alzheimer's Disease Neuroimaging Initiative (demographics, genetics, CSF, imaging, neuropsychological testing, etc.). […] 100 iterations. We were able to demonstrate good predictive ability using CART, predicting which of those in the cognitively normal group deteriorated and received a worse diagnosis (AUC = 0.88); see the sketch after this entry. In lieu of effective treatment for Alzheimer's disease [5], research has turned towards the possibility of early detection of Alzheimer's biomarkers before the onset of symptoms [6]. Work in this area has noted that accurate prediction of the risk of developing Alzheimer's disease, and subsequent early interventions aimed at delaying the onset of symptoms, would ease the burden of suffering as well as the financial costs of care for the patient and their […]. Indeed, research has suggested that the structural brain changes that precipitate Alzheimer's symptoms may begin several years before the onset of notable symptoms [6], and this may provide an opportunity […]
- North America > United States (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
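The sketch referenced in the entry above: a CART-style decision tree predicting a binary deterioration outcome, evaluated by AUC. The synthetic features and labels are stand-ins, not ADNI data, and this is not the authors' pipeline.

```python
# Hedged sketch of a CART + AUC pipeline of the kind the entry describes;
# the synthetic data below is an assumption standing in for ADNI features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 8))          # stand-in for demographic/CSF/imaging features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)  # deteriorated yes/no

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)   # CART-style tree
print("AUC:", roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1]))
```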
BP-DIP: A Backprojection based Deep Image Prior
Zukerman, Jenny, Tirer, Tom, Giryes, Raja
Deep neural networks are a very powerful tool for many computer vision tasks, including image restoration, exhibiting state-of-the-art results. However, the performance of deep learning methods tends to drop once the observation model used in training mismatches the one encountered at test time. In addition, most deep learning methods require vast amounts of training data, which are not accessible in many applications. To mitigate these disadvantages, we propose to combine two image restoration approaches: (i) Deep Image Prior (DIP), which trains a convolutional neural network (CNN) from scratch at test time using the given degraded image. It does not require any training data and builds on the implicit prior imposed by the CNN architecture; and (ii) a backprojection (BP) fidelity term, which is an alternative to the standard least squares loss that is usually used in previous DIP works. We demonstrate the performance of the proposed method, termed BP-DIP, on the deblurring task and show its advantages over the plain DIP, with both higher PSNR values and better inference run-time.
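A sketch of the loss change the abstract describes, with assumed notation (a degradation model y = Ax + e, a DIP network f_theta with fixed input z, and A^dagger the pseudo-inverse); this paraphrases the idea rather than quoting the paper's formulas.

```latex
% Standard least-squares DIP fidelity term (assumed notation):
\[
  \min_{\theta}\;\bigl\lVert A\, f_{\theta}(z) - y \bigr\rVert_2^2
\]
% Backprojection (BP) fidelity term: the residual is mapped back to the
% image domain by the pseudo-inverse A^{\dagger} before being penalized:
\[
  \min_{\theta}\;\bigl\lVert A^{\dagger}\bigl(A\, f_{\theta}(z) - y\bigr) \bigr\rVert_2^2
\]
```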
Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels
Tolstikhin, Ilya O., Sriperumbudur, Bharath K., Schölkopf, Bernhard
Maximum Mean Discrepancy (MMD) is a distance on the space of probability measures which has found numerous applications in machine learning and nonparametric testing. This distance is based on the notion of embedding probabilities in a reproducing kernel Hilbert space. In this paper, we present the first known lower bounds for the estimation of MMD based on finite samples. Our lower bounds hold for any radial universal kernel on $\mathbb{R}^d$ and match the existing upper bounds up to constants that depend only on the properties of the kernel. Using these lower bounds, we establish the minimax rate optimality of the empirical estimator and its $U$-statistic variant, which are usually employed in applications.
Approximation beats concentration? An approximation view on inference with smooth radial kernels
Positive definite kernels and their associated Reproducing Kernel Hilbert Spaces provide a mathematically compelling and practically competitive framework for learning from data. In this paper we take the approximation theory point of view to explore various aspects of smooth kernels related to their inferential properties. We analyze the eigenvalue decay of kernel operators and matrices, properties of eigenfunctions/eigenvectors, and "Fourier" coefficients of functions in the kernel space restricted to a discrete set of data points. We also investigate the fitting capacity of kernels, giving explicit bounds on the fat-shattering dimension of balls in Reproducing Kernel Hilbert Spaces. Interestingly, the same properties that make kernels very effective approximators for functions in their "native" kernel space also limit their capacity to represent arbitrary functions. We discuss various implications, including those for gradient descent type methods. It is important to note that most of our bounds are measure-independent. Moreover, at least in moderate dimension, the bounds for eigenvalues are much tighter than the bounds which can be obtained from the usual matrix concentration results. For example, we see that the eigenvalues of kernel matrices show nearly exponential decay with constants depending only on the kernel and the domain. We call this the "approximation beats concentration" phenomenon, as even when the data are sampled from a probability distribution, some of their aspects are better understood in terms of approximation theory.
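A small numerical illustration of the rapid spectral decay the abstract mentions; the Gaussian kernel, the bandwidth, and uniform data on the unit square are illustrative assumptions, and this does not reproduce the paper's bounds.

```python
# Illustration (assumptions: Gaussian kernel, uniform data on [0, 1]^2) of the
# near-exponential eigenvalue decay of kernel matrices discussed in the abstract.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))                         # points in the unit square
sq = np.sum((X[:, None, :] - X[None, :, :])**2, -1)    # pairwise squared distances
K = np.exp(-sq / (2 * 0.5**2))                         # Gaussian kernel matrix, bandwidth 0.5
spectrum = np.sort(np.linalg.eigvalsh(K))[::-1] / len(X)
print(spectrum[:20])   # leading eigenvalues drop by orders of magnitude within a few indices
```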
Sparse Approximation of a Kernel Mean
Kernel means are frequently used to represent probability distributions in machine learning problems. In particular, the well-known kernel density estimator and the kernel mean embedding both have the form of a kernel mean. Unfortunately, kernel means are faced with scalability issues. A single point evaluation of the kernel density estimator, for example, requires a computation time linear in the training sample size. To address this challenge, we present a method to efficiently construct a sparse approximation of a kernel mean. We do so by first establishing an incoherence-based bound on the approximation error, and then noticing that, for the case of radial kernels, the bound can be minimized by solving the $k$-center problem. The outcome is a linear-time construction of a sparse kernel mean, which also lends itself naturally to an automatic sparsity selection scheme. We show the computational gains of our method by looking at three problems involving kernel means: Euclidean embedding of distributions, class proportion estimation, and clustering using the mean-shift algorithm.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
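A hedged sketch of the construction outlined in the abstract above: select k centers with the classic greedy (farthest-point) heuristic for the k-center problem and use them, uniformly weighted, as a sparse stand-in for the full kernel mean. The Gaussian kernel, uniform weights, and greedy heuristic are assumptions for illustration; the paper's actual selection and weighting may differ.

```python
# Sketch (not the paper's algorithm): greedy k-center selection of support
# points for a sparse kernel-mean approximation; kernel choice and uniform
# weighting are illustrative assumptions.
import numpy as np

def greedy_k_center(X, k, seed=0):
    """Greedy 2-approximation to the k-center problem; returns center indices."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                     # farthest point from current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(centers)

def kernel_mean(points, query, bandwidth=1.0):
    """(1/n) * sum_i k(x_i, query) for a Gaussian kernel."""
    sq = np.sum((points - query)**2, axis=1)
    return np.mean(np.exp(-sq / (2 * bandwidth**2)))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
idx = greedy_k_center(X, k=50)
q = np.zeros(3)
print(kernel_mean(X, q), kernel_mean(X[idx], q))       # full mean vs. sparse approximation
```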