
Collaborating Authors: Narayanan, Hariharan


Structural Risk Minimization for $C^{1,1}(\mathbb{R}^d)$ Regression

arXiv.org Machine Learning

One means of fitting functions to high-dimensional data is by providing smoothness constraints. Recently, the following smooth function approximation problem was proposed by \citet*{herbert2014computing}: given a finite set $E \subset \mathbb{R}^d$ and a function $f: E \rightarrow \mathbb{R}$, interpolate the given information with a function $\widehat{f} \in \dot{C}^{1, 1}(\mathbb{R}^d)$ (the class of first-order differentiable functions with Lipschitz gradients) such that $\widehat{f}(a) = f(a)$ for all $a \in E$, and the value of $\mathrm{Lip}(\nabla \widehat{f})$ is minimal. An algorithm is provided that constructs such an approximating function $\widehat{f}$ and estimates the optimal Lipschitz constant $\mathrm{Lip}(\nabla \widehat{f})$ in the noiseless setting. We address statistical aspects of reconstructing the approximating function $\widehat{f}$ from the closely related class $C^{1, 1}(\mathbb{R}^d)$ given noisy samples. We observe independent and identically distributed samples $y(a) = f(a) + \xi(a)$ for $a \in E$, where $\xi(a)$ is a noise term and the set $E \subset \mathbb{R}^d$ is fixed and known. We obtain uniform bounds relating the empirical risk and true risk over the class $\mathcal{F}_{\widetilde{M}} = \{f \in C^{1, 1}(\mathbb{R}^d) \mid \mathrm{Lip}(\nabla f) \leq \widetilde{M}\}$, where the quantity $\widetilde{M}$ grows with the number of samples at a rate governed by the metric entropy of the class $C^{1, 1}(\mathbb{R}^d)$. Finally, we provide an implementation using Vaidya's algorithm, supporting our results via numerical experiments on simulated data.
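
To make the observation model concrete, here is a minimal Python sketch, under assumptions of my own (a quadratic choice of $f$, Gaussian noise, and the helper name empirical_risk): it draws noisy samples $y(a) = f(a) + \xi(a)$ on a fixed design $E$ and evaluates the empirical squared-error risk of a candidate function. The paper's estimator additionally constrains $\mathrm{Lip}(\nabla \widehat{f}) \leq \widetilde{M}$ and is computed via a convex program solved with Vaidya's algorithm, none of which is attempted here.

import numpy as np

def empirical_risk(f_hat, E, y):
    # empirical squared-error risk of a candidate f_hat on the fixed design E
    preds = np.array([f_hat(a) for a in E])
    return float(np.mean((preds - y) ** 2))

# noisy observations y(a) = f(a) + xi(a) at fixed design points in R^d
rng = np.random.default_rng(0)
d, n = 3, 200
E = rng.normal(size=(n, d))
f = lambda a: 0.5 * float(a @ a)           # a C^{1,1} function with Lip(grad f) = 1
y = np.array([f(a) for a in E]) + 0.1 * rng.normal(size=n)
print(empirical_risk(f, E, y))             # empirical risk of the true f under noise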


John's Walk

arXiv.org Machine Learning

We present an affine-invariant random walk for drawing uniform random samples from a convex body $\mathcal{K} \subset \mathbb{R}^n$ for which the maximum volume inscribed ellipsoid, known as John's ellipsoid, may be computed. As a special case, we consider a polytope $\mathcal{P} = \{x \in \mathbb{R}^n \mid Ax \leq 1\}$ with $A \in \mathbb{R}^{m \times n}$. Our algorithm makes steps using uniform sampling from John's ellipsoid of the symmetrization of $\mathcal{K}$ at the current point. We show that from a warm start, the random walk mixes in $\widetilde{O}(n^7)$ steps, where the log factors depend only on constants determined by the warm start and error parameters (and not on the dimension or the number of constraints defining the body). This sampling algorithm thus offers an improvement over the affine-invariant Dikin Walk for polytopes (which mixes in $\widetilde{O}(mn)$ steps from a warm start) for applications in which $m \gg n$. Furthermore, we describe an $\widetilde{O}(mn^{\omega+1} + n^{2\omega+2})$ algorithm for finding a suitably approximate John's ellipsoid for a symmetric polytope based on Vaidya's algorithm, and show that the mixing time is retained using these approximate ellipsoids (where $\omega < 2.373$ is the current bound on the fast matrix multiplication exponent).
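
The two ingredients above can be sketched as follows, under assumptions of my own choosing (CVXPY for the maximum volume inscribed ellipsoid, a fixed step radius r, and a standard Metropolis filter); the paper's walk, its dimension-dependent step size, and the approximate-ellipsoid computation via Vaidya's algorithm differ in detail.

import numpy as np
import cvxpy as cp

def john_ellipsoid(A, b):
    # maximum volume inscribed ellipsoid {B u + c : ||u|| <= 1} of {x : A x <= b}
    m, n = A.shape
    B, c = cp.Variable((n, n), PSD=True), cp.Variable(n)
    cons = [cp.norm(B @ A[i]) + A[i] @ c <= b[i] for i in range(m)]
    cp.Problem(cp.Maximize(cp.log_det(B)), cons).solve()
    return B.value, c.value

def john_walk_step(A, b, x, rng, r=0.5):
    # symmetrization of P about x:  P ∩ (2x - P) = {z : A z <= b, -A z <= b - 2 A x};
    # its John ellipsoid is centred at x by symmetry, so the returned centre is dropped
    Bx, _ = john_ellipsoid(np.vstack([A, -A]), np.concatenate([b, b - 2 * A @ x]))
    u = rng.normal(size=len(x))
    u *= rng.random() ** (1.0 / len(x)) / np.linalg.norm(u)    # uniform point in the unit ball
    y = x + r * Bx @ u                       # propose uniformly from the scaled ellipsoid at x
    if np.any(A @ y > b):
        return x                             # proposal left the polytope: stay put
    By, _ = john_ellipsoid(np.vstack([A, -A]), np.concatenate([b, b - 2 * A @ y]))
    if np.linalg.norm(np.linalg.solve(By, x - y)) > r:
        return x                             # x is not reachable from y: reject
    accept = min(1.0, np.linalg.det(Bx) / np.linalg.det(By))   # Metropolis filter for uniformity
    return y if rng.random() < accept else x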


Manifold Learning Using Kernel Density Estimation and Local Principal Components Analysis

arXiv.org Machine Learning

We consider the problem of recovering a $d$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^n$ when provided with noiseless samples from $\mathcal{M}$. There are many algorithms (e.g., Isomap) that are used in practice to fit manifolds and thus reduce the dimensionality of a given data set. Ideally, the estimate $\mathcal{M}_\mathrm{put}$ of $\mathcal{M}$ should be an actual manifold of a certain smoothness; furthermore, $\mathcal{M}_\mathrm{put}$ should be arbitrarily close to $\mathcal{M}$ in Hausdorff distance given a large enough sample. Generally speaking, existing manifold learning algorithms do not meet these criteria. Fefferman, Mitter, and Narayanan (2016) have developed an algorithm whose output is provably a manifold. The key idea is to define an approximate squared-distance function (asdf) to $\mathcal{M}$. Then, $\mathcal{M}_\mathrm{put}$ is given by the set of points where the gradient of the asdf is orthogonal to the subspace spanned by the eigenvectors corresponding to the largest $n - d$ eigenvalues of the Hessian of the asdf. As long as the asdf meets certain regularity conditions, $\mathcal{M}_\mathrm{put}$ is a manifold that is arbitrarily close in Hausdorff distance to $\mathcal{M}$. In this paper, we define two asdfs that can be calculated from the data and show that they meet the required regularity conditions. The first asdf is based on kernel density estimation, and the second is based on estimation of tangent spaces using local principal components analysis.
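
The local-PCA ingredient can be illustrated in a few lines: estimate the tangent space at a query point from the top principal directions of its nearest neighbours, and use the squared distance to the resulting affine approximation as a crude approximate squared-distance function. This is a sketch only (the helper names, the neighbourhood size k, and the plain nearest-neighbour window are my choices); the asdfs analysed in the paper are constructed so as to provably satisfy the required regularity conditions.

import numpy as np

def local_pca_tangent(X, x0, k=20, d=2):
    # tangent space at x0 estimated from the k nearest samples (rows of X)
    nbrs = X[np.argsort(np.linalg.norm(X - x0, axis=1))[:k]]
    mu = nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
    return mu, Vt[:d]                        # rows of Vt[:d] span the estimated tangent space

def asdf_local_pca(x, X, k=20, d=2):
    # illustrative asdf: squared distance from x to the local affine (tangent) approximation
    mu, T = local_pca_tangent(X, x, k=k, d=d)
    v = x - mu
    return float(v @ v - (T @ v) @ (T @ v))  # squared norm of the component normal to the tangent

# usage: noiseless samples from a unit circle embedded in R^3
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
print(asdf_local_pca(np.array([1.1, 0.0, 0.2]), X, k=20, d=1))   # roughly the squared distance to the circle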


On Zeroth-Order Stochastic Convex Optimization via Random Walks

arXiv.org Machine Learning

We propose a method for zeroth-order stochastic convex optimization that attains the suboptimality rate of $\tilde{\mathcal{O}}(n^{7}T^{-1/2})$ after $T$ queries for a bounded convex function $f:{\mathbb R}^n\to{\mathbb R}$. The method is based on a random walk (the \emph{Ball Walk}) on the epigraph of the function. The randomized approach circumvents the problem of gradient estimation, and appears to be less sensitive to noisy function evaluations than zeroth-order methods designed for the noiseless setting.
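
A minimal sketch of the idea, assuming a Euclidean ball as the feasible region, a fixed step size, and that returning the best point visited is an acceptable simplification; the paper's Ball Walk, its step-size schedule, and the analysis yielding the $\tilde{\mathcal{O}}(n^{7}T^{-1/2})$ rate are more involved.

import numpy as np

def ball_walk_epigraph(f, x0, t0, t_max, n_steps=10000, delta=0.1, R=3.0, rng=None):
    # random walk on the truncated epigraph {(x, t) : f(x) <= t <= t_max, ||x|| <= R},
    # where f is a (possibly noisy) zeroth-order oracle
    rng = np.random.default_rng() if rng is None else rng
    n = len(x0)
    x, t = np.asarray(x0, float), float(t0)
    best_x, best_t = x.copy(), t
    for _ in range(n_steps):
        u = rng.normal(size=n + 1)
        u *= delta * rng.random() ** (1.0 / (n + 1)) / np.linalg.norm(u)   # uniform step in a ball
        xp, tp = x + u[:n], t + u[n]
        if np.linalg.norm(xp) <= R and tp <= t_max and f(xp) <= tp:
            x, t = xp, tp                    # move only if the step stays inside the epigraph
            if t < best_t:
                best_x, best_t = x.copy(), t
    return best_x, best_t

# usage: minimise a convex quadratic from noisy function evaluations only
rng = np.random.default_rng(1)
f_noisy = lambda z: float(z @ z) + 0.01 * rng.normal()
print(ball_walk_epigraph(f_noisy, x0=np.ones(5), t0=10.0, t_max=10.0, rng=rng))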


Efficient Sampling from Time-Varying Log-Concave Distributions

arXiv.org Machine Learning

We propose a computationally efficient random walk on a convex body which rapidly mixes and closely tracks a time-varying log-concave distribution. We develop general theoretical guarantees on the required number of steps; this number can be calculated on the fly according to the distance from and the shape of the next distribution. We then illustrate the technique on several examples. Within the context of exponential families, the proposed method produces samples from a posterior distribution which is updated as data arrive in a streaming fashion. The sampling technique can be used to track time-varying truncated distributions, as well as to obtain samples from a changing mixture model, fitted in a streaming fashion to data. In the setting of linear optimization, the proposed method has oracle complexity with best known dependence on the dimension for certain geometries. In the context of online learning and repeated games, the algorithm is an efficient method for implementing no-regret mixture forecasting strategies. Remarkably, in some of these examples, only one step of the random walk is needed to track the next distribution.
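
As a toy illustration of the streaming exponential-family example, the sketch below warm-starts a few Metropolis random-walk steps toward each updated posterior, reusing the previous sample when new data arrive. The particular walk, step size, number of steps per update, and the Gaussian-mean model are assumptions of this sketch rather than the method analysed in the paper, whose guarantees concern a random walk on a convex body.

import numpy as np

def track_streaming_posterior(data_stream, x0, prior_var=1.0, noise_var=1.0,
                              steps_per_update=5, step=0.3, rng=None):
    # Gaussian-mean model: after observing y_1, ..., y_t the log-posterior of the mean is
    #   log pi_t(x) = -x^2 / (2 prior_var) - sum_s (y_s - x)^2 / (2 noise_var) + const
    rng = np.random.default_rng() if rng is None else rng
    x, seen, samples = float(x0), [], []
    log_pi = lambda z: -z ** 2 / (2 * prior_var) - sum((y - z) ** 2 for y in seen) / (2 * noise_var)
    for y_new in data_stream:
        seen.append(y_new)                   # the posterior is updated as data arrive
        for _ in range(steps_per_update):    # a few warm-started Metropolis steps
            prop = x + step * rng.normal()
            if np.log(rng.random()) < log_pi(prop) - log_pi(x):
                x = prop
        samples.append(x)
    return samples

# usage: a stream of observations with true mean 2.0
rng = np.random.default_rng(0)
print(track_streaming_posterior(2.0 + rng.normal(size=50), x0=0.0, rng=rng)[-5:])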


Sample Complexity of Testing the Manifold Hypothesis

Neural Information Processing Systems

The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is {\it independent} of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of $O(\frac{k}{\epsilon^2} + \frac{\log \frac{1}{\delta}}{\epsilon^2})$ for the sample complexity of Empirical Risk Minimization on $k$-means applied to data in a unit ball of arbitrary dimension is tight has been an open question since 1997 \cite{bart2}. Here $\epsilon$ is the desired bound on the error and $\delta$ is a bound on the probability of failure. We improve the best currently known upper bound \cite{pontil} of $O(\frac{k^2}{\epsilon^2} + \frac{\log \frac{1}{\delta}}{\epsilon^2})$ to $O\left(\frac{k}{\epsilon^2}\left(\min\left(k, \frac{\log^4 \frac{k}{\epsilon}}{\epsilon^2}\right)\right) + \frac{\log \frac{1}{\delta}}{\epsilon^2}\right)$. Based on these results, we devise a simple algorithm for $k$-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension.
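
For the $k$-means part of the statement, the risk whose empirical minimization is being analysed is the average squared distance to the nearest of the $k$ centers; the short sketch below computes it (the names and the brute-force distance computation are mine, and the minimization over centers itself is not shown).

import numpy as np

def kmeans_empirical_risk(X, centers):
    # mean squared distance from each sample (row of X) to its nearest center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # shape (n_samples, k)
    return float(d2.min(axis=1).mean())

# usage: data in a unit ball of arbitrary dimension and k candidate centers
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1))[:, None]            # project into the unit ball
centers = X[rng.choice(len(X), size=5, replace=False)]
print(kmeans_empirical_risk(X, centers))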


Random Walk Approach to Regret Minimization

Neural Information Processing Systems

We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies.
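
A toy version of the forecasting loop, under assumptions of my own (a unit-ball decision set, a plain Metropolis random walk, and a handful of steps per round); the efficiency and regret guarantees of the paper come from the specific walk and its mixing analysis, which this sketch does not reproduce.

import numpy as np

def gibbs_forecaster(losses, dim, eta=1.0, step=0.1, walk_steps=3, rng=None):
    # each round, play a point obtained by a few random-walk steps toward the Gibbs
    # density pi_t(x) ∝ exp(-eta * sum_{s<t} f_s(x)) on the unit ball, warm-started
    # from the previous round's point
    rng = np.random.default_rng() if rng is None else rng
    x, past, plays = np.zeros(dim), [], []
    log_pi = lambda z: (-eta * sum(f(z) for f in past)
                        if np.linalg.norm(z) <= 1.0 else -np.inf)
    for f_t in losses:
        for _ in range(walk_steps):
            prop = x + step * rng.normal(size=dim)
            if np.log(rng.random()) < log_pi(prop) - log_pi(x):
                x = prop
        plays.append(x.copy())
        past.append(f_t)                     # observe this round's convex loss
    return plays

# usage: linear losses f_t(x) = <g_t, x> with random g_t
rng = np.random.default_rng(0)
losses = [lambda z, g=rng.normal(size=4): float(g @ z) for _ in range(100)]
print(np.round(gibbs_forecaster(losses, dim=4, rng=rng)[-1], 3))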


On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts

Neural Information Processing Systems

One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that the sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary.
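
The graph-cut quantity in question is easy to write down for a Gaussian-kernel adjacency graph; the sketch below computes the cut between two labelled groups of points (the kernel, the bandwidth, and the absence of the normalization under which the convergence statement is made precise are all simplifications of this sketch).

import numpy as np

def gaussian_graph_cut(X, labels, sigma=0.5):
    # sum of adjacency weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) over
    # pairs (i, j) lying on opposite sides of the class/cluster boundary
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-D2 / (2.0 * sigma ** 2))
    across = labels[:, None] != labels[None, :]
    return float(W[across].sum() / 2.0)      # each boundary edge counted once

# usage: two well-separated Gaussian blobs in the plane
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), rng.normal([6.0, 0.0], 1.0, size=(100, 2))])
labels = np.array([0] * 100 + [1] * 100)
print(gaussian_graph_cut(X, labels, sigma=0.5))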