Goto

Collaborating Authors

 delta epsilon



Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels

Neural Information Processing Systems

We revisit the classic randomized sketch of a tensor product of q vectors x_i\in\mathbb{R} n . However, in their analysis C_{\Omega} 2 can be as large as \Theta(n {2q}), even for a set \Omega of O(1) vectors x . We give a new analysis of this sketch, providing nearly optimal bounds. For the important case of q 2 and \delta 1/\poly(n), this shows that m \Theta(\epsilon {-2} \log(n) \epsilon {-1} \log 2(n)), demonstrating that the \epsilon {-2} and \log 2(n) terms do not multiply each other. In a number of applications, one has \Omega \poly(n) and in this case our bounds are optimal up to a constant factor.


Reviews: Clustering Stable Instances of Euclidean k-means.

Neural Information Processing Systems

The authors propose a notion of additive perturbation stability (APS) for Euclidean distances that maintain the optimal k-means clustering solution when each point in the data is moved by a sufficiently small Euclidean distance. I think the paper is rather interesting; however, the results of the paper are not very surprising. Here are my comments regarding the paper: (1) To my understanding, the results of Theorem 1.2 are only under the condition of APS. They only hold for the case of k 2 components and may lead to exponential dependence on k components for large k . However, under the additional margin condition between any two pairs of cluster, we will able to guarantee the existence of polynomial algorithm on k .


Reviews: Natasha 2: Faster Non-Convex Optimization Than SGD

Neural Information Processing Systems

This submission considers an online optimization problem, where the underlying objective function is either an expectation or a finite sum of functions. It is assumed that the objective is nonconvex, and the goal of the submission is to develop an efficient online algorithm (that is, a method with a complexity that is independent on the number of components defining the objective) that finds an approximate local minimum, i. e. a point at which the gradient norm is less than \epsilon and the minimum Hessian eigenvalue is at least -\delta, where \epsilon and \delta are two positive thresholds. The development of online algorithms is extremely relevant in the context of machine learning, in that it reduces the dependence of the method to the size of the available data. Considering nonconvex problems is also of critical importance, particularly given the outbreak of deep learning formulations. The submission appears to be written in order to help the reader follow the reasoning that lead to the derivation of the new method and results.


Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels

Neural Information Processing Systems

We revisit the classic randomized sketch of a tensor product of $q$ vectors $x_i\in\mathbb{R} n$. However, in their analysis $C_{\Omega} 2$ can be as large as $\Theta(n {2q})$, even for a set $\Omega$ of $O(1)$ vectors $x$. We give a new analysis of this sketch, providing nearly optimal bounds. For the important case of $q 2$ and $\delta 1/\poly(n)$, this shows that $m \Theta(\epsilon {-2} \log(n) \epsilon {-1} \log 2(n))$, demonstrating that the $\epsilon {-2}$ and $\log 2(n)$ terms do not multiply each other. In a number of applications, one has $ \Omega \poly(n)$ and in this case our bounds are optimal up to a constant factor.