Collaborating Authors

Learning in High Dimensional Spaces

What is the relationship between Curse of Dimensionality and isotropic neighborhoods?


The problem that Hastie, Tibshirani and Friedman are talking about here is that the number of fixed-size neighborhoods goes up exponentially with the dimension. If you're trying to get some intuition for how isotropic neighborhoods are affected by the curse of dimensionality, think about approximating ball-shaped (isotropic) neighborhoods with cube-shaped neighborhoods. Suppose we have an $d$-dimensional unit cube $[0, 1] d$ that we want to divide up into cube-shaped neighborhoods. If I want a neighborhood of side length $\delta 0.1$, in one dimension this requires $10 1 10$ neighborhoods. In two dimensions, this requires $10 2 100$ neighborhoods.

Large-scale optimal transport map estimation using projection pursuit

Neural Information Processing Systems

This paper studies the estimation of large-scale optimal transport maps (OTM), which is a well known challenging problem owing to the curse of dimensionality. Existing literature approximates the large-scale OTM by a series of one-dimensional OTM problems through iterative random projection. Such methods, however, suffer from slow or none convergence in practice due to the nature of randomly selected projection directions. Instead, we propose an estimation method of large-scale OTM by combining the idea of projection pursuit regression and sufficient dimension reduction. The proposed method, named projection pursuit Monge map (PPMM), adaptively selects the most informative'' projection direction in each iteration.

Unsupervised Kernel Dimension Reduction

Neural Information Processing Systems

We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.

Feature-aware Label Space Dimension Reduction for Multi-label Classification

Neural Information Processing Systems

Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition.

Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness Machine Learning

Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against $\ell_2$-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest $\ell_p$-radius that can be certified decreases as $O(1/d^{\frac{1}{2} - \frac{1}{p}})$ with dimension $d$ for $p > 2$. Notably, for $p \geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when $p \geq 2$. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as, a uniform distribution within an $\ell_1$ or an $\ell_\infty$-norm ball, we show upper bounds of the form $O(1 / d)$ and $O(1 / d^{1 - \frac{1}{p}})$ respectively, which have an even worse dependence on $d$.

Overcoming Mode Collapse and the Curse of Dimensionality


Machine Learning Lecture at CMU by Ke Li, Ph.D. Candidate at the University of California, Berkeley Lecturer: Ke Li Carnegie Mellon University Abstract: In this talk, Li presents his team's work on overcoming two long-standing problems in machine learning and algorithms: 1. Mode collapse in generative adversarial nets (GANs) Generative adversarial nets (GANs) are perhaps the most popular class of generative models in use today. Unfortunately, they suffer from the well-documented problem of mode collapse, which the many successive variants of GANs have failed to overcome. I will illustrate why mode collapse happens fundamentally and show a simple way to overcome it, which is the basis of a new method known as Implicit Maximum Likelihood Estimation (IMLE). It turns out that this problem is not insurmountable - I will explain how the curse of dimensionality arises and show a simple way to overcome it, which gives rise to a new family of algorithms known as Dynamic Continuous Indexing (DCI). Bio: Ke Li is a recent Ph.D. graduate from UC Berkeley, where he was advised by Prof. Jitendra Malik, and will join Google as a Research Scientist and the Institute for Advanced Study (IAS) as a Member hosted by Prof. Sanjeev Arora.

Projection pursuit with applications to scRNA sequencing data Machine Learning

In this paper, we explore the limitations of PCA as a dimension reduction technique and study its extension, projection pursuit (PP), which is a broad class of linear dimension reduction methods. PCA is a popular dimension reduction technique commonly applied to scRNA sequencing data. Despite of huge success in practice, we will illustrate three drawbacks of PCA. It is well known that the eigenvalues of sample covariance matrix is not consistent in high dimensional cases. Every principal component is uncorrelated with each other but not independent.

MM Algorithms for Distance Covariance based Sufficient Dimension Reduction and Sufficient Variable Selection Machine Learning

Sufficient dimension reduction (SDR) using distance covariance (DCOV) was recently proposed as an approach to dimension-reduction problems. Compared with other SDR methods, it is model-free without estimating link function and does not require any particular distributions on predictors (see Sheng and Yin, 2013, 2016). However, the DCOV-based SDR method involves optimizing a nonsmooth and nonconvex objective function over the Stiefel manifold. To tackle the numerical challenge, we novelly formulate the original objective function equivalently into a DC (Difference of Convex functions) program and construct an iterative algorithm based on the majorization-minimization (MM) principle. At each step of the MM algorithm, we inexactly solve the quadratic subproblem on the Stiefel manifold by taking one iteration of Riemannian Newton's method. The algorithm can also be readily extended to sufficient variable selection (SVS) using distance covariance. We establish the convergence property of the proposed algorithm under some regularity conditions. Simulation studies show our algorithm drastically improves the computation efficiency and is robust across various settings compared with the existing method. Supplemental materials for this article are available.

The unreasonable effectiveness of small neural ensembles in high-dimensional brain


Complexity is an indisputable, well-known, and broadly accepted feature of the brain. Despite the apparently obvious and widely-spread consensus on the brain complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain. In machine learning for a long time, the famous curse of dimensionality seemed to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality becomes gradually more and more popular.

Supporting Multi-point Fan Design with Dimension Reduction Machine Learning

Motivated by the idea of turbomachinery active subspace performance maps, this paper studies dimension reduction in turbomachinery 3D CFD simulations. First, we show that these subspaces exist across different blades---under the same parametrization---largely independent of their Mach number or Reynolds number. This is demonstrated via a numerical study on three different blades. Then, in an attempt to reduce the computational cost of identifying a suitable dimension reducing subspace, we examine statistical sufficient dimension reduction methods, including sliced inverse regression, sliced average variance estimation, principal Hessian directions and contour regression. Unsatisfied by these results, we evaluate a new idea based on polynomial variable projection---a non-linear least squares problem. Our results using polynomial variable projection clearly demonstrate that one can accurately identify dimension reducing subspaces for turbomachinery functionals at a fraction of the cost associated with prior methods. We apply these subspaces to the problem of comparing design configurations across different flight points on a working line of a fan blade. We demonstrate how designs that offer a healthy compromise between performance at cruise and sea-level conditions can be easily found by visually inspecting their subspaces.