Learning in High Dimensional Spaces


Unsupervised Kernel Dimension Reduction

Neural Information Processing Systems

We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
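
The abstract does not name a specific independence measure, but kernel dimension reduction of this kind is commonly built on statistics such as the Hilbert-Schmidt Independence Criterion (HSIC). The sketch below is a minimal illustration under that assumption rather than the paper's actual objective: it scores a candidate linear projection by the empirical HSIC between the projected data and the original covariates, with an arbitrary Gaussian kernel bandwidth and a random orthonormal projection as placeholders.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian kernel matrix from pairwise squared distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC estimate between two equal-size samples.
    n = X.shape[0]
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Score a candidate 2-D projection W by the dependence between X @ W and X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                    # toy data, not from the paper
W, _ = np.linalg.qr(rng.normal(size=(10, 2)))     # random orthonormal projection
print(hsic(X @ W, X))
```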


Feature-aware Label Space Dimension Reduction for Multi-label Classification

Neural Information Processing Systems

Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition.
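
For context, the label-only baseline that the abstract contrasts with, principal label space transformation, can be sketched in a few lines: compress the label matrix with a truncated SVD, regress the codes on the features, and decode by projecting back and rounding. The shapes, the ridge regressor, and the 0.5 threshold below are illustrative choices; the paper's feature-aware variant (CPLST) additionally conditions the decomposition on the features.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy multi-label data: n examples, d features, K binary labels, m codes.
rng = np.random.default_rng(0)
n, d, K, m = 500, 20, 15, 5
X = rng.normal(size=(n, d))
Y = (rng.random(size=(n, K)) < 0.3).astype(float)

# Compress: project labels onto the top-m right singular vectors of centered Y.
y_mean = Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
Vm = Vt[:m].T                                   # K x m projection
T = (Y - y_mean) @ Vm                           # n x m label codes

# Predict the codes from the features, then decode and round to binary labels.
reg = Ridge(alpha=1.0).fit(X, T)
Y_hat = ((reg.predict(X) @ Vm.T + y_mean) > 0.5).astype(int)
```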


Large-scale optimal transport map estimation using projection pursuit

Neural Information Processing Systems

This paper studies the estimation of large-scale optimal transport maps (OTM), a well-known challenging problem owing to the curse of dimensionality. Existing literature approximates the large-scale OTM by a series of one-dimensional OTM problems through iterative random projection. Such methods, however, suffer from slow or no convergence in practice due to the nature of randomly selected projection directions. Instead, we propose an estimation method for large-scale OTM that combines the ideas of projection pursuit regression and sufficient dimension reduction. The proposed method, named projection pursuit Monge map (PPMM), adaptively selects the most "informative" projection direction in each iteration.
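
As a point of reference, the random-projection baseline that the abstract criticizes can be sketched as follows: repeatedly project the source and target samples onto a direction, solve the one-dimensional transport problem by sorting, and move the source points along that direction. PPMM keeps this outer loop but replaces the random direction with one chosen by sufficient dimension reduction; the toy Gaussian data and iteration count below are arbitrary.

```python
import numpy as np

def one_dimensional_otm_step(X, Y, direction):
    # Project both samples onto the direction; sorting gives the optimal
    # one-dimensional matching, so shift each source point along the
    # direction by the gap to its rank-matched target.
    u = direction / np.linalg.norm(direction)
    x_proj, y_proj = X @ u, Y @ u
    ix, iy = np.argsort(x_proj), np.argsort(y_proj)
    X_new = X.copy()
    X_new[ix] += np.outer(y_proj[iy] - x_proj[ix], u)
    return X_new

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))              # source sample (toy)
Y = rng.normal(loc=2.0, size=(1000, 5))     # target sample (toy)
for _ in range(50):                         # iterative *random* projections
    X = one_dimensional_otm_step(X, Y, rng.normal(size=5))
```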


The unreasonable effectiveness of small neural ensembles in high-dimensional brain

#artificialintelligence

Complexity is an indisputable, well-known, and broadly accepted feature of the brain. Despite this apparently obvious and widespread consensus on brain complexity, sprouts of the single-neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother (or concept) cells and sparse coding of information in the brain. In machine learning, the famous curse of dimensionality long seemed to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality has gradually become more and more popular.


Supporting Multi-point Fan Design with Dimension Reduction

arXiv.org Machine Learning

Motivated by the idea of turbomachinery active subspace performance maps, this paper studies dimension reduction in turbomachinery 3D CFD simulations. First, we show that these subspaces exist across different blades (under the same parametrization), largely independent of their Mach number or Reynolds number. This is demonstrated via a numerical study on three different blades. Then, in an attempt to reduce the computational cost of identifying a suitable dimension-reducing subspace, we examine statistical sufficient dimension reduction methods, including sliced inverse regression, sliced average variance estimation, principal Hessian directions, and contour regression. Unsatisfied with these results, we evaluate a new idea based on polynomial variable projection, a non-linear least squares problem. Our results using polynomial variable projection clearly demonstrate that one can accurately identify dimension-reducing subspaces for turbomachinery functionals at a fraction of the cost associated with prior methods. We apply these subspaces to the problem of comparing design configurations across different flight points on the working line of a fan blade. We demonstrate how designs that offer a healthy compromise between performance at cruise and sea-level conditions can easily be found by visually inspecting their subspaces.
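
Of the sufficient dimension reduction methods listed, sliced inverse regression is the simplest to write down. The sketch below is a generic implementation for a scalar response (whiten the inputs, average within response slices, take leading eigenvectors), applied to a synthetic example rather than the paper's CFD data; the slice count and subspace dimension are arbitrary.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_dirs=2):
    n, p = X.shape
    # Whiten the inputs: Z = (X - mean) @ Cov^{-1/2}.
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ cov_inv_sqrt

    # Slice by the response and accumulate weighted outer products
    # of the within-slice means of Z.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for s in np.array_split(order, n_slices):
        m = Z[s].mean(axis=0)
        M += (len(s) / n) * np.outer(m, m)

    # Leading eigenvectors, mapped back to the original scale, span the
    # estimated dimension-reducing subspace.
    _, vecs = np.linalg.eigh(M)
    return cov_inv_sqrt @ vecs[:, -n_dirs:]

# Synthetic check: y depends on X only through its first two coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
y = X[:, 0] + X[:, 1] ** 3 + 0.1 * rng.normal(size=2000)
B = sliced_inverse_regression(X, y)          # columns estimate that subspace
```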


A data-driven approach for multiscale elliptic PDEs with random coefficients based on intrinsic dimension reduction

arXiv.org Machine Learning

We propose a data-driven approach to solving multiscale elliptic PDEs with random coefficients based on the intrinsic low-dimensional structure of the underlying elliptic differential operators. Our method consists of offline and online stages. At the offline stage, a low-dimensional space and its basis are extracted from the data to achieve significant dimension reduction in the solution space. At the online stage, the extracted basis is used to solve a new multiscale elliptic PDE efficiently. The existence of low-dimensional structure is established by showing the high separability of the underlying Green's functions. Different online construction methods are proposed depending on the problem setup. We provide an error analysis based on the sampling error and the truncation threshold used in building the data-driven basis. Finally, we present numerical examples to demonstrate the accuracy and efficiency of the proposed method.
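
The offline/online split described here follows the familiar reduced-basis pattern, and a generic version of the offline step can be sketched with a truncated SVD of solution snapshots; the tolerance, the snapshot layout, and the Galerkin-style online step in the comment are assumptions for illustration, not the paper's actual construction (which rests on the separability of the Green's functions and several online variants).

```python
import numpy as np

def extract_basis(snapshots, tol=1e-3):
    # snapshots: array of shape (n_dof, n_samples); each column is one
    # offline solution sample (e.g. for a different random coefficient).
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    # Keep the smallest number of modes whose retained energy exceeds 1 - tol.
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(energy, 1.0 - tol)) + 1
    return U[:, :k]                            # data-driven reduced basis

# Online stage (schematically): project a new problem onto the basis and
# solve the small reduced system instead of the full fine-scale one, e.g.
#   A_r = basis.T @ A @ basis;  b_r = basis.T @ b;  u ~ basis @ solve(A_r, b_r)
```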


Fast and Secure Distributed Learning in High Dimension

arXiv.org Machine Learning

Modern machine learning is distributed, and the work of several machines is typically aggregated by \emph{averaging}, the optimal rule in terms of speed, offering a speedup of $n$ (with respect to using a single machine) when $n$ processes learn together. Distributing data and models, however, poses fundamental vulnerabilities, be they to software bugs, asynchrony, or, worse, to malicious attackers controlling some machines or injecting misleading data into the network. Such behavior is best modeled as Byzantine failures, and averaging does not tolerate even a single Byzantine worker. Krum, the first provably Byzantine-resilient aggregation rule for SGD, uses only one worker per step, which hampers its speed of convergence, especially in the best case when none of the workers is actually Byzantine. An idea, coined multi-Krum, of using $m$ different workers per step was mentioned, without however any proof of either its Byzantine resilience or its slowdown. More recently, it was shown that in high-dimensional machine learning, guaranteeing convergence is not a sufficient condition for \emph{strong} Byzantine resilience. An improvement on Krum, coined Bulyan, was proposed and proved to guarantee stronger resilience. However, Bulyan suffers from the same weakness as Krum: using only one worker per step. This compounds the aforementioned open problem and leaves the crucial need for both fast and strong Byzantine resilience unfulfilled. The present paper proposes using Bulyan over multi-Krum (we call it Multi-Bulyan), a combination for which we provide proofs of strong Byzantine resilience as well as of an $\frac{m}{n}$ slowdown compared to averaging, the fastest (but non-Byzantine-resilient) rule for distributed machine learning. Finally, we prove that Multi-Bulyan inherits the $O(d)$ merits of both multi-Krum and Bulyan.
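
For readers unfamiliar with the aggregation rules being combined, a plain-numpy sketch of Krum and multi-Krum is given below: each worker's gradient is scored by the sum of squared distances to its $n - f - 2$ closest peers, Krum keeps the single best-scored gradient, and multi-Krum averages the $m$ best. Bulyan's additional coordinate-wise trimming stage, and hence Multi-Bulyan itself, is omitted here; the toy shapes and seed are arbitrary.

```python
import numpy as np

def krum_scores(grads, f):
    # grads: n x d array of worker gradients; f: assumed number of Byzantine
    # workers. Score i = sum of squared distances to its n - f - 2 closest peers.
    G = np.asarray(grads)
    n = len(G)
    d2 = np.sum((G[:, None, :] - G[None, :, :]) ** 2, axis=-1)
    scores = np.empty(n)
    for i in range(n):
        others = np.sort(np.delete(d2[i], i))
        scores[i] = others[: n - f - 2].sum()
    return scores

def multi_krum(grads, f, m):
    # Multi-Krum: average the m gradients with the lowest Krum scores
    # (m = 1 recovers Krum itself).
    scores = krum_scores(grads, f)
    best = np.argsort(scores)[:m]
    return np.asarray(grads)[best].mean(axis=0)

rng = np.random.default_rng(0)
grads = rng.normal(size=(11, 4))             # 11 workers, 4-dimensional gradients
grads[:2] += 100.0                           # two "Byzantine" outliers
print(multi_krum(grads, f=2, m=5))           # close to the honest mean
```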


Dimension reduction as an optimization problem over a set of generalized functions

arXiv.org Machine Learning

The classical dimension reduction problem can be loosely formulated as the problem of finding a $k$-dimensional affine subspace of ${\mathbb R}^n$ onto which data points ${\mathbf x}_1,\cdots, {\mathbf x}_N$ can be projected without loss of valuable information. We reformulate this problem in the language of tempered distributions, i.e. as a problem of approximating an empirical probability density function $p_{\rm{emp}}({\mathbf x}) = \frac{1}{N} \sum_{i=1}^N \delta^n ({\mathbf x} - {\mathbf x}_i)$, where $\delta^n$ is an $n$-dimensional Dirac delta function, by another tempered distribution $q({\mathbf x})$ whose density is supported in some $k$-dimensional subspace. Thus, our problem is reduced to the minimization of a certain loss function $I(q)$, measuring the distance from $q$ to $p_{\rm{emp}}$, over a pertinent set of generalized functions, denoted $\mathcal{G}_k$. Another classical problem of data analysis is the sufficient dimension reduction problem. We show that it can be reduced to the following problem: given a function $f: {\mathbb R}^n\rightarrow {\mathbb R}$ and a probability density function $p({\mathbf x})$, find a function of the form $g({\mathbf w}^T_1{\mathbf x}, \cdots, {\mathbf w}^T_k{\mathbf x})$ that minimizes the loss ${\mathbb E}_{{\mathbf x}\sim p} |f({\mathbf x})-g({\mathbf w}^T_1{\mathbf x}, \cdots, {\mathbf w}^T_k{\mathbf x})|^2$. We first show that the search spaces of the latter two problems are in one-to-one correspondence, defined by the Fourier transform. We introduce a nonnegative penalty function $R(f)$ and a set of ordinary functions $\Omega_\epsilon = \{f \mid R(f)\leq \epsilon\}$ in such a way that $\Omega_\epsilon$ `approximates' the space $\mathcal{G}_k$ as $\epsilon \rightarrow 0$. Then we present an algorithm for the minimization of $I(f)+\lambda R(f)$, based on the idea of two-step iterative computation.
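
The sufficient dimension reduction loss stated above can be estimated by Monte Carlo for any fixed candidate subspace, which makes the objective concrete even before the paper's two-step algorithm is applied. The toy sketch below uses an arbitrary $f$, a standard Gaussian $p$, a k-nearest-neighbor fit standing in for $g$, and a held-out split for the expectation; none of these choices come from the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Estimate E_{x~p} |f(x) - g(W^T x)|^2 for a fixed candidate projection W.
rng = np.random.default_rng(0)
n_dim, k = 10, 2
f = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2      # toy f driven by two directions
X = rng.normal(size=(6000, n_dim))                # samples from p = N(0, I)
W = np.eye(n_dim)[:, :k]                          # candidate directions w_1, ..., w_k
Z, y = X @ W, f(X)

# Fit g nonparametrically on the projected coordinates, evaluate on held-out data.
Z_tr, Z_te, y_tr, y_te = Z[:3000], Z[3000:], y[:3000], y[3000:]
g = KNeighborsRegressor(n_neighbors=25).fit(Z_tr, y_tr)
print(np.mean((y_te - g.predict(Z_te)) ** 2))     # small, since W is the true subspace
```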


Exploring the Curse of Dimensionality - Part II. - Dr. Juan Camilo Orduz

#artificialintelligence

I continue exploring the curse of dimensionality. Following the analysis from Part I, I want to discuss another consequence of sparse sampling in high dimensions: sample points are close to an edge of the sample. This post is based on The Elements of Statistical Learning, Section 2.5, which I encourage you to read! Consider \(N\) data points uniformly distributed in a \(p\)-dimensional unit ball centered at the origin. Suppose we consider a nearest-neighbor estimate at the origin.
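
The quantity the post goes on to study has a closed form in The Elements of Statistical Learning, Section 2.5: the median distance from the origin to its nearest of the \(N\) points is \(d(p, N) = \big(1 - (1/2)^{1/N}\big)^{1/p}\). The short Python snippet below (the original post may use a different language) simply evaluates this formula to show how quickly the nearest neighbor drifts toward the boundary as \(p\) grows.

```python
def median_nearest_neighbor_distance(p, N):
    # Median distance from the origin to the closest of N points drawn
    # uniformly from the p-dimensional unit ball (ESL, Section 2.5).
    return (1.0 - 0.5 ** (1.0 / N)) ** (1.0 / p)

for p in (1, 2, 10, 50):
    print(p, round(median_nearest_neighbor_distance(p, 500), 3))
```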


Exploring the Curse of Dimensionality - Part I. - Dr. Juan Camilo Orduz

#artificialintelligence

We will now investigate this curse. Let us prepare the notebook. Let \(\lambda = 0.1\) represent the locality input parameter. Now let us write a function which verifies whether a point \(x \in [0,1]\) belongs to a given interval. Then we write a simulation function.
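
Only the setup of the post is quoted above, so the sketch below is a hypothetical Python reconstruction of the kind of simulation it describes (the original may differ in language and details): an interval-membership check, and an estimate of the fraction of uniformly distributed points in \([0,1]^p\) captured by a cube whose volume equals the locality parameter \(\lambda\). The point is that the required edge length \(\lambda^{1/p}\) approaches 1 as \(p\) grows, so the neighborhood stops being local.

```python
import numpy as np

def in_interval(x, low, high):
    # Does a point x in [0, 1] fall inside the interval [low, high]?
    return low <= x <= high

def captured_fraction(n_points, p, lam, seed=0):
    # Fraction of uniform points in [0, 1]^p inside a cube of volume lam
    # centered at (0.5, ..., 0.5); its edge length must be lam ** (1 / p).
    rng = np.random.default_rng(seed)
    edge = lam ** (1.0 / p)
    X = rng.random(size=(n_points, p))
    return np.mean(np.all(np.abs(X - 0.5) <= edge / 2.0, axis=1))

for p in (1, 2, 10):
    print(p, 0.1 ** (1.0 / p), captured_fraction(10_000, p, lam=0.1))
```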