Learning in High Dimensional Spaces
Projection pursuit with applications to scRNA sequencing data
In this paper, we explore the limitations of PCA as a dimension reduction technique and study its extension, projection pursuit (PP), which is a broad class of linear dimension reduction methods. PCA is a popular dimension reduction technique commonly applied to scRNA sequencing data. Despite of huge success in practice, we will illustrate three drawbacks of PCA. It is well known that the eigenvalues of sample covariance matrix is not consistent in high dimensional cases. Every principal component is uncorrelated with each other but not independent.
MM Algorithms for Distance Covariance based Sufficient Dimension Reduction and Sufficient Variable Selection
Sufficient dimension reduction (SDR) using distance covariance (DCOV) was recently proposed as an approach to dimension-reduction problems. Compared with other SDR methods, it is model-free without estimating link function and does not require any particular distributions on predictors (see Sheng and Yin, 2013, 2016). However, the DCOV-based SDR method involves optimizing a nonsmooth and nonconvex objective function over the Stiefel manifold. To tackle the numerical challenge, we novelly formulate the original objective function equivalently into a DC (Difference of Convex functions) program and construct an iterative algorithm based on the majorization-minimization (MM) principle. At each step of the MM algorithm, we inexactly solve the quadratic subproblem on the Stiefel manifold by taking one iteration of Riemannian Newton's method. The algorithm can also be readily extended to sufficient variable selection (SVS) using distance covariance. We establish the convergence property of the proposed algorithm under some regularity conditions. Simulation studies show our algorithm drastically improves the computation efficiency and is robust across various settings compared with the existing method. Supplemental materials for this article are available.
The unreasonable effectiveness of small neural ensembles in high-dimensional brain
Complexity is an indisputable, well-known, and broadly accepted feature of the brain. Despite the apparently obvious and widely-spread consensus on the brain complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain. In machine learning for a long time, the famous curse of dimensionality seemed to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality becomes gradually more and more popular.
The Similarity-Consensus Regularized Multi-view Learning for Dimension Reduction
Meng, Xiangzhu, Wang, Huibing, Feng, Lin
During the last decades, learning a low-dimensional space with discriminative information for dimension reduction (DR) has gained a surge of interest. However, it's not accessible for these DR methods to achieve satisfactory performance when facing the features from multiple views. In multi-view learning problems, one instance can be represented by multiple heterogeneous features, which are highly related but sometimes look different from each other. In addition, correlations between features from multiple views always vary greatly, which challenges the capability of multi-view learning methods. Consequently, constructing a multi-view learning framework with generalization and scalability, which could take advantage of multi-view information as much as possible, is extremely necessary but challenging. To implement the above target, this paper proposes a novel multi-view learning framework based on similarity consensus, which makes full use of correlations among multi-view features while considering the scalability and robustness of the framework. It aims to straightforwardly extend those existing DR methods into multi-view learning domain by preserving the similarity between different views to capture the low-dimensional embedding. Two schemes based on pairwise-consensus and centroid-consensus are separately proposed to force multiple views to learn from each other and then an iterative alternating strategy is developed to obtain the optimal solution. The proposed method is evaluated on 5 benchmark datasets and comprehensive experiments show that our proposed multi-view framework can yield comparable and promising performance with previous approaches proposed in recent literatures.
Supporting Multi-point Fan Design with Dimension Reduction
Seshadri, Pranay, Yuchi, Shaowu, Shahpar, Shahrokh, Parks, Geoffrey
Motivated by the idea of turbomachinery active subspace performance maps, this paper studies dimension reduction in turbomachinery 3D CFD simulations. First, we show that these subspaces exist across different blades---under the same parametrization---largely independent of their Mach number or Reynolds number. This is demonstrated via a numerical study on three different blades. Then, in an attempt to reduce the computational cost of identifying a suitable dimension reducing subspace, we examine statistical sufficient dimension reduction methods, including sliced inverse regression, sliced average variance estimation, principal Hessian directions and contour regression. Unsatisfied by these results, we evaluate a new idea based on polynomial variable projection---a non-linear least squares problem. Our results using polynomial variable projection clearly demonstrate that one can accurately identify dimension reducing subspaces for turbomachinery functionals at a fraction of the cost associated with prior methods. We apply these subspaces to the problem of comparing design configurations across different flight points on a working line of a fan blade. We demonstrate how designs that offer a healthy compromise between performance at cruise and sea-level conditions can be easily found by visually inspecting their subspaces.
A data-driven approach for multiscale elliptic PDEs with random coefficients based on intrinsic dimension reduction
Li, Sijing, Zhang, Zhiwen, Zhao, Hongkai
We propose a data-driven approach to solve multiscale elliptic PDEs with random coefficients based on the intrinsic low dimension structure of the underlying elliptic differential operators. Our method consists of offline and online stages. At the offline stage, a low dimension space and its basis are extracted from the data to achieve significant dimension reduction in the solution space. At the online stage, the extracted basis will be used to solve a new multiscale elliptic PDE efficiently. The existence of low dimension structure is established by showing the high separability of the underlying Green's functions. Different online construction methods are proposed depending on the problem setup. We provide error analysis based on the sampling error and the truncation threshold in building the data-driven basis. Finally, we present numerical examples to demonstrate the accuracy and efficiency of the proposed method.
Symphony of high-dimensional brain
Gorban, Alexander N., Makarov, Valeri A., Tyukin, Ivan Y.
This paper is the final part of the scientific discussion organised by the Journal "Physics of Life Rviews" about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper "The unreasonable effectiveness of small neural ensembles in high-dimensional brain". Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion varied from the necessity to take into account the difference between the theoretical random distributions and "extremely non-random" real distributions and revise the common machine learning theory, to different forms of the curse of dimensionality and high-dimensional pitfalls in neuroscience. V. K{\r{u}}rkov{\'a}, A. Tozzi and J.F. Peters, R. Quian Quiroga, P. Varona, R. Barrio, G. Kreiman, L. Fortuna, C. van Leeuwen, R. Quian Quiroga, and V. Kreinovich, A.N. Gorban, V.A. Makarov, and I.Y. Tyukin participated in the discussion. In this paper we analyse the symphony of opinions and the possible outcomes of the simplicity revolution for machine learning and neuroscience.
Bayesian inverse regression for supervised dimension reduction with small datasets
Cai, Xin, Lin, Guang, Li, Jinglai
We consider supervised dimension reduction problems, namely to identify a low dimensional projection of the predictors $\-x$ which can retain the statistical relationship between $\-x$ and the response variable $y$. We follow the idea of the sliced inverse regression (SIR) class of methods, which is to use the statistical information of the conditional distribution $\pi(\-x|y)$ to identify the dimension reduction (DR) space and in particular we focus on the task of computing this conditional distribution. We propose a Bayesian framework to compute the conditional distribution where the likelihood function is obtained using the Gaussian process regression model. The conditional distribution $\pi(\-x|y)$ can then be obtained directly by assigning weights to the original data points. We then can perform DR by considering certain moment functions (e.g. the first moment) of the samples of the posterior distribution. With numerical examples, we demonstrate that the proposed method is especially effective for small data problems.
Multi-view Locality Low-rank Embedding for Dimension Reduction
Feng, Lin, Meng, Xiangzhu, Wang, Huibing
During the last decades, we have witnessed a surge of interests of learning a low-dimensional space with discriminative information from one single view. Even though most of them can achieve satisfactory performance in some certain situations, they fail to fully consider the information from multiple views which are highly relevant but sometimes look different from each other. Besides, correlations between features from multiple views always vary greatly, which challenges multi-view subspace learning. Therefore, how to learn an appropriate subspace which can maintain valuable information from multi-view features is of vital importance but challenging. To tackle this problem, this paper proposes a novel multi-view dimension reduction method named Multi-view Locality Low-rank Embedding for Dimension Reduction (MvL2E). MvL2E makes full use of correlations between multi-view features by adopting low-rank representations. Meanwhile, it aims to maintain the correlations and construct a suitable manifold space to capture the low-dimensional embedding for multi-view features. A centroid based scheme is designed to force multiple views to learn from each other. And an iterative alternating strategy is developed to obtain the optimal solution of MvL2E. The proposed method is evaluated on 5 benchmark datasets. Comprehensive experiments show that our proposed MvL2E can achieve comparable performance with previous approaches proposed in recent literatures.
Fast and Secure Distributed Learning in High Dimension
El-Mhamdi, El-Mahdi, Guerraoui, Rachid
Modern machine learning is distributed and the work of several machines is typically aggregated by \emph{averaging} which is the optimal rule in terms of speed, offering a speedup of $n$ (with respect to using a single machine) when $n$ processes are learning together. Distributing data and models poses however fundamental vulnerabilities, be they to software bugs, asynchrony, or worse, to malicious attackers controlling some machines or injecting misleading data in the network. Such behavior is best modeled as Byzantine failures, and averaging does not tolerate a single one from a worker. Krum, the first provably Byzantine resilient aggregation rule for SGD only uses one worker per step, which hampers its speed of convergence, especially in best case conditions when none of the workers is actually Byzantine. An idea, coined multi-Krum, of using $m$ different workers per step was mentioned, without however any proof neither on its Byzantine resilience nor on its slowdown. More recently, it was shown that in high dimensional machine learning, guaranteeing convergence is not a sufficient condition for \emph{strong} Byzantine resilience. A improvement on Krum, coined Bulyan, was proposed and proved to guarantee stronger resilience. However, Bulyan suffers from the same weakness of Krum: using only one worker per step. This adds up to the aforementioned open problem and leaves the crucial need for both fast and strong Byzantine resilience unfulfilled. The present paper proposes using Bulyan over Multi-Krum (we call it Multi-Bulyan), a combination for which we provide proofs of strong Byzantine resilience, as well as an ${\frac{m}{n}}$ slowdown, compared to averaging, the fastest (but non Byzantine resilient) rule for distributed machine learning, finally we prove that Multi-Bulyan inherits the $O(d)$ merits of both multi-Krum and Bulyan.