
Learning in High Dimensional Spaces

The Curse of Dimensionality: more is not always better!


Over the years in my journey as an Artificial Intelligence practitioner, I've observed that several concepts in Artificial Intelligence that we…

Geometric Priors I


In the last post on high-dimensional learning, we saw that learning in high dimensions is impossible without assumptions due to the curse of dimensionality: the number of samples required in our learning problem grows exponentially with the dimension. We also introduced the main geometric function spaces, in which points in high-dimensional space can be regarded as signals over a low-dimensional geometric domain. Building on this assumption, and to make learning tractable, I will present symmetry (in this post) and scale separation (in the next one). We also discussed the three kinds of error we need to be aware of, namely approximation error, statistical error, and optimization error. The approximation error increases as our function class shrinks (the true function we are trying to estimate may fall far outside the class), which argues for a large function class. In contrast, the statistical error reflects how unlikely we are to find the true function from a finite number of data points; this error grows as the function class grows.
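The exponential sample requirement can be made concrete with a quick numeric sketch (my own illustration, not from the original post): covering the unit hypercube at a fixed resolution needs exponentially many cells, and almost all of the cube's volume concentrates near its boundary as the dimension grows.

```python
# Illustration of the curse of dimensionality (editorial sketch).
# 1) Covering [0, 1]^d at resolution eps = 0.1 needs (1/eps)^d cells.
# 2) The inner cube [0.1, 0.9]^d -- everything at least 0.1 away from the
#    boundary -- holds a vanishing fraction of the total volume.
for d in (2, 10, 50):
    cells = 10 ** d              # samples needed for an eps = 0.1 grid
    interior = 0.8 ** d          # volume fraction away from the boundary
    print(f"d={d:2d}: cells needed = 1e{d}, interior volume = {interior:.2e}")
```

Already at d = 50 the interior fraction is negligible, which is why nearest-neighbour reasoning and grid-based sampling break down in high dimensions.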

A selective review of sufficient dimension reduction for multivariate response regression Machine Learning

We review sufficient dimension reduction (SDR) estimators with multivariate response in this paper. A wide range of SDR methods can be characterized as either inverse regression or forward regression SDR estimators. The inverse regression family includes pooled marginal estimators, projective resampling estimators, and distance-based estimators. Ordinary least squares, partial least squares, and semiparametric SDR estimators, on the other hand, are discussed as members of the forward regression family.

Invariance principle of random projection for the norm Machine Learning

Due to the internet boom and advances in computer technology over the last few decades, data collection and storage have been growing exponentially. As the 'gold'-mining demand on this enormous amount of data reaches a new level, we face many technical challenges in understanding the information we have collected. In many cases, including text and images, data can be represented as points or vectors in a high-dimensional space. On the one hand, it is very easy to collect more and more information about an object, so the dimensionality grows quickly. On the other hand, it is very difficult to analyze and build useful models for high-dimensional data, for several reasons including the computational difficulty that results from the curse of dimensionality and a high noise-to-signal ratio. It is therefore necessary to reduce the dimensionality of the data while preserving the relevant structures. The celebrated Johnson-Lindenstrauss lemma [6] states that random projections can be used as a general dimension-reduction technique to embed topological structures in a high-dimensional Euclidean space into a low-dimensional space without distorting their topology. Let us first recall the Johnson-Lindenstrauss lemma [4].
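As a quick sanity check of the phenomenon behind the lemma, a scaled Gaussian random projection approximately preserves the norms of high-dimensional points. This is my own numpy sketch; the target dimension k below is picked for illustration, not derived from the lemma's bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 2000, 500          # k chosen for illustration only
X = rng.normal(size=(n, d))

# Gaussian random projection, scaled by 1/sqrt(k) so that squared norms
# are preserved in expectation.
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R

ratios = np.linalg.norm(Y, axis=1) / np.linalg.norm(X, axis=1)
print(ratios.min(), ratios.max())   # all ratios concentrate near 1
```

Because the projection is data-independent, the same matrix R can be reused for new points, which is what makes random projection attractive as a general-purpose preprocessing step.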

Curse of Dimensionality


"Data is the new oil", a poetic phrase coined by British mathematician Clive Humby, seems all the more relevant in today's world. As the world progresses, generating and storing data has become much easier. We are completely surrounded by data.

Sufficient Dimension Reduction for High-Dimensional Regression and Low-Dimensional Embedding: Tutorial and Survey Machine Learning

This is a tutorial and survey paper on various methods for Sufficient Dimension Reduction (SDR). We cover these methods from both the statistical perspective of high-dimensional regression and the machine learning perspective of dimensionality reduction. We start by introducing inverse regression methods, including Sliced Inverse Regression (SIR), Sliced Average Variance Estimation (SAVE), contour regression, directional regression, Principal Fitted Components (PFC), Likelihood Acquired Direction (LAD), and graphical regression. Then we introduce forward regression methods, including Principal Hessian Directions (pHd), Minimum Average Variance Estimation (MAVE), Conditional Variance Estimation (CVE), and deep SDR methods. Finally, we explain Kernel Dimension Reduction (KDR) for both supervised and unsupervised learning. We also show that supervised KDR and supervised PCA are equivalent.
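SIR, the first inverse regression method listed above, can be stripped down to a few lines of numpy: whiten the predictors, slice the sorted response, and eigendecompose the covariance of the slice means. The sketch below is didactic (my own variable and function names, on a toy single-index model), not a production implementation.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced Inverse Regression reduced to its core (didactic sketch)."""
    n, p = X.shape
    # Whiten X so the SDR directions can be read off from slice means.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(cov))   # inv(cov) = L @ L.T
    Z = (X - mu) @ L                             # whitened predictors
    # Slice the sorted response and average the whitened X in each slice.
    slices = np.array_split(np.argsort(y), n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M span the (whitened) SDR subspace;
    # map them back to the original coordinates.
    _, vecs = np.linalg.eigh(M)                  # eigenvalues ascending
    B = L @ vecs[:, ::-1][:, :n_dirs]
    return B / np.linalg.norm(B, axis=0)

# Toy single-index model: y depends on X only through the first coordinate.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = np.exp(X[:, 0]) + 0.1 * rng.normal(size=2000)
b = sir_directions(X, y).ravel()
print(b)   # close to +/-(1, 0, 0, 0, 0)
```

The recovered direction is only identified up to sign and up to the span of the central subspace, which is why SDR results are usually reported as subspaces rather than individual vectors.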

Persuasion by Dimension Reduction Machine Learning

How should an agent (the sender) observing multi-dimensional data (the state vector) persuade another agent to take the desired action? We show that it is always optimal for the sender to perform a (non-linear) dimension reduction by projecting the state vector onto a lower-dimensional object that we call the "optimal information manifold." We characterize geometric properties of this manifold and link them to the sender's preferences. Optimal policy splits information into "good" and "bad" components. When the sender's marginal utility is linear, revealing the full magnitude of good information is always optimal. In contrast, with concave marginal utility, optimal information design conceals the extreme realizations of good information and only reveals its direction (sign). We illustrate these effects by explicitly solving several multi-dimensional Bayesian persuasion problems.

Dimension Reduction and Data Visualization for Fréchet Regression Machine Learning

With the rapid development of data collection techniques, complex data objects that do not live in Euclidean space are frequently encountered in new statistical applications. The Fréchet regression model (Petersen & Müller 2019) provides a promising framework for regression analysis with metric space-valued responses. In this paper, we introduce a flexible sufficient dimension reduction (SDR) method for Fréchet regression that serves two purposes: to mitigate the curse of dimensionality caused by high-dimensional predictors, and to provide a data visualization tool for Fréchet regression. Our approach is flexible enough to turn any existing SDR method for Euclidean (X, Y) into one for Euclidean X and metric space-valued Y. The basic idea is to first map the metric space-valued random object $Y$ to a real-valued random variable $f(Y)$ using a class of functions, and then perform classical SDR on the transformed data. If the class of functions is sufficiently rich, we are guaranteed to recover the Fréchet SDR space. We show that such a class, which we call an ensemble, can be generated by a universal kernel, and we establish the consistency and asymptotic convergence rate of the proposed methods. Their finite-sample performance is illustrated through simulation studies on several commonly encountered metric spaces, including the Wasserstein space, the space of symmetric positive definite matrices, and the sphere. We illustrate the data visualization aspect of our method by exploring human mortality distribution data across countries and by studying the distribution of hematoma density.

Dimension Reduction for Data with Heterogeneous Missingness Machine Learning

Dimension reduction plays a pivotal role in analysing high-dimensional data. However, observations with missing values present serious difficulties in directly applying standard dimension reduction techniques. As a large number of dimension reduction approaches are based on the Gram matrix, we first investigate the effects of missingness on dimension reduction by studying the statistical properties of the Gram matrix with or without missingness, and then we present a bias-corrected Gram matrix with nice statistical properties under heterogeneous missingness. Extensive empirical results, on both simulated and publicly available real datasets, show that the proposed unbiased Gram matrix can significantly improve a broad spectrum of representative dimension reduction approaches.
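The effect of such a correction can be sketched numerically. The snippet below is my own illustration of the general idea (rescaling zero-filled inner products by estimated observation rates), not the exact estimator proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 500
# Low-rank signal so the Gram matrix has entries well above the noise level.
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, d)) \
    + 0.1 * rng.normal(size=(n, d))

# Heterogeneous missingness: each row i is observed with its own probability.
p = rng.uniform(0.5, 0.9, size=n)
mask = rng.random((n, d)) < p[:, None]
X_obs = np.where(mask, X, 0.0)            # zero-fill the missing entries

# The naive Gram matrix shrinks entry (i, j) by roughly p_i * p_j
# (by p_i on the diagonal), so rescale by empirical observation rates.
G_naive = X_obs @ X_obs.T / d
p_hat = mask.mean(axis=1)
G_corr = G_naive / np.outer(p_hat, p_hat)
np.fill_diagonal(G_corr, np.diag(G_naive) / p_hat)

G_full = X @ X.T / d
err_naive = np.abs(G_naive - G_full).mean()
err_corr = np.abs(G_corr - G_full).mean()
print(err_naive, err_corr)                # the corrected version is much closer
```

Any Gram-matrix-based method (kernel PCA, multidimensional scaling, spectral clustering) can then be run on the corrected matrix unchanged, which is what makes this style of fix broadly applicable.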

Supervised Linear Dimension-Reduction Methods: Review, Extensions, and Comparisons Machine Learning

Principal component analysis (PCA) is a well-known linear dimension-reduction method that has been widely used in data analysis and modeling. It is an unsupervised learning technique that identifies a suitable linear subspace for the input variable that contains maximal variation and preserves as much information as possible. PCA has also been used in prediction models where the original, high-dimensional space of predictors is reduced to a smaller, more manageable, set before conducting regression analysis. However, this approach does not incorporate information in the response during the dimension-reduction stage and hence can have poor predictive performance. To address this concern, several supervised linear dimension-reduction techniques have been proposed in the literature. This paper reviews selected techniques, extends some of them, and compares their performance through simulations. Two of these techniques, partial least squares (PLS) and least-squares PCA (LSPCA), consistently outperform the others in this study.
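The failure mode described above, and why PLS avoids it, shows up in a small numpy sketch (my own illustration: the first PLS weight vector is proportional to the predictor-response covariance X^T y, whereas PCA chases variance alone).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 2000, 20
X = rng.normal(size=(n, p))
X[:, 0] *= 3.0                            # high variance, but irrelevant to y
y = X[:, 1] + 0.1 * rng.normal(size=n)    # signal in a low-variance direction
Xc, yc = X - X.mean(axis=0), y - y.mean()

# PCA direction: top eigenvector of the covariance (drawn to column 0).
w_pca = np.linalg.eigh(np.cov(Xc, rowvar=False))[1][:, -1]

# First PLS weight vector: proportional to the covariance with the response.
w_pls = Xc.T @ yc
w_pls /= np.linalg.norm(w_pls)

def r2(z, y):
    """R^2 of a simple regression of centred y on the 1-d score z."""
    z = z - z.mean()
    beta = (z @ y) / (z @ z)
    resid = y - beta * z
    return 1.0 - (resid @ resid) / (y @ y)

print(r2(Xc @ w_pca, yc), r2(Xc @ w_pls, yc))  # PLS score predicts far better
```

A single PCA component locks onto the high-variance but uninformative first column, while the single PLS component captures the predictive direction, mirroring the simulation findings summarized in the abstract.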