# Learning in High Dimensional Spaces

### Exploring the Curse of Dimensionality - Part II. - Dr. Juan Camilo Orduz

I continue exploring the curse of dimensionality. Following the analysis form Part I., I want to discuss another consequence of sparse sampling in high dimensions: sample points are close to an edge of the sample. This post is based on The Elements of Statistical Learning, Section 2.5, which I encourage to read! Consider $$N$$ data points uniformly distributed in a $$p$$-dimensional unit ball centered at the origin. Suppose we consider a nearest-neighbor estimate at the origin.

### Exploring the Curse of Dimensionality - Part I. - Dr. Juan Camilo Orduz

We will now investigate this curse. Let us prepare the notebook. Let $$\lambda: 0.1$$ represent the locality input parameter. Now let us write a function which verifies if a point $$x \in [0,1]$$ belongs to a given interval. Now we write a simulation function.

### Fourier analysis perspective for sufficient dimension reduction problem

A theory of sufficient dimension reduction (SDR) is developed from an optimizational perspective. In our formulation of the problem, instead of dealing with raw data, we assume that our ground truth includes a mapping ${\mathbf f}: {\mathbb R}^n\rightarrow {\mathbb R}^m$ and a probability distribution function $p$ over ${\mathbb R}^n$, both given analytically. We formulate SDR as a problem of finding a function ${\mathbf g}: {\mathbb R}^k\rightarrow {\mathbb R}^m$ and a matrix $P\in {\mathbb R}^{k\times n}$ such that ${\mathbb E}_{{\mathbf x}\sim p({\mathbf x})} \left|{\mathbf f}({\mathbf x}) - {\mathbf g}(P{\mathbf x})\right|^2$ is minimal. It turns out that the latter problem allows a reformulation in the dual space, i.e. instead of searching for ${\mathbf g}(P{\mathbf x})$ we suggest searching for its Fourier transform. First, we characterize all tempered distributions that can serve as the Fourier transform of such functions. The reformulation in the dual space can be interpreted as a problem of finding a $k$-dimensional linear subspace $S$ and a tempered distribution ${\mathbf t}$ supported in $S$ such that ${\mathbf t}$ is "close" in a certain sense to the Fourier transform of ${\mathbf f}$. Instead of optimizing over generalized functions with a $k$-dimensional support, we suggest minimizing over ordinary functions but with an additional term $R$ that penalizes a strong distortion of the support from any $k$-dimensional linear subspace. For a specific case of $R$, we develop an algorithm that can be formulated for functions given in the initial form as well as for their Fourier transforms. Eventually, we report results of numerical experiments with a discretized version of the latter algorithm.

### High-dimensional estimation via sum-of-squares proofs

Estimation is the computational task of recovering a hidden parameter $x$ associated with a distribution $D_x$, given a measurement $y$ sampled from the distribution. High dimensional estimation problems arise naturally in statistics, machine learning, and complexity theory. Many high dimensional estimation problems can be formulated as systems of polynomial equations and inequalities, and thus give rise to natural probability distributions over polynomial systems. Sum-of-squares proofs provide a powerful framework to reason about polynomial systems, and further there exist efficient algorithms to search for low-degree sum-of-squares proofs. Understanding and characterizing the power of sum-of-squares proofs for estimation problems has been a subject of intense study in recent years. On one hand, there is a growing body of work utilizing sum-of-squares proofs for recovering solutions to polynomial systems when the system is feasible. On the other hand, a general technique referred to as pseudocalibration has been developed towards showing lower bounds on the degree of sum-of-squares proofs. Finally, the existence of sum-of-squares refutations of a polynomial system has been shown to be intimately connected to the existence of spectral algorithms. In this article we survey these developments.

### Privately Learning High-Dimensional Distributions

We design nearly optimal differentially private algorithms for learning two fundamental families of high-dimensional distributions in total variation distance: multivariate Gaussians in $\mathbb{R}^{d}$ and product distributions on the hypercube. The sample complexity of both our algorithms approaches the sample complexity of non-private learners up to a small multiplicative factor and an additional additive term that is lower order for a wide range of parameters, showing that privacy comes essentially for free for these problems. Our algorithms use a novel technical approach to reducing the sensitivity of the estimation procedure that we call recursive private preconditioning and may find additional applications.

### Building Models for Biopathway Dynamics Using Intrinsic Dimensionality Analysis

An important task for many if not all the scientific domains is efficient knowledge integration, testing and codification. It is often solved with model construction in a controllable computational environment. In spite of that, the throughput of in-silico simulation-based observations become similarly intractable for thorough analysis. This is especially the case in molecular biology, which served as a subject for this study. In this project, we aimed to test some approaches developed to deal with the curse of dimensionality. Among these we found dimension reduction techniques especially appealing. They can be used to identify irrelevant variability and help to understand critical processes underlying high-dimensional datasets. Additionally, we subjected our data sets to nonlinear time series analysis, as those are well established methods for results comparison. To investigate the usefulness of dimension reduction methods, we decided to base our study on a concrete sample set. The example was taken from the domain of systems biology concerning dynamic evolution of sub-cellular signaling. Particularly, the dataset relates to the yeast pheromone pathway and is studied in-silico with a stochastic model. The model reconstructs signal propagation stimulated by a mating pheromone. In the paper, we elaborate on the reason of multidimensional analysis problem in the context of molecular signaling, and next, we introduce the model of choice, simulation details and obtained time series dynamics. A description of used methods followed by a discussion of results and their biological interpretation finalize the paper.

### Scalable Algorithms for Learning High-Dimensional Linear Mixed Models

Linear mixed models (LMMs) are used extensively to model dependecies of observations in linear regression and are used extensively in many application areas. Parameter estimation for LMMs can be computationally prohibitive on big data. State-of-the-art learning algorithms require computational complexity which depends at least linearly on the dimension $p$ of the covariates, and often use heuristics that do not offer theoretical guarantees. We present scalable algorithms for learning high-dimensional LMMs with sublinear computational complexity dependence on $p$. Key to our approach are novel dual estimators which use only kernel functions of the data, and fast computational techniques based on the subsampled randomized Hadamard transform. We provide theoretical guarantees for our learning algorithms, demonstrating the robustness of parameter estimation. Finally, we complement the theory with experiments on large synthetic and real data.

### Extreme Dimension Reduction for Handling Covariate Shift

In the covariate shift learning scenario, the training and test covariate distributions differ, so that a predictor's average loss over the training and test distributions also differ. In this work, we explore the potential of extreme dimension reduction, i.e. to very low dimensions, in improving the performance of importance weighting methods for handling covariate shift, which fail in high dimensions due to potentially high train/test covariate divergence and the inability to accurately estimate the requisite density ratios. We first formulate and solve a problem optimizing over linear subspaces a combination of their predictive utility and train/test divergence within. Applying it to simulated and real data, we show extreme dimension reduction helps sometimes but not always, due to a bias introduced by dimension reduction.

### [R] UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction • r/MachineLearning

Can I ask you a dumb question? I was thinking about dimensionality reduction the other day and an idea occurred to me: why not just use an autoencoder NN squeezing input data into d dimensions (d 2, 3, ...) and an appropriate loss function to mimic either PCA or t-SNE, or maybe even UMAP would work? This produces a scalable, incremental (approximate) algorithm that easily supports parallelisation. Besides being slower than a pure C/C implementation, do you see something wrong with it?

### UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP as described has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.