# Dimensionality Reduction

### Multi-Criteria Dimensionality Reduction with Applications to Fairness

Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the multi-criteria dimensionality reduction problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as the Fair-PCA problem introduced by Samadi et al. [NeurIPS18] and the Nash Social Welfare (NSW) problem. In the Fair-PCA problem, the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum reconstruction error of any one group is minimized.

### Learning nonlinear level sets for dimensionality reduction in function approximation

We developed a Nonlinear Level-set Learning (NLL) method for dimensionality reduction in high-dimensional function approximation with small data. This work is motivated by a variety of design tasks in real-world engineering applications, where practitioners would replace their computationally intensive physical models (e.g., high-resolution fluid simulators) with fast-to-evaluate predictive machine learning models, so as to accelerate the engineering design processes. There are two major challenges in constructing such predictive models: (a) high-dimensional inputs (e.g., many independent design parameters) and (b) small training data, generated by running extremely time-consuming simulations. Thus, reducing the input dimension is critical to alleviate the over-fitting issue caused by data insufficiency. Existing methods, including sliced inverse regression and active subspace approaches, reduce the input dimension by learning a linear coordinate transformation; our main contribution is to extend the transformation approach to a nonlinear regime.

### Dimensionality reduction: theoretical perspective on practical measures

Dimensionality reduction plays a central role in real-world applications for Machine Learning, among many fields. In particular, metric dimensionality reduction where data from a general metric is mapped into low dimensional space, is often used as a first step before applying machine learning algorithms. In almost all these applications the quality of the embedding is measured by various average case criteria. Metric dimensionality reduction has also been studied in Math and TCS, within the extremely fruitful and influential field of metric embedding. Yet, the vast majority of theoretical research has been devoted to analyzing the worst case behavior of embeddings and therefore has little relevance to practical settings.

### Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels

However, in their analysis $C_{\Omega} 2$ can be as large as $\Theta(n {2q})$, even for a set $\Omega$ of $O(1)$ vectors $x$. We give a new analysis of this sketch, providing nearly optimal bounds. For the important case of $q 2$ and $\delta 1/\poly(n)$, this shows that $m \Theta(\epsilon {-2} \log(n) \epsilon {-1} \log 2(n))$, demonstrating that the $\epsilon {-2}$ and $\log 2(n)$ terms do not multiply each other. In a number of applications, one has $\Omega \poly(n)$ and in this case our bounds are optimal up to a constant factor. Lastly, we empirically compare our sketch to other sketches for tensor products, and give a novel application to compressing neural networks.

### Solving Interpretable Kernel Dimensionality Reduction

Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace \textit{before} the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition.

### Diffeomorphic Dimensionality Reduction

This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets.

### Dimensionality Reduction for Data in Multiple Feature Representations

In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding.

### Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

Semi-supervised regression based on the graph Laplacian suffers from the fact that the solution is biased towards a constant and the lack of extrapolating power. Outgoing from these observations we propose to use the second-order Hessian energy for semi-supervised regression which overcomes both of these problems, in particular, if the data lies on or close to a low-dimensional submanifold in the feature space, the Hessian energy prefers functions which vary linearly with respect to the natural parameters in the data. This property makes it also particularly suited for the task of semi-supervised dimensionality reduction where the goal is to find the natural parameters in the data based on a few labeled points. The experimental result suggest that our method is superior to semi-supervised regression using Laplacian regularization and standard supervised methods and is particularly suited for semi-supervised dimensionality reduction. Papers published at the Neural Information Processing Systems Conference.

### DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification

Probabilistic topic models (and their extensions) have become popular as models of latent structures in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood estimation, an approach which may be suboptimal in the context of an overall classification problem. In this paper, we describe DiscLDA, a discriminative learning framework for such models as Latent Dirichlet Allocation (LDA) in the setting of dimensionality reduction with supervised side information. In DiscLDA, a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood using Monte Carlo EM.

### Dimensionality Reduction Using the Sparse Linear Model

We propose an approach for linear unsupervised dimensionality reduction, based on the sparse linear model that has been used to probabilistically interpret sparse coding. We formulate an optimization problem for learning a linear projection from the original signal domain to a lower-dimensional one in a way that approximately preserves, in expectation, pairwise inner products in the sparse domain. We derive solutions to the problem, present nonlinear extensions, and discuss relations to compressed sensing. Our experiments using facial images, texture patches, and images of object categories suggest that the approach can improve our ability to recover meaningful structure in many classes of signals. Papers published at the Neural Information Processing Systems Conference.