 Dimensionality Reduction


A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction

Neural Information Processing Systems

Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high dimensional, high throughput, noisy datasets. Such datasets are especially present in fields like biology and physics. While it is thought that these methods preserve underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry explicitly connecting heat diffusion to manifold distances. In this process, we also formulate a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.


Learning nonlinear level sets for dimensionality reduction in function approximation

Neural Information Processing Systems

We developed a Nonlinear Level-set Learning (NLL) method for dimensionality reduction in high-dimensional function approximation with small data. This work is motivated by a variety of design tasks in real-world engineering applications, where practitioners would replace their computationally intensive physical models (e.g., high-resolution fluid simulators) with fast-to-evaluate predictive machine learning models, so as to accelerate the engineering design processes. There are two major challenges in constructing such predictive models: (a) high-dimensional inputs (e.g., many independent design parameters) and (b) small training data, generated by running extremely time-consuming simulations. Thus, reducing the input dimension is critical to alleviate the over-fitting issue caused by data insufficiency. Existing methods, including sliced inverse regression and active subspace approaches, reduce the input dimension by learning a linear coordinate transformation; our main contribution is to extend the transformation approach to a nonlinear regime. Specifically, we exploit reversible networks (RevNets) to learn nonlinear level sets of a high-dimensional function and parameterize its level sets in low-dimensional spaces.


Multi-Criteria Dimensionality Reduction with Applications to Fairness

Neural Information Processing Systems

Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the multi-criteria dimensionality reduction problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as the Fair-PCA problem introduced by Samadi et al. [NeurIPS18] and the Nash Social Welfare (NSW) problem. In the Fair-PCA problem, the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum reconstruction error of any one group is minimized.
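
The gap between the average-error objective of PCA and the worst-group objective described above can be illustrated with a minimal sketch. It is not the algorithm from the paper: it only fits plain PCA on the pooled data and then reports the reconstruction error per group, so that the maximum over groups (the quantity Fair-PCA is said to minimize) can be inspected. The function names, the synthetic groups, and the assumption that the data is already centered are all illustrative.

```python
import numpy as np


def pca_projection(X, d):
    """Rank-d orthogonal projection (n x n) from plain PCA on X (m x n).

    Assumes the rows of X are already centered.
    """
    # The top-d right singular vectors span the subspace that minimizes
    # the average squared reconstruction error over all rows of X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:d].T
    return V @ V.T


def reconstruction_error(X, P):
    """Average squared reconstruction error of the rows of X under P."""
    return float(np.sum((X - X @ P) ** 2) / X.shape[0])


def group_errors(groups, d):
    """Fit plain PCA on the pooled data, then score each group separately."""
    P = pca_projection(np.vstack(groups), d)
    return [reconstruction_error(G, P) for G in groups]


# Hypothetical data: a large group varying along axis 0 and a small group
# varying along axis 1. Pooled PCA with d=1 favors the large group.
rng = np.random.default_rng(0)
group_a = rng.standard_normal((200, 2)) * np.array([3.0, 0.1])
group_b = rng.standard_normal((20, 2)) * np.array([0.1, 3.0])

errors = group_errors([group_a, group_b], d=1)
print("per-group error:", errors)
print("worst-group error:", max(errors))
```

In this toy setting the smaller group bears almost all of the error, which is the kind of imbalance a max-over-groups criterion is meant to penalize.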


RM4D: A Combined Reachability and Inverse Reachability Map for Common 6-/7-axis Robot Arms by Dimensionality Reduction to 4D

arXiv.org Artificial Intelligence

Knowledge of a manipulator's workspace is fundamental for a variety of tasks including robot design, grasp planning and robot base placement. Consequently, workspace representations are well studied in robotics. Two important representations are reachability maps and inverse reachability maps. The former predicts whether a given end-effector pose is reachable from where the robot currently is, and the latter suggests suitable base positions for a desired end-effector pose. Typically, the reachability map is built by discretizing the 6D space containing the robot's workspace and determining, for each cell, whether it is reachable or not. The reachability map is subsequently inverted to build the inverse map. This is a cumbersome process which restricts the applications of such maps. In this work, we exploit commonalities of existing six- and seven-axis robot arms to reduce the dimension of the discretization from 6D to 4D. We propose Reachability Map 4D (RM4D), a map that only requires a single 4D data structure for both forward and inverse queries. This gives a much more compact map that can be constructed an order of magnitude faster than existing maps, with no inversion overhead and no loss in accuracy. Our experiments showcase the usefulness of RM4D for grasp planning with a mobile manipulator.


Model-based targeted dimensionality reduction for neuronal population data

Neural Information Processing Systems

Summarizing high-dimensional data using a small number of parameters is a ubiquitous first step in the analysis of neuronal population activity. Recently developed methods use "targeted" approaches that work by identifying multiple, distinct low-dimensional subspaces of activity that capture the population response to individual experimental task variables, such as the value of a presented stimulus or the behavior of the animal. These methods have gained attention because they decompose total neural activity into what are ostensibly different parts of a neuronal computation. However, existing targeted methods have been developed outside of the confines of probabilistic modeling, making some aspects of the procedures ad hoc, or limited in flexibility or interpretability. Here we propose a new model-based method for targeted dimensionality reduction based on a probabilistic generative model of the population response data.


Reviews: Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

Neural Information Processing Systems

Summary: The paper considers the setting of streaming PCA for time series data, which contains two challenging ingredients: data stream dependence and a non-convex optimization manifold. The authors address this setting via a downsampled version of Oja's algorithm. By closely inspecting the optimization manifold and using tools from the theory of stochastic differential equations, the authors provide a rather detailed analysis of the convergence behavior, along with confirming experiments on synthetic and real data. Evaluation: Streaming PCA is a fundamental setting in a topic that is becoming increasingly important for the ML community, namely, time series analysis. The treatment of both data dependence and non-convex optimization is still at an anecdotal, preliminary stage, and the algorithm and the analysis provided in the paper form an interesting contribution in this respect.
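
For readers unfamiliar with the method under review, the following is a minimal sketch of Oja's update applied to every `skip`-th sample of a dependent stream, which is one plausible reading of "downsampled". The step size, the downsampling rate, and the AR(1) toy stream are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def downsampled_oja(stream, dim, step=0.005, skip=5, seed=0):
    """Estimate the top principal direction from a stream of vectors.

    Only every `skip`-th sample is used, which weakens the dependence
    between consecutive updates when the stream is a time series.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for t, x in enumerate(stream):
        if t % skip != 0:
            continue  # drop intermediate, highly correlated samples
        w += step * x * (x @ w)   # Oja's stochastic gradient step
        w /= np.linalg.norm(w)    # project back onto the unit sphere
    return w


def ar1_stream(n, scales, rho=0.5, seed=1):
    """Hypothetical stationary AR(1) stream: z_t = rho * z_{t-1} + noise."""
    rng = np.random.default_rng(seed)
    z = np.zeros(len(scales))
    for _ in range(n):
        z = rho * z + rng.standard_normal(len(scales)) * scales
        yield z


# The noise is largest along axis 0, so the leading principal direction
# of the stationary covariance is the first coordinate axis.
w = downsampled_oja(ar1_stream(20000, np.array([3.0, 1.0, 0.5])), dim=3)
print("estimated direction:", w)
```

The normalization after each step keeps the iterate on the unit sphere, which is the non-convex constraint set the review refers to.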



Reviews: Model-based targeted dimensionality reduction for neuronal population data

Neural Information Processing Systems

Supervised dimensionality reduction has become a topic of interest in the systems neuroscience community over the last few years. Here, the authors suggested a very sensible extension to demixed PCA and targeted dimensionality reduction (TDR), which are recently developed but well-known and impactful methods in the field. However, I am disappointed that it heavily relies on simulated data rather than real biological datasets for its results. In particular, all datasets examined by the demixed PCA paper (in eLife) are freely available, so I feel that at least one of those datasets should have been analyzed for the purpose of comparison. I am not convinced that the proposed model would produce qualitatively different results from those already published. That being said, I think the proposed modeling framework is more straightforward than demixed PCA and offers the possibility of interesting future extensions.


Reviews: Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds

Neural Information Processing Systems

This paper investigates Dimensionality Reduction (DR) maps in an information retrieval setting. In particular, the authors show that no DR map can attain both perfect precision and perfect recall. Further, they derive theoretical bounds for the precision and the Wasserstein distance of a continuous DR map. They also run simulations in various settings. Quality: They establish theoretical equivalences of precision and recall (Proposition 1) and show that a perfect map does not exist (Theorem 1).


Practical Hash Functions for Similarity Estimation and Dimensionality Reduction

Neural Information Processing Systems

Hashing is a basic tool for dimensionality reduction employed in several aspects of machine learning. However, the performance analysis is often carried out under the abstract assumption that a truly random unit cost hash function is used, without concern for which concrete hash function is employed. A concrete hash function may work fine on sufficiently random input. The question is whether it can be trusted in the real world, where it may be faced with more structured input. In this paper we focus on two prominent applications of hashing, namely similarity estimation with the one permutation hashing (OPH) scheme of Li et al. [NIPS'12] and feature hashing (FH) of Weinberger et al. [ICML'09], both of which have found numerous applications, e.g. in approximate near-neighbour search with LSH and large-scale classification with SVM.
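
As a rough illustration of feature hashing, the sketch below maps a sparse bag of named features into a fixed-size vector using one hash for the bucket index and one for the sign. The choice of salted BLAKE2b from Python's `hashlib` as the concrete hash function is an assumption made here for determinism; it is not one of the hash functions evaluated in the paper, and the helper names are hypothetical.

```python
import hashlib

import numpy as np


def _hash64(token, salt):
    """64-bit hash of a string; the salt selects an independent function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=salt
    ).digest()
    return int.from_bytes(digest, "little")


def feature_hash(features, dim):
    """Hash a {name: value} mapping into a dense vector of length `dim`."""
    v = np.zeros(dim)
    for name, value in features.items():
        idx = _hash64(name, b"index") % dim                    # bucket
        sign = 1.0 if _hash64(name, b"sign") & 1 else -1.0     # random sign
        v[idx] += sign * value
    return v


doc = {"cat": 1.0, "dog": 2.0, "fish": 0.5}
print(feature_hash(doc, dim=16))
```

The map is linear in the feature values, and the random signs make colliding features cancel in expectation rather than accumulate, which is why the quality of the underlying hash function matters on structured input.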