Learning in High-Dimensional Spaces
Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network
Sufficient dimension reduction is a powerful tool for extracting the core information hidden in high-dimensional data, with many potential applications in machine learning tasks. However, existing nonlinear sufficient dimension reduction methods often lack the scalability needed for large-scale data. We propose a new type of stochastic neural network under a rigorous probabilistic framework and show that it can be used for sufficient dimension reduction on large-scale data. The proposed stochastic neural network is trained using an adaptive stochastic gradient Markov chain Monte Carlo algorithm, whose convergence we also study rigorously. Through extensive experiments on real-world classification and regression problems, we show that the proposed method compares favorably with existing state-of-the-art sufficient dimension reduction methods and is computationally more efficient for large-scale data.
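As a rough illustration of the kind of training described above, here is a minimal sketch of a one-hidden-layer stochastic network fit with stochastic gradient Langevin dynamics (a simple stochastic gradient MCMC variant) on toy regression data. The architecture, noise scale, prior weight, and step size are illustrative assumptions, not the paper's actual design.

```python
import torch

torch.manual_seed(0)

# Toy regression data: y depends on x only through a 1-D projection,
# so a q=1 stochastic bottleneck suffices for dimension reduction.
n, d, q, batch = 512, 20, 1, 64
X = torch.randn(n, d)
beta = torch.zeros(d); beta[0] = 1.0
y = torch.sin(X @ beta) + 0.1 * torch.randn(n)

# Stochastic hidden layer h = W1 x + sigma * eps, with a linear readout w2.
W1 = (0.1 * torch.randn(q, d)).requires_grad_()
w2 = (0.1 * torch.randn(q)).requires_grad_()
params, lr, sigma = [W1, w2], 1e-4, 0.1

for step in range(2000):
    idx = torch.randint(0, n, (batch,))
    xb, yb = X[idx], y[idx]
    h = xb @ W1.T + sigma * torch.randn(batch, q)   # stochastic layer
    pred = h @ w2
    # Negative log-posterior up to constants: Gaussian likelihood
    # (rescaled to the full data) plus a Gaussian prior on the weights.
    nlp = (n / batch) * ((pred - yb) ** 2).sum() \
          + 0.01 * sum((p ** 2).sum() for p in params)
    grads = torch.autograd.grad(nlp, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            # SGLD update: half-gradient step plus injected Gaussian noise.
            p.add_(-0.5 * lr * g + lr ** 0.5 * torch.randn_like(p))

# The rows of W1 span the estimated dimension reduction subspace.
```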
Large-scale optimal transport map estimation using projection pursuit
Cheng Meng, Yuan Ke, Jingyi Zhang, Mengrui Zhang, Wenxuan Zhong, Ping Ma
This paper studies the estimation of large-scale optimal transport maps (OTM), a well-known challenging problem owing to the curse of dimensionality. The existing literature approximates a large-scale OTM by a series of one-dimensional OTM problems through iterative random projection. Such methods, however, suffer from slow or no convergence in practice because the projection directions are selected at random. Instead, we propose a method for estimating large-scale OTMs that combines the ideas of projection pursuit regression and sufficient dimension reduction. The proposed method, named projection pursuit Monge map (PPMM), adaptively selects the most "informative" projection direction in each iteration. We theoretically show that the proposed dimension reduction method consistently estimates the most "informative" projection direction in each iteration.
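When the two samples have equal size, the 1-D optimal transport map along any direction is the sort-and-match (monotone rearrangement) coupling. The sketch below follows the projection-pursuit recipe described above, but substitutes a simple mean-difference heuristic for the paper's SDR-based choice of "informative" direction; it is an illustration, not PPMM itself.

```python
import numpy as np

def ppmm_like(X, Y, iters=50):
    """Projection-pursuit transport sketch: at each iteration pick a
    direction, solve the 1-D optimal transport problem along it by
    sorting, and move the source samples accordingly.

    Assumes X and Y have the same number of rows. The direction choice
    here (mean difference) is a stand-in for the paper's SDR-based one.
    """
    X = X.copy()
    for _ in range(iters):
        u = Y.mean(axis=0) - X.mean(axis=0)
        if np.linalg.norm(u) < 1e-12:            # fall back to random
            u = np.random.randn(X.shape[1])
        u /= np.linalg.norm(u)
        xs, ys = X @ u, Y @ u
        # 1-D OT map via monotone rearrangement (sort-and-match).
        order = np.argsort(xs)
        target = np.sort(ys)
        X[order] += np.outer(target - xs[order], u)
    return X
```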
Breaking the curse of dimensionality in structured density estimation
We consider the problem of estimating a structured multivariate density, subject to Markov conditions implied by an undirected graph. In the worst case, without Markovian assumptions, this problem suffers from the curse of dimensionality. Our main result shows how the curse of dimensionality can be avoided or greatly alleviated under the Markov property, and applies to arbitrary graphs. While existing results along these lines focus on sparsity or manifold assumptions, we introduce a new graphical quantity called "graph resilience" and show how it controls the sample complexity. Surprisingly, although one might expect the sample complexity of this problem to scale with local graph parameters such as the degree, this turns out not to be the case. Through explicit examples, we compute uniform deviation bounds and illustrate how the curse of dimensionality in density estimation can thus be circumvented. Notable examples where the rate improves substantially include sequential, hierarchical, and spatial data.
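To make the sequential (chain-graph) case concrete: if the density factorizes as p(x) = p(x_1) prod_i p(x_i | x_{i-1}), then every factor involves at most two coordinates, so it can be estimated with 1-D and 2-D kernel density estimates regardless of the ambient dimension. A minimal sketch of this idea follows; the paper's estimator and its graph-resilience analysis are considerably more general.

```python
import numpy as np
from scipy.stats import gaussian_kde

def chain_log_density(samples, x):
    """Log-density estimate under a chain Markov assumption:
    log p(x) = log p(x1) + sum_i [log p(x_{i-1}, x_i) - log p(x_{i-1})].
    Every KDE below is 1- or 2-D, whatever the ambient dimension."""
    d = samples.shape[1]
    logp = np.log(gaussian_kde(samples[:, 0])(x[0]))[0]
    for i in range(1, d):
        joint = gaussian_kde(samples[:, [i - 1, i]].T)   # 2-D factor
        marg = gaussian_kde(samples[:, i - 1])           # 1-D normalizer
        logp += np.log(joint([[x[i - 1]], [x[i]]]))[0]
        logp -= np.log(marg(x[i - 1]))[0]
    return logp

# Usage on chain-structured data (a random-walk construction):
rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 5)).cumsum(axis=1)
print(chain_log_density(Z, np.zeros(5)))
```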
A Probabilistic Graph Coupling View of Dimension Reduction
Most popular dimension reduction (DR) methods like t-SNE and UMAP are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack clear probabilistic foundations to enable a full understanding of their properties and limitations. To this end, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross entropy. These graphs induce a Markov random field dependency structure among the observations in both input and latent spaces. We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs. Moreover, this reveals that methods relying on shift-invariant kernels suffer from a statistical degeneracy that explains their poor performance in conserving coarse-grain dependencies. New links are drawn with PCA, which appears as a non-degenerate graph coupling model.
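Below is a minimal sketch of the pairwise-similarity objective these methods share, written as a t-SNE-style cross-entropy between input affinities P (Gaussian kernel) and latent affinities Q (Student-t kernel). The bandwidth rule, optimizer, and step count are illustrative assumptions, not choices made in the paper.

```python
import torch

def tsne_like(X, dim=2, steps=500, lr=1e-1):
    """Embed X by matching latent Student-t affinities Q to input
    Gaussian affinities P via cross-entropy, t-SNE style."""
    n = X.shape[0]
    D = torch.cdist(X, X) ** 2
    D = D / D[D > 0].median()              # crude global bandwidth
    D.fill_diagonal_(float("inf"))         # no self-similarity
    P = torch.softmax(-D, dim=1)
    P = (P + P.T) / (2 * n)                # symmetrize and normalize

    Z = 0.01 * torch.randn(n, dim)
    Z.requires_grad_()
    opt = torch.optim.Adam([Z], lr=lr)
    eye = torch.eye(n, dtype=torch.bool)
    for _ in range(steps):
        q = 1.0 / (1.0 + torch.cdist(Z, Z) ** 2)   # Student-t kernel
        q = q.masked_fill(eye, 0.0)
        Q = q / q.sum()
        loss = -(P * torch.log(Q + 1e-12)).sum()   # cross-entropy coupling
        opt.zero_grad(); loss.backward(); opt.step()
    return Z.detach()
```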
Supplementary Material to "Sufficient dimension reduction for classification using principal optimal transport direction"
Without loss of generality, we assume S(B) = S. Hence, to prove Theorem 1, it is sufficient to show that S(B) = S(Σ) holds. To verify S(B) = S(Σ), we only need to show that the following two results, (I) and (II), hold. We begin with Statement (I). This completes the proof of Statement (I). We then turn to Statement (II). This leads to a contradiction with (H.2), where the structural dimension is r.
Super-Bit Locality-Sensitive Hashing
Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian
Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method that provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimate. In this work, we propose Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, orthogonalizing the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, with a smaller variance when the angle to be estimated lies in (0, π/2]. Extensive experiments on real data validate that, given the same length of binary code, SBLSH can achieve a significant reduction in mean squared error when estimating pairwise angular similarity. Moreover, SBLSH outperforms SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments.
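A minimal sketch of the batched orthogonalization and the resulting angle estimate, assuming the batch size does not exceed the data dimension; the code length and batch size below are illustrative, not the paper's settings.

```python
import numpy as np

def super_bit_hash(dim, code_len, batch):
    """Sample random projections and orthonormalize them in batches
    (Gram-Schmidt via QR within each batch), SBLSH-style.
    Requires batch <= dim so each batch can be orthonormalized."""
    V = np.random.randn(code_len, dim)
    for start in range(0, code_len, batch):
        block = V[start:start + batch]
        q, _ = np.linalg.qr(block.T)        # orthonormal columns
        V[start:start + batch] = q.T
    return V

def estimate_angle(a, b, V):
    """Estimate the angle between a and b from the Hamming distance of
    their sign codes: P(bits differ) = angle / pi for sign hashing."""
    ha, hb = (V @ a) > 0, (V @ b) > 0
    return np.pi * np.mean(ha != hb)

# Usage: compare the estimate against the true angle.
a, b = np.random.randn(64), np.random.randn(64)
V = super_bit_hash(dim=64, code_len=256, batch=64)
true = np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(true, estimate_angle(a, b, V))
```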
Review for NeurIPS paper: Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality
Summary and Contributions: ***** UPDATE ***** I realize I might have been harsh in my evaluation. I believe the paper would have been better suited to a more theory-oriented statistics conference or journal, but this is a recurrent problem at NeurIPS and I shouldn't have taken it out on the authors. While their theoretical result is really interesting, I also didn't appreciate that the authors barely mentioned previous work on statistical learning bounds with optimal transport. There have been recent efforts on this topic by several teams, and they should at least acknowledge them. However, if other reviewers took the time to thoroughly review the proof of the main result, I'm willing to increase my score.
Review for NeurIPS paper: Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality
Most of the reviewers were excited about this work, and I'm pleased to recommend it for publication. In the revision, please address all promised changes in the rebuttals and/or requested in the reviews. The outlier R1 has some valid points about the exposition as well as discomfort with the length of the appendix (it's true this is difficult to review in the NeurIPS environment), but these are not reasons to reject the work. That said, the authors of this paper are encouraged to take R1's expository suggestions seriously in their revision to make the work as approachable as possible.