Learning in High Dimensional Spaces
Large-scale optimal transport map estimation using projection pursuit
This paper studies the estimation of large-scale optimal transport maps (OTM), a problem well known to be challenging owing to the curse of dimensionality. The existing literature approximates the large-scale OTM by a series of one-dimensional OTM problems through iterative random projection. Such methods, however, suffer from slow or even no convergence in practice because the projection directions are selected at random. Instead, we propose a method for estimating large-scale OTMs that combines the ideas of projection pursuit regression and sufficient dimension reduction. The proposed method, named projection pursuit Monge map (PPMM), adaptively selects the most "informative" projection direction in each iteration.
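A minimal sketch of the projection-based scheme the abstract builds on, assuming equal-size samples with uniform weights so that the one-dimensional OT map is given by matching sorted projections. The directions below are drawn at random, as in earlier sliced approaches; PPMM would instead choose the "informative" direction via sufficient dimension reduction, which is not reproduced here.

```python
import numpy as np

def one_d_ot_update(X, Y, theta, step=1.0):
    """Move the source sample X toward the target sample Y along direction theta,
    using the 1D optimal transport map between the projected samples
    (for equal-size empirical measures this is the sorted matching)."""
    x_proj = X @ theta
    y_proj = Y @ theta
    ix = np.argsort(x_proj)
    iy = np.argsort(y_proj)
    displacement = np.empty_like(x_proj)
    displacement[ix] = y_proj[iy] - x_proj[ix]
    return X + step * displacement[:, None] * theta[None, :]

rng = np.random.default_rng(0)
d, n = 10, 500
X = rng.normal(size=(n, d))            # source sample
Y = rng.normal(size=(n, d)) + 2.0      # target sample
for _ in range(200):
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)     # random direction; PPMM would pick an informative one
    X = one_d_ot_update(X, Y, theta)
```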
Supplement to " Estimating Riemannian Metric with Noise-Contaminated Intrinsic Distance " Fred Hutchinson Cancer Center
Unlike distance metric learning, where the focus is usually on the subsequent tasks that utilize the estimated distance metric, our proposal focuses on the estimated metric itself as a characterization of the geometric structure of the data space. Despite the illustrated taxi and MNIST examples, it remains open to find more compelling applications that target the data space geometry. Interpreting mathematical concepts such as the Riemannian metric and geodesics in the context of potential applications (e.g., cognition and perception research, where similarity measures are common) could be inspiring. Our proposal requires sufficiently dense data, which could be demanding, especially for high-dimensional data, due to the curse of dimensionality. Dimension reduction (e.g., manifold embedding, as in the MNIST example) can substantially alleviate the curse of dimensionality, making the dense-data requirement more likely to hold.
Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality
We consider the problem of estimating the Wasserstein distance between the empirical measure and a set of probability measures whose expectations over a class of functions (hypothesis class) are constrained. If this class is sufficiently rich to characterize a particular distribution (e.g., all Lipschitz functions), then our formulation recovers the Wasserstein distance to that distribution. We establish a strong duality result that generalizes the celebrated Kantorovich-Rubinstein duality. We also show that our formulation can be used to beat the curse of dimensionality, which is well known to affect the rates of statistical convergence of the empirical Wasserstein distance. In particular, we present examples of infinite-dimensional hypothesis classes, informed by a complex correlation structure, for which the empirical Wasserstein distance to the class converges to zero at the standard parametric rate. Our formulation provides insights that help clarify why, despite the curse of dimensionality, the Wasserstein distance enjoys favorable empirical performance across a wide range of statistical applications.
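For context, the classical Kantorovich-Rubinstein duality that the paper generalizes, together with a schematic form of the quantity being studied; the notation below is illustrative and not fixed by the abstract.

\[
W_1(\mu, \nu) \;=\; \sup_{\mathrm{Lip}(f) \le 1} \left\{ \int f \, d\mu - \int f \, d\nu \right\},
\qquad
\text{quantity of interest:}\quad
\inf_{\nu \in \mathcal{P}_{\mathcal{F}}} W(\hat{\mu}_n, \nu),
\]
where $\hat{\mu}_n$ is the empirical measure and $\mathcal{P}_{\mathcal{F}}$ denotes the set of probability measures whose expectations over the hypothesis class $\mathcal{F}$ satisfy the given constraints. When $\mathcal{F}$ is rich enough (e.g., all Lipschitz functions) that the constraints pin down a single distribution, the quantity reduces to the Wasserstein distance to that distribution, as stated in the abstract.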
Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data.
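A small illustration, under assumed toy dimensions and noise level, of the object the bound is about: the minimum Euclidean norm interpolator in overparameterized linear regression with Gaussian data. This reproduces only the estimator, not the paper's Gaussian-width argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                        # overparameterized regime: d >> n
X = rng.normal(size=(n, d))           # Gaussian design, as in the paper's setting
w_star = np.zeros(d); w_star[0] = 1.0
y = X @ w_star + 0.1 * rng.normal(size=n)

# Minimum Euclidean norm interpolator: w = X^T (X X^T)^{-1} y
w_hat = X.T @ np.linalg.solve(X @ X.T, y)

assert np.allclose(X @ w_hat, y)      # interpolates the training data exactly
print(np.linalg.norm(w_hat))          # its norm, the quantity the ball-based bounds are phrased in
```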
979a3f14bae523dc5101c52120c535e9-AuthorFeedback.pdf
We thank the reviewers for the helpful feedback and the positive assessment of our submission.
Reviewer #1: "It is interesting to see if further increase the width of the network (from linear in d to polynomial in d and ...)" In the setting of our paper (minimization of the total network size), a large depth is in some sense unavoidable (as e.g. ...). However, in general there is of course some trade-off between width and depth. Assuming a sufficiently constrained family (e.g., a ball in the Barron space ...)
Reviewer #4: "Theorem 5.1 extends the approximation results to all piece-wise linear activation functions and not just [the ReLU]. So in theory, this should also apply to max-outs and other variants of ReLUs such as Leaky ReLUs?" That's right: all these functions are easily expressible one via another using just linear operations (ReLU(x) = ...).
Reviewer #4: "I fail to see some intuitions regarding the typical values of r, d, and H for the networks used in practice." ... T. Poggio et al., Why and when can deep but not shallow networks avoid the curse of dimensionality: A review.
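One way to make the claimed inter-expressibility concrete (standard identities, not taken from the rebuttal itself): for $0 < \alpha < 1$,

\[
\mathrm{LeakyReLU}_{\alpha}(x) = \max(x, \alpha x) = \alpha x + (1-\alpha)\,\mathrm{ReLU}(x),
\qquad
\mathrm{ReLU}(x) = \frac{\mathrm{LeakyReLU}_{\alpha}(x) - \alpha x}{1-\alpha},
\]
and a two-piece maxout unit recovers $\mathrm{ReLU}(x) = \max(x, 0)$ directly, so each of these activations can emulate the others using only additional linear operations.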
Robust Agility via Learned Zero Dynamics Policies
Csomay-Shanklin, Noel, Compton, William D., Rodriguez, Ivan Dario Jimenez, Ambrose, Eric R., Yue, Yisong, Ames, Aaron D.
We study the design of robust and agile controllers for hybrid underactuated systems. Our approach breaks down the task of creating a stabilizing controller into: 1) learning a mapping that is invariant under optimal control, and 2) driving the actuated coordinates to the output of that mapping. This approach, termed Zero Dynamics Policies, exploits the structure of underactuation by restricting the inputs of the target mapping to the subset of degrees of freedom that cannot be directly actuated, thereby achieving significant dimension reduction. Furthermore, we retain the stability and constraint satisfaction of optimal control while reducing the online computational overhead. We prove that controllers of this type stabilize hybrid underactuated systems and experimentally validate our approach on the 3D hopping platform, ARCHER. Over the course of 3000 hops the proposed framework demonstrates robust agility, maintaining stable hopping while rejecting disturbances on rough terrain.
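A schematic sketch of the two-level structure described in the abstract; all names, gains, and the form of the learned map are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def psi(z, params):
    """Hypothetical learned map: unactuated (zero-dynamics) coordinates z ->
    desired actuated coordinates. A tiny MLP stands in for the learned policy."""
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ z + b1) + b2

def tracking_controller(q_a, dq_a, q_a_des, dq_a_des, kp=50.0, kd=5.0):
    """Low-level PD law driving the actuated coordinates q_a toward the output
    of the learned map, i.e. enforcing q_a -> psi(z)."""
    return kp * (q_a_des - q_a) + kd * (dq_a_des - dq_a)

# Illustrative single control step (dimensions and state values are placeholders).
rng = np.random.default_rng(0)
n_z, n_a, n_h = 6, 4, 32
params = (rng.normal(size=(n_h, n_z)), np.zeros(n_h),
          rng.normal(size=(n_a, n_h)), np.zeros(n_a))
z, q_a, dq_a = rng.normal(size=n_z), np.zeros(n_a), np.zeros(n_a)
q_a_des = psi(z, params)
u = tracking_controller(q_a, dq_a, q_a_des, np.zeros(n_a))  # desired velocity omitted for brevity
```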