Learning in High Dimensional Spaces
Learning Reduced-Order Soft Robot Controller
Liang, Chen, Gao, Xifeng, Wu, Kui, Pan, Zherong
Deformable robots are notoriously difficult to model or control due to its high-dimensional configuration spaces. Direct trajectory optimization suffers from the curse-of-dimensionality and incurs a high computational cost, while learning-based controller optimization methods are sensitive to hyper-parameter tuning. To overcome these limitations, we hypothesize that high fidelity soft robots can be both simulated and controlled by restricting to low-dimensional spaces. Under such assumption, we propose a two-stage algorithm to identify such simulation- and control-spaces. Our method first identifies the so-called simulation-space that captures the salient deformation modes, to which the robot's governing equation is restricted. We then identify the control-space, to which control signals are restricted. We propose a multi-fidelity Riemannian Bayesian bilevel optimization to identify task-specific control spaces. We show that the dimension of control-space can be less than $10$ for a high-DOF soft robot to accomplish walking and swimming tasks, allowing low-dimensional MPC controllers to be applied to soft robots with tractable computational complexity.
From Spectral Theorem to Statistical Independence with Application to System Identification
Naeem, Muhammad Abdullah, Khazraei, Amir, Pajic, Miroslav
High dimensional random dynamical systems are ubiquitous, including -- but not limited to -- cyber-physical systems, daily return on different stocks of S&P 1500 and velocity profile of interacting particle systems around McKeanVlasov limit. Mathematically, underlying phenomenon can be captured via a stable $n$-dimensional linear transformation `$A$' and additive randomness. System identification aims at extracting useful information about underlying dynamical system, given a length $N$ trajectory from it (corresponds to an $n \times N$ dimensional data matrix). We use spectral theorem for non-Hermitian operators to show that spatio-temperal correlations are dictated by the discrepancy between algebraic and geometric multiplicity of distinct eigenvalues corresponding to state transition matrix. Small discrepancies imply that original trajectory essentially comprises of multiple lower dimensional random dynamical systems living on $A$ invariant subspaces and are statistically independent of each other. In the process, we provide first quantitative handle on decay rate of finite powers of state transition matrix $\|A^{k}\|$ . It is shown that when a stable dynamical system has only one distinct eigenvalue and discrepancy of $n-1$: $\|A\|$ has a dependence on $n$, resulting dynamics are spatially inseparable and consequently there exist at least one row with covariates of typical size $\Theta\big(\sqrt{N-n+1}$ $e^{n}\big)$ i.e., even under stability assumption, covariates can suffer from curse of dimensionality. In the light of these findings we set the stage for non-asymptotic error analysis in estimation of state transition matrix $A$ via least squares regression on observed trajectory by showing that element-wise error is essentially a variant of well-know Littlewood-Offord problem.
High-dimensional Bayesian Optimization with Group Testing
Hellsten, Erik Orm, Hvarfner, Carl, Papenmeier, Leonard, Nardi, Luigi
Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
A Probabilistic Graph Coupling View of Dimension Reduction
Van Assel, Hugues, Espinasse, Thibault, Chiquet, Julien, Picard, Franck
Most popular dimension reduction (DR) methods like t-SNE and UMAP are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack clear probabilistic foundations to enable a full understanding of their properties and limitations. To that extent, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross-entropy. These graphs induce a Markov random field dependency structure among the observations in both input and latent spaces. We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs. Moreover, this reveals that these methods relying on shift-invariant kernels suffer from a statistical degeneracy that explains poor performances in conserving coarse-grain dependencies. New links are drawn with PCA which appears as a non-degenerate graph coupling model.
Randomized Dimension Reduction with Statistical Guarantees
Large models and enormous data are essential driving forces of the unprecedented successes achieved by modern algorithms, especially in scientific computing and machine learning. Nevertheless, the growing dimensionality and model complexity, as well as the non-negligible workload of data pre-processing, also bring formidable costs to such successes in both computation and data aggregation. As the deceleration of Moore's Law slackens the cost reduction of computation from the hardware level, fast heuristics for expensive classical routines and efficient algorithms for exploiting limited data are increasingly indispensable for pushing the limit of algorithm potency. This thesis explores some of such algorithms for fast execution and efficient data utilization. From the computational efficiency perspective, we design and analyze fast randomized low-rank decomposition algorithms for large matrices based on "matrix sketching", which can be regarded as a dimension reduction strategy in the data space. These include the randomized pivoting-based interpolative and CUR decomposition discussed in Chapter 2 and the randomized subspace approximations discussed in Chapter 3. From the sample efficiency perspective, we focus on learning algorithms with various incorporations of data augmentation that improve generalization and distributional robustness provably. Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization where we view sample efficiency from the lens of dimension reduction in the function space. Then in Chapter 5, we introduce an adaptively weighted data augmentation consistency regularization algorithm for distributionally robust optimization with applications in medical image segmentation.
Generalized dimension reduction approach for heterogeneous networked systems with time-delay
Ma, Cheng, Korniss, Gyorgy, Szymanski, Boleslaw K., Gao, Jianxi
Networks of interconnected agents are essential to study complex networked systems' state evolution, stability, resilience, and control. Nevertheless, the high dimensionality and nonlinear dynamics are vital factors preventing us from theoretically analyzing them. Recently, the dimension-reduction approaches reduced the system's size by mapping the original system to a one-dimensional system such that only one effective representative can capture its macroscopic dynamics. However, the approaches dramatically fail as the network becomes heterogeneous and has multiple community structures. Here, we bridge the gap by developing a generalized dimension reduction approach, which enables us to map the original system to a $m$-dimensional system that consists of $m$ interacting components. Notably, by validating it on various dynamical models, this approach accurately predicts the original system state and the tipping point, if any. Furthermore, the numerical results demonstrate that this approach approximates the system evolution and identifies the critical points for complex networks with time delay.
Principal subbundles for dimension reduction
Akhรธj, Morten, Benn, James, Grong, Erlend, Sommer, Stefan, Pennec, Xavier
In this paper we demonstrate how sub-Riemannian geometry can be used for manifold learning and surface reconstruction by combining local linear approximations of a point cloud to obtain lower dimensional bundles. Local approximations obtained by local PCAs are collected into a rank $k$ tangent subbundle on $\mathbb{R}^d$, $k
The curse of dimensionality in operator learning
Lanthaler, Samuel, Stuart, Andrew M.
Neural operator architectures employ neural networks to approximate operators mapping between Banach spaces of functions; they may be used to accelerate model evaluations via emulation, or to discover models from data. Consequently, the methodology has received increasing attention over recent years, giving rise to the rapidly growing field of operator learning. The first contribution of this paper is to prove that for general classes of operators which are characterized only by their $C^r$- or Lipschitz-regularity, operator learning suffers from a curse of dimensionality, defined precisely here in terms of representations of the infinite-dimensional input and output function spaces. The result is applicable to a wide variety of existing neural operators, including PCA-Net, DeepONet and the FNO. The second contribution of the paper is to prove that the general curse of dimensionality can be overcome for solution operators defined by the Hamilton-Jacobi equation; this is achieved by leveraging additional structure in the underlying solution operator, going beyond regularity. To this end, a novel neural operator architecture is introduced, termed HJ-Net, which explicitly takes into account characteristic information of the underlying Hamiltonian system. Error and complexity estimates are derived for HJ-Net which show that this architecture can provably beat the curse of dimensionality related to the infinite-dimensional input and output function spaces.
Linearly-scalable learning of smooth low-dimensional patterns with permutation-aided entropic dimension reduction
Horenko, Illia, Pospisil, Lukas
The problem of efficiently re-arranging or permuting data in some desired way, for example, deploying sorting algorithms, belongs to the most long-standing questions in computer science and applied mathematics [1]. Many impressive theoretical and practical algorithmic results in this area could be established in past decades for problems of one-and low-dimensional data sorting, where one seeks for a monotonic (ascending or descending) ordering in one or several data dimensions [1-3]. In their seminal work, Garrett Birkhoff and John von Neumann have established a mathematical relationship between permutations of the T -component vector and the multiplication of this vector with T T double-stochastic Markovian operator P, containing only one 1.0 in every row and every column, with all other matrix elements being equal to zero [4, 5]. Such a'crisp' double-stochastic Markov operator (containing only zeroes and ones) is referred to as a permutation matrix - as opposed to the'fuzzy' doublestochastic Markov operators that contain elements that are between zero and one. For a vector with T elements there exist T! of all possible ('crisp') permutation matrices P, that build the edges of the T (T 2)-dimensional polytope of all'fuzzy' (or'soft') double-stochastic Markov matrices [6, 7]. NP-complexity of the original'crisp' permutation problem - arising when extending the generic sorting and permutation problems (satisfying desired criteria) from one to several dimensions - has led to a growing popularity of methods based on'soft' relaxations of the permutation matrix, ignited by several very successful mathematical and algorithmic approaches that dwell on the spectral decomposition of the'soft'/'fuzzy' Markov operator [8, 9] and allowing for a metastable decomposition and a reduced analysis of high-dimensional systems from various areas [10-12]. Further, the ideas of'soft' Markovian relaxation for permutations were explored in the areas of graph-matching and graph-alignment, leading to new approaches to these problems - like the new spectral criteria for checking the matching of this'fuzzy' /continuous relaxation to the original'crisp' graph-matching permutation [13]. These'soft' permutation ideas were further applied to the supervised graph-permutation and graph-alignment problems [14-16]. In the literature, it is argued that the'soft' permutations allow reducing NP-hard to P-hard algorithmic solutions, but at the same time is not clear how the loss of'crispness' for the resulting'soft' permutation matrices, can avoid leading to such'soft' permutation relaxation extremes as the stochastic matrices with all of the elements being equal to
A Normalized Bottleneck Distance on Persistence Diagrams and Homology Preservation under Dimension Reduction
Krishnamoorthy, Bala, May, Nathan H.
Persistence diagrams are used as signatures of point cloud data assumed to be sampled from manifolds, and represent their topology in a compact fashion. Further, two given clouds of points can be compared by directly comparing their persistence diagrams using the bottleneck distance, d_B. But one potential drawback of this pipeline is that point clouds sampled from topologically similar manifolds can have arbitrarily large d_B values when there is a large degree of scaling between them. This situation is typical in dimension reduction frameworks that are also aiming to preserve topology. We define a new scale-invariant distance between persistence diagrams termed normalized bottleneck distance, d_N, and study its properties. In defining d_N, we also develop a broader framework called metric decomposition for comparing finite metric spaces of equal cardinality with a bijection. We utilize metric decomposition to prove a stability result for d_N by deriving an explicit bound on the distortion of the associated bijective map. We then study two popular dimension reduction techniques, Johnson-Lindenstrauss (JL) projections and metric multidimensional scaling (mMDS), and a third class of general biLipschitz mappings. We provide new bounds on how well these dimension reduction techniques preserve homology with respect to d_N. For a JL map f that transforms input X to f(X), we show that d_N(dgm(X),dgm(f(X)) < e, where dgm(X) is the Vietoris-Rips persistence diagram of X, and 0 < e < 1 is the tolerance up to which pairwise distances are preserved by f. For mMDS, we present new bounds for both d_B and d_N between persistence diagrams of X and its projection in terms of the eigenvalues of the covariance matrix. And for k-biLipschitz maps, we show that d_N is bounded by the product of (k^2-1)/k and the ratio of diameters of X and f(X).