Dimensionality Reduction
Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection
Van Assel, Hugues, Vincent-Cuaz, Cédric, Courty, Nicolas, Flamary, Rémi, Frossard, Pascal, Vayer, Titouan
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction methods to project data onto interpretable spaces or organizing points into meaningful clusters. In practice, these methods are used sequentially, without guaranteeing that the clustering aligns well with the conducted dimensionality reduction. In this work, we offer a fresh perspective: that of distributions. Leveraging tools from optimal transport, particularly the Gromov-Wasserstein distance, we unify clustering and dimensionality reduction into a single framework called distributional reduction. This allows us to jointly address clustering and dimensionality reduction with a single optimization problem. Through comprehensive experiments, we highlight the versatility and interpretability of our method and show that it outperforms existing approaches across a variety of image and genomics datasets.
Data efficiency, dimensionality reduction, and the generalized symmetric information bottleneck
Martini, K. Michael, Nemenman, Ilya
The Symmetric Information Bottleneck (SIB), an extension of the more familiar Information Bottleneck, is a dimensionality reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the Generalized Symmetric Information Bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the dataset size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that, in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.
Automatic dimensionality reduction of Twin-in-the-Loop Observers
Delcaro, Giacomo, Dettù, Federico, Formentin, Simone, Savaresi, Sergio Matteo
State-of-the-art vehicle dynamics estimation techniques usually share one common drawback: each variable to estimate is computed with an independent, simplified filtering module. These modules run in parallel and need to be calibrated separately. To solve this issue, a unified Twin-in-the-Loop (TiL) Observer architecture has recently been proposed: the classical simplified control-oriented vehicle model in the estimators is replaced by a full-fledged vehicle simulator, or digital twin (DT). The states of the DT are corrected in real time with a linear time invariant output error law. Since the simulator is a black-box, no explicit analytical formulation is available, hence classical filter tuning techniques cannot be used. Due to this reason, Bayesian Optimization will be used to solve a data-driven optimization problem to tune the filter. Due to the complexity of the DT, the optimization problem is high-dimensional. This paper aims to find a procedure to tune the high-complexity observer by lowering its dimensionality. In particular, in this work we will analyze both a supervised and an unsupervised learning approach. The strategies have been validated for speed and yaw-rate estimation on real-world data.
ANALYTiC: Understanding Decision Boundaries and Dimensionality Reduction in Machine Learning
The advent of compact, handheld devices has given us a pool of tracked movement data that could be used to infer trends and patterns that can be made to use. With this flooding of various trajectory data of animals, humans, vehicles, etc., the idea of ANALYTiC originated, using active learning to infer semantic annotations from the trajectories by learning from sets of labeled data. This study explores the application of dimensionality reduction and decision boundaries in combination with the already present active learning, highlighting patterns and clusters in data. We test these features with three different trajectory datasets with objective of exploiting the the already labeled data and enhance their interpretability. Our experimental analysis exemplifies the potential of these combined methodologies in improving the efficiency and accuracy of trajectory labeling. This study serves as a stepping-stone towards the broader integration of machine learning and visual methods in context of movement data analysis.
Information loss from dimensionality reduction in 5D-Gaussian spectral data
Understanding the loss of information in spectral analytics is a crucial first step towards finding root causes for failures and uncertainties using spectral data in artificial intelligence models built from modern complex data science applications. Here, we show from an elementary Shannon entropy model analysis with quantum statistics of Gaussian distributed spectral data, that the relative loss of information from dimensionality reduction due to the projection of an initial five-dimensional dataset onto two-dimensional diagrams is less than one percent in the parameter range of small data sets with sample sizes on the order of few hundred data samples. From our analysis, we also conclude that the density and expectation value of the entropy probability distribution increases with the sample number and sample size using artificial data models derived from random sampling Monte Carlo simulation methods.
RdimKD: Generic Distillation Paradigm by Dimensionality Reduction
Guo, Yi, He, Yiqian, Li, Xiaoyang, Qin, Haotong, Pham, Van Tung, Zhang, Yang, Liu, Shouda
Knowledge Distillation (KD) emerges as one of the most promising compression technologies to run advanced deep neural networks on resource-limited devices. In order to train a small network (student) under the guidance of a large network (teacher), the intuitive method is regularizing the feature maps or logits of the student using the teacher's information. However, existing methods either over-restrict the student to learn all information from the teacher, which lead to some bad local minimum, or use various fancy and elaborate modules to process and align features, which are complex and lack generality. In this work, we proposed an abstract and general paradigm for the KD task, referred to as DIMensionality Reduction KD (RdimKD), which solely relies on dimensionality reduction, with a very minor modification to naive L2 loss. RdimKD straightforwardly utilizes a projection matrix to project both the teacher's and student's feature maps onto a low-dimensional subspace, which are then optimized during training. RdimKD achieves the goal in the simplest way that not only does the student get valuable information from the teacher, but it also ensures sufficient flexibility to adapt to the student's low-capacity reality. Our extensive empirical findings indicate the effectiveness of RdimKD across various learning tasks and diverse network architectures.
A Masked Pruning Approach for Dimensionality Reduction in Communication-Efficient Federated Learning Systems
Federated Learning (FL) represents a growing machine learning (ML) paradigm designed for training models across numerous nodes that retain local datasets, all without directly exchanging the underlying private data with the parameter server (PS). Its increasing popularity is attributed to notable advantages in terms of training deep neural network (DNN) models under privacy aspects and efficient utilization of communication resources. Unfortunately, DNNs suffer from high computational and communication costs, as well as memory consumption in intricate tasks. These factors restrict the applicability of FL algorithms in communication-constrained systems with limited hardware resources. In this paper, we develop a novel algorithm that overcomes these limitations by synergistically combining a pruning-based method with the FL process, resulting in low-dimensional representations of the model with minimal communication cost, dubbed Masked Pruning over FL (MPFL). The algorithm operates by initially distributing weights to the nodes through the PS. Subsequently, each node locally trains its model and computes pruning masks. These low-dimensional masks are then transmitted back to the PS, which generates a consensus pruning mask, broadcasted back to the nodes. This iterative process enhances the robustness and stability of the masked pruning model. The generated mask is used to train the FL model, achieving significant bandwidth savings. We present an extensive experimental study demonstrating the superior performance of MPFL compared to existing methods. Additionally, we have developed an open-source software package for the benefit of researchers and developers in related fields.
Generative Models for Anomaly Detection and Design-Space Dimensionality Reduction in Shape Optimization
Our work presents a novel approach to shape optimization, with the twofold objective to improve the efficiency of global optimization algorithms while promoting the generation of high-quality designs during the optimization process free of geometrical anomalies. This is accomplished by reducing the number of the original design variables defining a new reduced subspace where the geometrical variance is maximized and modeling the underlying generative process of the data via probabilistic linear latent variable models such as factor analysis and probabilistic principal component analysis. We show that the data follows approximately a Gaussian distribution when the shape modification method is linear and the design variables are sampled uniformly at random, due to the direct application of the central limit theorem. The degree of anomalousness is measured in terms of Mahalanobis distance, and the paper demonstrates that abnormal designs tend to exhibit a high value of this metric. This enables the definition of a new optimization model where anomalous geometries are penalized and consequently avoided during the optimization loop. The procedure is demonstrated for hull shape optimization of the DTMB 5415 model, extensively used as an international benchmark for shape optimization problems. The global optimization routine is carried out using Bayesian optimization and the DIRECT algorithm. From the numerical results, the new framework improves the convergence of global optimization algorithms, while only designs with high-quality geometrical features are generated through the optimization routine thereby avoiding the wastage of precious computationally expensive simulations.
Dimensionality Reduction and Wasserstein Stability for Kernel Regression
Eckstein, Stephan, Iske, Armin, Trabs, Mathias
In a high-dimensional regression framework, we study consequences of the naive two-step procedure where first the dimension of the input variables is reduced and second, the reduced input variables are used to predict the output variable with kernel regression. In order to analyze the resulting regression errors, a novel stability result for kernel regression with respect to the Wasserstein distance is derived. This allows us to bound errors that occur when perturbed input data is used to fit the regression function. We apply the general stability result to principal component analysis (PCA). Exploiting known estimates from the literature on both principal component analysis and kernel regression, we deduce convergence rates for the two-step procedure. The latter turns out to be particularly useful in a semi-supervised setting.
Dimensionality Reduction of Dynamics on Lie Manifolds via Structure-Aware Canonical Correlation Analysis
Chung, Wooyoung, Polani, Daniel, Tiomkin, Stas
Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization outside of the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite dimensions. Structural impositions in constrained systems increase when applying group structure, converting them into Lie manifolds. The range of their applications is very wide and includes the important case of robotic tasks. Canonical Correlation Analysis (CCA) can construct a hierarchical sequence of maximal correlations of up to two paired data sets in these Euclidean spaces. We present a method to generalize this concept to Lie Manifolds and demonstrate its efficacy through the substantial improvements it achieves in making structure-consistent predictions about changes in the state of a robotic hand.