Existing explanation methods for black-box supervised learning models generally work by building local models that explain the model's behaviour for a particular data item. It is possible to make global explanations, but these may have low fidelity for complex models. Most prior work on explainable models has focused on classification problems, with less attention paid to regression. We propose a new manifold visualization method, SLISEMAP, that simultaneously finds local explanations for all of the data items and builds a two-dimensional visualization of model space such that data items explained by the same model are projected nearby. We provide an open-source implementation of our methods, built on the GPU-optimized PyTorch library. SLISEMAP works on both classification and regression models. We compare SLISEMAP to the most popular dimensionality reduction methods and to some local explanation methods. We provide a mathematical derivation of our problem and show that SLISEMAP provides fast and stable visualizations that can be used to explain and understand black-box regression and classification models.
Normalizing flows are diffeomorphic, typically dimension-preserving, models trained using the likelihood of the model. We use the SurVAE framework to construct dimension-reducing surjective flows via a new layer, known as the funnel. We demonstrate its efficacy on a variety of datasets and show that it improves upon or matches the performance of existing flows while having a reduced latent space size. The funnel layer can be constructed from a wide range of transformations, including restricted convolution and feed-forward layers.
Hyperbolic neural networks have become popular in recent years due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space, namely hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network that uses projections (embeddings) followed by an intrinsic aggregation and a nonlinearity, all within hyperbolic space. The novelty lies in the projection, which is designed to project data onto a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation that is independently useful for dimensionality reduction. The main theoretical contribution is a proof that the proposed embedding is isometric and equivariant under Lorentz transformations. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. Because the nested hyperbolic space representation is the core component of our network, we first compare it with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA), and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.
Dimensionality is the number of input variables or features in a dataset, and dimensionality reduction is the process of reducing that number. A large number of input features makes predictive modeling more challenging. When dealing with high-dimensional data, it is often useful to project the data onto a lower-dimensional subspace that captures the "essence" of the data. "Dimensionality reduction yields a more compact, more easily interpretable representation of the target concept, focusing the user's attention on the most relevant variables."
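The projection onto a lower-dimensional subspace described above can be illustrated with principal component analysis. The following is a minimal sketch, not a method prescribed by the text: PCA is chosen here as one common linear technique, and the function name `pca_project` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the k directions of largest variance."""
    Xc = X - X.mean(axis=0)                        # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                            # (k, n_features), orthonormal rows
    Z = Xc @ components.T                          # (n_samples, k) reduced coordinates
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()  # fraction of variance kept
    return Z, components, explained

# Synthetic example (assumed data): 10 observed features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))

Z, components, explained = pca_project(X, k=2)
```

Here `Z` holds the two-dimensional representation of each sample, and `explained` reports how much of the total variance the retained subspace captures.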
Analysis of high-dimensional data can offer a detailed description of a system but is often challenged by the curse of dimensionality. General dimensionality reduction techniques can alleviate this difficulty by extracting a few important features, but they are limited by a lack of interpretability and of connection to the actual decision making associated with each physical variable. Techniques for selecting important variables, as an alternative, can maintain interpretability, but they often involve a greedy search that is susceptible to missing important interactions. This research proposes a new method that produces subspaces (reduced-dimensional physical spaces) based on a randomized search and forms an ensemble of models for critical subspaces. When applied to high-dimensional data collected from a composite metal development process, the proposed method shows its superiority in prediction and in the selection of important variables.
In this paper, we investigate performing joint dimensionality reduction and classification using a novel histogram neural network. Motivated by a popular dimensionality reduction approach, t-Distributed Stochastic Neighbor Embedding (t-SNE), our proposed method incorporates a classification loss computed on samples in a low-dimensional embedding space. We compare the learned sample embeddings against coordinates found by t-SNE in terms of classification accuracy and qualitative assessment. We also explore the use of various divergence measures in the t-SNE objective. The proposed method has several advantages, such as readily embedding out-of-sample points and reducing feature dimensionality while retaining class discriminability. Our results show that the proposed approach maintains or improves classification performance and reveals characteristics of features produced by neural networks that may be helpful for other applications.
Recent work has shown the tremendous vulnerability of deep learning models to adversarial samples, which are nearly indistinguishable from benign data but are misclassified by the model. Some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of these models, a direct result of their sensitivity to well-generalizing features in high-dimensional data. We hypothesize that data transformations can influence this vulnerability, since a change in the data manifold directly determines the adversary's ability to create these adversarial samples. To approach this problem, we study the effect of dimensionality reduction through the lens of adversarial robustness. This study raises awareness of the positive and negative impacts of five commonly used data transformation techniques on adversarial robustness. The evaluation shows how these techniques contribute to an overall increased vulnerability, where accuracy is only improved when the dimensionality reduction technique approaches the data's optimal intrinsic dimension. The conclusions drawn from this work contribute to understanding and creating more resistant learning models.
We propose a new concept of rateless auto-encoders (RL-AEs) that enable a flexible latent dimensionality, which can be seamlessly adjusted for varying distortion and dimensionality requirements. In the proposed RL-AEs, instead of a deterministic bottleneck architecture, we use an over-complete representation that is stochastically regularized with weighted dropouts, in a manner analogous to sparse AEs (SAEs). Unlike SAEs, our RL-AEs employ monotonically increasing dropout rates across the latent representation nodes such that the latent variables become sorted by importance, as in principal component analysis (PCA). This is motivated by the rateless property of conventional PCA, where the least important principal components can be discarded to realize variable-rate dimensionality reduction with gracefully degrading distortion. In contrast, since the latent variables of conventional AEs are equally important for data reconstruction, they cannot simply be discarded to further reduce the dimensionality after the AE model is trained. Our proposed stochastic bottleneck framework enables seamless rate adaptation with high reconstruction performance, without requiring a predetermined latent dimensionality at training. We experimentally demonstrate that the proposed RL-AEs can achieve variable dimensionality reduction with lower distortion than conventional AEs.
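The stochastic bottleneck described above can be sketched as a dropout mask whose rate grows with the latent index. This is only an illustration of the idea: the abstract does not state the dropout profile, so the linear ramp, the parameters `p_min` and `p_max`, and all function names below are hypothetical.

```python
import numpy as np

def dropout_rates(n_latent, p_min=0.0, p_max=0.9):
    """Hypothetical linear ramp: later latent nodes are dropped more often."""
    return np.linspace(p_min, p_max, n_latent)

def sample_mask(rates, rng):
    """Sample a 0/1 keep-mask; node i is kept with probability 1 - rates[i]."""
    return (rng.random(rates.shape) >= rates).astype(float)

def truncate(z, k):
    """After training, keep only the first k latent variables and zero the rest."""
    out = np.zeros_like(z)
    out[..., :k] = z[..., :k]
    return out

rng = np.random.default_rng(0)
rates = dropout_rates(8)
mask = sample_mask(rates, rng)

z = rng.normal(size=(4, 8))   # stand-in for encoder outputs (batch of 4)
z_train = z * mask            # stochastic bottleneck applied during training
z_small = truncate(z, 3)      # reduced latent dimensionality at inference
```

Because early nodes survive dropout more often, a decoder trained this way would be pushed to rely on them most, which is what would make truncation to the first `k` nodes reasonable after training.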
In this paper, we propose a dimensionality reduction method for tensor-structured data that acts as a hidden layer, which we call the TensorProjection Layer, in a convolutional neural network. Our proposed method transforms input tensors into ones with a smaller dimension by projection. The directions of projection are viewed as training parameters associated with our proposed layer and are trained via a supervised learning criterion such as minimization of the cross-entropy loss function. We discuss the gradients of the loss function with respect to the parameters associated with our proposed layer. We also conduct simple numerical experiments to evaluate the performance of the TensorProjection Layer.
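One way to picture projecting a tensor into a smaller one is a mode-wise product with one projection matrix per axis. The sketch below assumes that form and assumes orthonormal projection directions; neither detail is given in the abstract, and the names `random_orthonormal` and `tensor_project` are hypothetical. In a network, the matrices would be trained rather than drawn at random.

```python
import numpy as np

def random_orthonormal(p, q, rng):
    """Return a (p, q) matrix with orthonormal columns (requires p >= q)."""
    Q, _ = np.linalg.qr(rng.normal(size=(p, q)))
    return Q

def tensor_project(X, U1, U2, U3):
    """Project a 3-way tensor along each mode: (p1, p2, p3) -> (q1, q2, q3)."""
    return np.einsum("abc,ai,bj,ck->ijk", X, U1, U2, U3)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 6, 5))        # e.g. one feature map: height x width x channels
U1 = random_orthonormal(8, 4, rng)    # stand-ins for trainable projection directions
U2 = random_orthonormal(6, 3, rng)
U3 = random_orthonormal(5, 2, rng)

Z = tensor_project(X, U1, U2, U3)     # reduced tensor of shape (4, 3, 2)
```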
This work develops an analytically solvable unsupervised learning scheme that extracts the most informative components for predicting future inputs, termed predictive principal component analysis (PredPCA). Our scheme can effectively remove unpredictable observation noise and globally minimize the test prediction error. Mathematical analyses demonstrate that, with sufficiently high-dimensional observations that are generated by a linear or nonlinear system, PredPCA can identify the optimal hidden state representation, true system parameters, and true hidden state dimensionality, with a global convergence guarantee. We demonstrate the performance of PredPCA by using sequential visual inputs comprising handwritten digits, rotating 3D objects, and natural scenes. It reliably and accurately estimates distinct hidden states and predicts future outcomes of previously unseen test input data, even in the presence of considerable observation noise. The simple model structure and low computational cost of PredPCA make it highly desirable as a learning scheme for biological neural networks and neuromorphic chips. Prediction is essential for both biological organisms [1,2] and machine learning [3,4]. In particular, both need to predict the dynamics of newly encountered sensory input data (i.e., test data) based solely on knowledge learned from a limited number of past experiences (i.e., training data). Generalization error is a standard measure of the capability to predict the future consequences of previously unseen input data, and is defined as the difference between the training and test prediction errors.
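The definition of generalization error given above can be made concrete with a small numerical sketch. This is not PredPCA: it fits an ordinary least-squares one-step predictor to a hypothetical noisy linear system, and the dynamics matrix `A`, the noise level, the sample sizes, and the sign convention (test error minus training error) are all assumptions for illustration.

```python
import numpy as np

def mse(pred, target):
    """Mean squared prediction error."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [-0.1, 0.9]])   # hypothetical linear dynamics

def simulate(n):
    """Generate n (current state, next state) pairs from the noisy system."""
    x = np.zeros((n + 1, 2))
    x[0] = rng.normal(size=2)
    for t in range(n):
        x[t + 1] = A @ x[t] + 0.1 * rng.normal(size=2)
    return x[:-1], x[1:]

X_train, Y_train = simulate(50)     # limited past experience
X_test, Y_test = simulate(500)      # previously unseen inputs

# Fit the one-step predictor on training data only.
W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

train_err = mse(X_train @ W, Y_train)
test_err = mse(X_test @ W, Y_test)
gen_error = test_err - train_err    # assumed sign convention
```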