Dimensionality Reduction

Dimensionality reduction, regularization, and generalization in overparameterized regressions

Machine Learning

Overparameterization in deep learning is powerful: very large models fit the training data perfectly and yet generalize well. This realization revived the study of linear models for regression, including ordinary least squares (OLS), which, like deep learning, shows a "double descent" behavior. This behavior has two features: (1) the risk (out-of-sample prediction error) can grow arbitrarily when the number of samples $n$ approaches the number of parameters $p$, and (2) the risk decreases with $p$ for $p > n$, sometimes achieving a lower value than the lowest risk attained at $p < n$.
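The following is a minimal numerical sketch (not the paper's experiment) of this double descent curve, using minimum-norm least squares on a toy random-design regression; the sample size, feature grid, and noise level are illustrative assumptions.

```python
# Minimal sketch of double descent for minimum-norm least squares on a
# toy regression problem. All settings (n, p grid, noise) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, d = 40, 1000, 200          # train size, test size, ambient dim
w_true = rng.normal(size=d) / np.sqrt(d)

X_tr = rng.normal(size=(n, d))
X_te = rng.normal(size=(n_test, d))
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n)
y_te = X_te @ w_true

for p in [5, 10, 20, 35, 40, 45, 60, 100, 200]:
    # Fit using only the first p features; pinv gives the OLS solution
    # for p < n and the minimum-norm interpolator for p > n.
    w_hat = np.linalg.pinv(X_tr[:, :p]) @ y_tr
    risk = np.mean((X_te[:, :p] @ w_hat - y_te) ** 2)
    print(f"p = {p:3d}  test risk = {risk:.3f}")
```

Near $p = n$ the test risk typically spikes, and for $p > n$ it descends again, sometimes below the best underparameterized value, depending on the noise level.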

Autoencoders for Dimensionality Reduction


In the previous post, we explained how we can reduce the dimensions by applying PCA and t-SNE, and how we can apply Non-Negative Matrix Factorization for the same purpose. In this post, we will provide a concrete example of how we can apply Autoencoders for Dimensionality Reduction. We will work with Python and TensorFlow 2.x, using the MNIST dataset of TensorFlow, where the images are 28 x 28 pixels; in other words, if we flatten each image, we are dealing with 784 dimensions. Our goal is to reduce the dimensions from 784 to 2 while retaining as much information as possible.
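As a concrete sketch of this setup (the layer sizes, epochs, and optimizer are illustrative choices, not necessarily those of the original post):

```python
# A minimal TensorFlow 2.x autoencoder compressing flattened 28 x 28
# MNIST images (784 values) down to a 2-dimensional bottleneck.
import tensorflow as tf
from tensorflow.keras import layers, Model

(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(128, activation="relu")(inputs)
code = layers.Dense(2, name="bottleneck")(h)          # 784 -> 2
h = layers.Dense(128, activation="relu")(code)
outputs = layers.Dense(784, activation="sigmoid")(h)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=256,
                validation_data=(x_test, x_test))

# The trained encoder maps each image to its 2-D representation.
encoder = Model(inputs, code)
z = encoder.predict(x_test)
print(z.shape)  # (10000, 2)
```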

Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions

Machine Learning

In machine learning or statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method where the embedding coordinates are the eigenvectors of a positive semi-definite kernel obtained as the solution of an infinite dimensional analogue of a semi-definite program. This embedding is adaptive and non-linear. A main feature of our approach is the existence of a non-linear out-of-sample extension formula of the embedding coordinates, called a projected Nystr\"om approximation. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust with respect to the influence of outliers, compared with a spectral embedding method.
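The paper's projected Nyström approximation is specific to its learned kernel; as a rough illustration of the generic Nyström idea it builds on, the following sketch extends spectral embedding coordinates of an RBF kernel matrix to a new point (the kernel, bandwidth, and data are illustrative assumptions):

```python
# Generic Nystrom out-of-sample extension: embed training points via the
# top eigenvectors of a kernel matrix, then extend to new points.
import numpy as np

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # training sample in R^d
K = rbf(X, X)

evals, evecs = np.linalg.eigh(K)
idx = np.argsort(evals)[::-1][:2]      # top-2 spectral coordinates
lam, U = evals[idx], evecs[:, idx]

Z_train = U * np.sqrt(lam)             # embedding of the training points

def embed_new(x_new):
    # Nystrom extension: project k(x_new, .) onto the eigenvectors; for a
    # training point this reproduces its row of Z_train exactly.
    k = rbf(x_new[None, :], X)[0]
    return (k @ U) / np.sqrt(lam)

z = embed_new(rng.normal(size=5))
print(Z_train.shape, z.shape)          # (100, 2) (2,)
```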

Dimensionality Reduction in Machine Learning


Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data.
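A minimal concrete instance of this definition, using PCA (an example method chosen here for illustration) to map the 4-dimensional Iris measurements to 2 dimensions while keeping most of the variance:

```python
# PCA as a concrete dimensionality reduction: 4-D points -> 2-D points,
# retaining a meaningful property (most of the variance) of the original.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                       # (150, 4): high-dimensional space
pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)                    # (150, 2): low-dimensional view
print(X_2d.shape, pca.explained_variance_ratio_.sum())  # ~0.98 variance kept
```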

Feature Engineering and Dimensionality Reduction


Udemy course: Feature Engineering and Dimensionality Reduction. Feature selection vs. dimensionality reduction: while both methods are used for reducing the number of features in a dataset, there is an important difference. Feature selection simply selects and excludes given features without changing them, whereas dimensionality reduction transforms features into a lower-dimensional space. What you'll learn: the importance of feature engineering and dimensionality reduction in data science, with practical explanations and live coding in Python. Description: Artificial Intelligence (AI) is indispensable these days.
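A small sketch of the distinction the course draws (the dataset and feature counts are illustrative): SelectKBest keeps a subset of the original columns unchanged, while PCA replaces them with new transformed coordinates.

```python
# Feature selection vs. dimensionality reduction on the same data.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_wine(return_X_y=True)          # (178, 13)

X_sel = SelectKBest(f_classif, k=5).fit_transform(X, y)  # 5 original columns
X_pca = PCA(n_components=5).fit_transform(X)             # 5 new combinations
print(X_sel.shape, X_pca.shape)            # both (178, 5), different meaning
```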

Applying Dimensionality Reduction with PCA to Cancer Data


Principal Component Analysis (PCA) is a powerful and well-established data transformation method that can be used for data visualization, dimensionality reduction, and possibly improved performance in supervised learning tasks. In this use case blog, we examine a dataset consisting of measurements of benign and malignant tumors, computed from digital images of a fine needle aspirate of breast mass tissue. Specifically, these 30 variables describe characteristics of the cell nuclei present in the images, such as texture, symmetry, and radius. The first step in applying PCA was to see whether we can more easily visualize the separation between the malignant and benign classes in two dimensions. To do this, we first divide our dataset into train and test sets and perform the PCA using only the training data.
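A sketch of these steps using scikit-learn's copy of the same 30-feature breast cancer dataset; the exact preprocessing in the original post may differ.

```python
# Split first, then fit the scaler and PCA on the training data only,
# so the test set plays no role in choosing the projection.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)        # (569, 30)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_tr)
pca = PCA(n_components=2).fit(scaler.transform(X_tr))
Z_tr = pca.transform(scaler.transform(X_tr))      # 2-D view for plotting
Z_te = pca.transform(scaler.transform(X_te))
print(Z_tr.shape, pca.explained_variance_ratio_)
```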

6 Dimensionality Reduction Algorithms With Python


Dimensionality reduction is an unsupervised learning technique. Nevertheless, it can be used as a data-transform pre-processing step for supervised learning algorithms on classification and regression predictive modeling datasets. There are many dimensionality reduction algorithms to choose from and no single best algorithm for all cases. Instead, it is a good idea to explore a range of dimensionality reduction algorithms and several configurations of each. In this tutorial, you will discover how to fit and evaluate top dimensionality reduction algorithms in Python.
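A sketch of the evaluation pattern described: wrap each dimensionality reduction method in a pipeline with a classifier and compare cross-validated accuracy (the methods, dataset, and settings here are illustrative, not the tutorial's exact list).

```python
# Compare several dimensionality reduction methods as pipeline steps.
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.random_projection import SparseRandomProjection

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           random_state=7)
methods = {
    "pca": PCA(n_components=10),
    "svd": TruncatedSVD(n_components=10),
    "lda": LinearDiscriminantAnalysis(n_components=1),  # binary -> 1 dim max
    "isomap": Isomap(n_components=10),
    "srp": SparseRandomProjection(n_components=10, random_state=7),
}
for name, dr in methods.items():
    pipe = Pipeline([("dr", dr), ("clf", KNeighborsClassifier())])
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {mean(scores):.3f}")
```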

Linear Discriminant Analysis for Dimensionality Reduction in Python


Reducing the number of input variables for a predictive model is referred to as dimensionality reduction. Fewer input variables can result in a simpler predictive model that may perform better when making predictions on new data. Linear Discriminant Analysis, or LDA for short, is a predictive modeling algorithm for multi-class classification. It can also be used as a dimensionality reduction technique, providing a projection of a training dataset that best separates the examples by their assigned class. That LDA can be used for dimensionality reduction often surprises practitioners.
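A minimal sketch of LDA used as a supervised projection rather than a classifier; with c classes it can project onto at most c - 1 dimensions, so a 3-class problem yields at most a 2-D projection (the data and settings are illustrative).

```python
# LDA as dimensionality reduction: a class-separating projection.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=1)
lda = LinearDiscriminantAnalysis(n_components=2)  # at most n_classes - 1
X_proj = lda.fit_transform(X, y)                  # supervised, uses labels y
print(X_proj.shape)                               # (500, 2)
```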

Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction

Machine Learning

We propose a new concept of rateless auto-encoders (RL-AEs) that enable a flexible latent dimensionality, which can be seamlessly adjusted for varying distortion and dimensionality requirements. In the proposed RL-AEs, instead of a deterministic bottleneck architecture, we use an over-complete representation that is stochastically regularized with weighted dropouts, in a manner analogous to sparse AE (SAE). Unlike SAEs, our RL-AEs employ monotonically increasing dropout rates across the latent representation nodes, such that the latent variables become sorted by importance, as in principal component analysis (PCA). This is motivated by the rateless property of conventional PCA, where the least important principal components can be discarded to realize variable-rate dimensionality reduction that gracefully degrades the distortion. In contrast, since the latent variables of conventional AEs are equally important for data reconstruction, they cannot be simply discarded to further reduce the dimensionality after the AE model is trained. Our proposed stochastic bottleneck framework enables seamless rate adaptation with high reconstruction performance, without requiring a predetermined latent dimensionality at training. We experimentally demonstrate that the proposed RL-AEs can realize variable dimensionality reduction while achieving low distortion compared to conventional AEs.
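A rough sketch (not the authors' implementation) of the core mechanism: each latent unit gets its own dropout probability, increasing monotonically across the representation so that later units learn to carry less information; the latent dimensionality, rate schedule, and architecture are illustrative assumptions.

```python
# Per-unit dropout rates that rise across the latent nodes, so the
# representation is stochastically regularized into importance order.
import tensorflow as tf
from tensorflow.keras import layers, Model

LATENT = 16
rates = tf.linspace(0.0, 0.9, LATENT)     # illustrative linear schedule

class StochasticBottleneck(layers.Layer):
    def call(self, z, training=None):
        if training:
            keep = tf.cast(tf.random.uniform(tf.shape(z)) >= rates, z.dtype)
            return z * keep / (1.0 - rates)   # inverted-dropout scaling
        return z

inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(256, activation="relu")(inputs)
z = layers.Dense(LATENT)(h)
z = StochasticBottleneck()(z)
h = layers.Dense(256, activation="relu")(z)
outputs = layers.Dense(784, activation="sigmoid")(h)
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
# After training, trailing latent units can be zeroed out at inference to
# trade dimensionality for distortion, mimicking PCA's rateless property.
```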

Model-based targeted dimensionality reduction for neuronal population data

Neural Information Processing Systems

Summarizing high-dimensional data using a small number of parameters is a ubiquitous first step in the analysis of neuronal population activity. Recently developed methods use "targeted" approaches that work by identifying multiple, distinct low-dimensional subspaces of activity that capture the population response to individual experimental task variables, such as the value of a presented stimulus or the behavior of the animal. These methods have gained attention because they decompose total neural activity into what are ostensibly different parts of a neuronal computation. However, existing targeted methods have been developed outside of the confines of probabilistic modeling, making some aspects of the procedures ad hoc, or limited in flexibility or interpretability. Here we propose a new model-based method for targeted dimensionality reduction based on a probabilistic generative model of the population response data.
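A very loose sketch of the "targeted" idea, not the paper's probabilistic generative model: regress each neuron's response onto the task variables, then take a low-rank view of the coefficient matrix to obtain a small task-related subspace (all dimensions and the rank are illustrative).

```python
# Targeted dimensionality reduction, least-squares flavor: find a low-rank
# subspace of neural activity explained by experimental task variables.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_neurons, n_task = 200, 50, 3
S = rng.normal(size=(n_trials, n_task))          # task variables per trial
B_true = rng.normal(size=(n_task, 2)) @ rng.normal(size=(2, n_neurons))
Y = S @ B_true + 0.1 * rng.normal(size=(n_trials, n_neurons))

B_hat = np.linalg.lstsq(S, Y, rcond=None)[0]     # (n_task, n_neurons)
_, _, Vt = np.linalg.svd(B_hat, full_matrices=False)
subspace = Vt[:2].T                              # neurons x 2 task subspace
Z = Y @ subspace                                 # trials in the task subspace
print(subspace.shape, Z.shape)                   # (50, 2) (200, 2)
```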