Dimensionality Reduction


Detecting Adversarial Examples through Nonlinear Dimensionality Reduction

arXiv.org Machine Learning

Deep neural networks are vulnerable to adversarial examples, i.e., carefully perturbed inputs crafted to mislead classification. This work proposes a detection method that combines nonlinear dimensionality reduction and density estimation techniques. Our empirical findings show that the proposed approach effectively detects adversarial examples crafted by non-adaptive attackers, i.e., attacks not specifically tuned to bypass the detection method. Given these promising results, we plan to extend our analysis to adaptive attackers in future work.
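
The paper's own implementation is not reproduced here; the following is a minimal sketch of the general recipe it describes, assuming scikit-learn's KernelPCA and KernelDensity as stand-ins for the authors' particular choices of nonlinear reducer and density estimator, with a rejection threshold set from the clean data.

```python
# Minimal sketch of the detect-by-density recipe described above, assuming
# scikit-learn's KernelPCA and KernelDensity as stand-ins for the paper's
# specific choices of nonlinear reduction and density estimator.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KernelDensity

def fit_detector(X_clean, n_components=10, bandwidth=0.5):
    """Learn a low-dimensional embedding of clean data and a density model on it."""
    reducer = KernelPCA(n_components=n_components, kernel="rbf").fit(X_clean)
    Z = reducer.transform(X_clean)
    density = KernelDensity(bandwidth=bandwidth).fit(Z)
    # Threshold chosen from the clean data itself (e.g., 5th percentile of log-density).
    threshold = np.percentile(density.score_samples(Z), 5)
    return reducer, density, threshold

def is_adversarial(x, reducer, density, threshold):
    """Flag inputs whose embedded log-density falls below the clean-data threshold."""
    z = reducer.transform(np.atleast_2d(x))
    return density.score_samples(z)[0] < threshold
```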


Riemannian joint dimensionality reduction and dictionary learning on symmetric positive definite manifold

arXiv.org Machine Learning

Dictionary learning (DL) and dimensionality reduction (DR) are powerful tools for analyzing high-dimensional noisy signals. This paper proposes a novel Riemannian joint dimensionality reduction and dictionary learning (R-JDRDL) method on symmetric positive definite (SPD) manifolds for classification tasks. The joint learning accounts for the interaction between the dimensionality reduction and dictionary learning procedures by connecting them in a unified framework, and we exploit a Riemannian optimization framework to solve the DR and DL problems jointly. Finally, we demonstrate that the proposed R-JDRDL outperforms existing state-of-the-art algorithms on image classification tasks.
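
A faithful implementation would require Riemannian optimization on the SPD manifold; the sketch below is only a simplified log-Euclidean approximation of the idea, mapping SPD matrices to a flat tangent space and then applying off-the-shelf PCA and dictionary learning sequentially rather than jointly.

```python
# NOT the authors' R-JDRDL optimization; a much simpler log-Euclidean sketch:
# SPD matrices are mapped to a flat tangent space with the matrix logarithm,
# then reduced with PCA and sparse-coded with a dictionary, one step at a time
# instead of jointly on the manifold.
import numpy as np
from scipy.linalg import logm
from sklearn.decomposition import PCA, DictionaryLearning

def log_euclidean_features(spd_matrices):
    """Vectorize SPD matrices via the matrix logarithm (log-Euclidean embedding)."""
    return np.array([logm(S).ravel() for S in spd_matrices]).real

def reduce_and_code(spd_matrices, n_components=10, n_atoms=20):
    X = log_euclidean_features(spd_matrices)
    Z = PCA(n_components=n_components).fit_transform(X)        # dimensionality reduction
    dl = DictionaryLearning(n_components=n_atoms, max_iter=50)  # sparse dictionary on the reduced data
    codes = dl.fit_transform(Z)
    return Z, codes
```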


Data Dimensionality Reduction in the Age of Machine Learning

#artificialintelligence

Machine Learning is all the rage as companies try to make sense of the mountains of data they are collecting. Data is everywhere and proliferating at unprecedented speed. But more data is not always better. In fact, large amounts of data can not only considerably slow down system execution but can sometimes even produce worse performance in Data Analytics applications. We have found, through years of formal and informal testing, that data dimensionality reduction -- the process of reducing the number of attributes under consideration when running analytics -- is useful not only for speeding up algorithm execution but also for improving overall model performance.
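
A quick way to sanity-check this claim on your own data is to fit the same model with and without a reduction step and compare fit time and accuracy; the snippet below does so with illustrative choices (synthetic data, PCA, logistic regression) that are not taken from the article.

```python
# Compare fit time and accuracy of the same classifier trained on all
# features vs. on a PCA-reduced version. Dataset and model are illustrative.
import time
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=5000, n_features=500, n_informative=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [
    ("all 500 features", LogisticRegression(max_iter=1000)),
    ("PCA to 30 features", make_pipeline(PCA(n_components=30), LogisticRegression(max_iter=1000))),
]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    print(f"{name}: accuracy={model.score(X_te, y_te):.3f}  fit_time={time.perf_counter() - start:.2f}s")
```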


Model-based targeted dimensionality reduction for neuronal population data

Neural Information Processing Systems

Summarizing high-dimensional data using a small number of parameters is a ubiquitous first step in the analysis of neuronal population activity. Recently developed methods use "targeted" approaches that work by identifying multiple, distinct low-dimensional subspaces of activity that capture the population response to individual experimental task variables, such as the value of a presented stimulus or the behavior of the animal. These methods have gained attention because they decompose total neural activity into what are ostensibly different parts of a neuronal computation. However, existing targeted methods have been developed outside the confines of probabilistic modeling, making some aspects of the procedures ad hoc or limited in flexibility and interpretability. Here we propose a new model-based method for targeted dimensionality reduction based on a probabilistic generative model of the population response data. The low-dimensional structure of our model is expressed as a low-rank factorization of a linear regression model. We perform efficient inference using a combination of expectation maximization and direct maximization of the marginal likelihood. We also develop an efficient method for estimating the dimensionality of each subspace. We show that our approach outperforms alternative methods both in mean squared error of the parameter estimates and in identifying the correct dimensionality of encoding using simulated data. We also show that our method provides more accurate inference of low-dimensional subspaces of activity than a competing algorithm, demixed PCA.
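
As a rough intuition for the low-rank regression structure (not the authors' probabilistic model or EM procedure), the sketch below fits a reduced-rank regression of neural activity on task variables by truncating the SVD of the ordinary least-squares fit; variable names and shapes are illustrative.

```python
# A deliberately simplified stand-in for the model above: reduced-rank
# regression of neural activity on task variables, solved by truncating the
# SVD of the fitted responses rather than by the paper's EM / marginal-
# likelihood procedure.
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """X: (trials, task variables), Y: (trials, neurons). Returns rank-constrained coefficients."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)     # full-rank regression weights
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    # Keep only the leading low-dimensional subspace of the fitted responses.
    P = Vt[:rank]                                     # (rank, neurons) basis of the targeted subspace
    B_low = B_ols @ P.T @ P                           # project coefficients onto that subspace
    return B_low, P
```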


Deep Variational Sufficient Dimensionality Reduction

arXiv.org Machine Learning

We consider the problem of sufficient dimensionality reduction (SDR), where a high-dimensional observation is transformed to a low-dimensional subspace in which the information the observations carry about the label variable is preserved. We propose DVSDR, a deep variational approach to sufficient dimensionality reduction. The deep structure in our model has a bottleneck that represents the low-dimensional embedding of the data. We explain the SDR problem using graphical models and use the variational autoencoder framework to maximize a lower bound on the log-likelihood of the joint distribution of the observation and label. We show that such a maximization problem can be interpreted as solving the SDR problem. DVSDR can easily be adapted to the semi-supervised learning setting. In our experiments we show that DVSDR performs competitively on classification tasks while being able to generate novel data samples.
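
The sketch below conveys the general shape of such a model, assuming a PyTorch implementation in which a VAE-style bottleneck also feeds a label head; the architecture, loss weighting, and dimensions are illustrative rather than the paper's.

```python
# Rough PyTorch sketch of the idea described above: a variational autoencoder
# whose low-dimensional bottleneck is also trained to predict the label, so
# the embedding keeps label-relevant information. Not the authors' exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalSDR(nn.Module):
    def __init__(self, x_dim=784, z_dim=8, n_classes=10, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu, self.logvar = nn.Linear(hidden, z_dim), nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, x_dim))
        self.clf = nn.Linear(z_dim, n_classes)        # label head on the bottleneck

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), self.clf(z), mu, logvar

def loss_fn(x, y, x_hat, logits, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    label = F.cross_entropy(logits, y, reduction="sum")
    return recon + kl + label      # relative weighting is a modelling choice
```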


Optimal terminal dimensionality reduction in Euclidean space

arXiv.org Machine Learning

Let $\varepsilon\in(0,1)$ and $X\subset\mathbb R^d$ be arbitrary with $|X|$ having size $n>1$. The Johnson-Lindenstrauss lemma states there exists $f:X\rightarrow\mathbb R^m$ with $m = O(\varepsilon^{-2}\log n)$ such that $$ \forall x\in X\ \forall y\in X, \|x-y\|_2 \le \|f(x)-f(y)\|_2 \le (1+\varepsilon)\|x-y\|_2 . $$ We show that a strictly stronger version of this statement holds, answering one of the main open questions of [MMMR18]: "$\forall y\in X$" in the above statement may be replaced with "$\forall y\in\mathbb R^d$", so that $f$ not only preserves distances within $X$, but also distances to $X$ from the rest of space. Previously this stronger version was only known with the worse bound $m = O(\varepsilon^{-4}\log n)$. Our proof is via a tighter analysis of (a specific instantiation of) the embedding recipe of [MMMR18].
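
For intuition, the snippet below empirically illustrates the classical Johnson-Lindenstrauss statement the abstract starts from, using a scaled Gaussian projection and checking pairwise distortions within $X$; it does not construct the paper's stronger terminal embedding.

```python
# Empirical illustration of the classical Johnson-Lindenstrauss guarantee
# (pairs within X only); the paper's terminal embedding, which also preserves
# distances from X to all of R^d at m = O(eps^-2 log n), needs a more
# involved construction than this sketch.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 500, 10_000, 0.25
m = int(np.ceil(8 * np.log(n) / eps**2))          # target dimension ~ eps^-2 log n

X = rng.standard_normal((n, d))
G = rng.standard_normal((d, m)) / np.sqrt(m)      # scaled Gaussian projection f(x) = xG
Y = X @ G

def pdist(A):
    """All pairwise Euclidean distances of the rows of A."""
    sq = np.sum(A**2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * A @ A.T, 0))

pairs = np.triu_indices(n, 1)
ratio = pdist(Y)[pairs] / pdist(X)[pairs]
print(f"distance ratios in [{ratio.min():.3f}, {ratio.max():.3f}] (target: within 1 +/- eps)")
```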


Dimensionality Reduction For Dummies -- Part 1: Intuition

#artificialintelligence

We need to see in order to believe. When you have a dataset with more than three dimensions, it becomes impossible to see what's going on with our eyes. But who said that all these extra dimensions are really necessary? Isn't there a way to somehow reduce the data to one, two, or three dimensions that we can actually see? It turns out there is.
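
As a minimal illustration of that idea (using an arbitrary built-in dataset, not an example from the article), a two-line PCA projection already gives something you can plot:

```python
# Project a 4-dimensional dataset down to 2-D with PCA so it can be plotted.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)         # 4 features per sample
Z = PCA(n_components=2).fit_transform(X)  # 2 numbers per sample, ready to plot
plt.scatter(Z[:, 0], Z[:, 1], c=y)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.show()
```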


Dimensionality Reduction: Does PCA really improve classification outcome?

#artificialintelligence

I have come across a couple of resources about dimensionality reduction techniques. This topic is definitely one of the most interesting ones, and it is great to think that there are algorithms able to reduce the number of features by choosing the most important ones that still represent the entire dataset. One of the advantages pointed out by authors is that these algorithms can improve the results of a classification task. In this post, I am going to verify this claim by using Principal Component Analysis (PCA) to try to improve the classification performance of a neural network on a dataset. Does PCA really improve classification outcome?
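
The experiment the post describes can be sketched with scikit-learn as follows; the dataset, network size, and number of components here are illustrative choices rather than the post's actual setup.

```python
# The same small neural network trained on raw features and on PCA-reduced
# features, to compare classification accuracy. All choices are illustrative.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0))
with_pca = make_pipeline(StandardScaler(), PCA(n_components=20),
                         MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0))

for name, model in [("raw features", baseline), ("PCA(20) + NN", with_pca)]:
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```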