Dimensionality Reduction


Dimensionality Reduction using an Autoencoder in Python

#artificialintelligence

Dimensionality is the number of input variables or features in a dataset, and dimensionality reduction is the process of reducing that number. A large number of input features makes predictive modeling a more challenging task. When dealing with high-dimensional data, it is often useful to reduce the dimensionality by projecting the data onto a lower-dimensional subspace that captures the "essence" of the data: "dimensionality reduction yields a more compact, more easily interpretable representation of the target concept, focusing the user's attention on the most relevant variables."
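
As a concrete illustration of the idea (this is a minimal sketch, not code from the article itself), the snippet below trains a Keras autoencoder whose 3-unit bottleneck serves as the reduced representation; the data, layer sizes, and training settings are all illustrative assumptions.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Toy stand-in for a real dataset: 1000 samples, 20 features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20)).astype("float32")

    encoding_dim = 3  # size of the compressed representation

    # Encoder compresses 20 -> 3; decoder reconstructs 3 -> 20.
    inputs = keras.Input(shape=(20,))
    encoded = layers.Dense(encoding_dim, activation="relu")(inputs)
    decoded = layers.Dense(20, activation="linear")(encoded)

    autoencoder = keras.Model(inputs, decoded)
    encoder = keras.Model(inputs, encoded)

    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

    # The bottleneck activations are the dimensionality-reduced data.
    X_reduced = encoder.predict(X)
    print(X_reduced.shape)  # (1000, 3)

Unlike PCA, the bottleneck can capture nonlinear structure in the data, at the cost of having to train a network.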


Guide To Dimensionality Reduction With Recursive Feature Elimination

#artificialintelligence

Therefore, feature elimination in statistics and machine learning refers to choosing a subset of relevant features from the dataset for use in further …
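
For reference, scikit-learn implements recursive feature elimination directly; the sketch below uses a made-up synthetic dataset purely for illustration.

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    # Synthetic data: 10 features, only 3 of them informative.
    X, y = make_classification(n_samples=500, n_features=10,
                               n_informative=3, random_state=0)

    # RFE repeatedly fits the estimator and prunes the weakest feature
    # until the requested number of features remains.
    selector = RFE(estimator=LogisticRegression(max_iter=1000),
                   n_features_to_select=3)
    selector.fit(X, y)

    print(selector.support_)   # boolean mask of the kept features
    print(selector.ranking_)   # 1 = kept; larger = eliminated earlier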


Techniques for Dimensionality Reduction

#artificialintelligence

In addition to this, the recent 'Big Bang' in large datasets across companies, organisations, and government departments has resulted in a large uptake of data mining techniques. So, what is data mining? Simply put, it's the process of discovering trends and insights in high-dimensionality datasets (those with thousands of columns). On the one hand, high-dimensionality datasets have enabled organisations to solve complex, real-world problems, such as reducing cancer patient waiting times, predicting protein structures associated with COVID-19, and analysing MEG brain imaging scans. On the other hand, large datasets can sometimes contain columns with poor-quality data, which can lower the performance of the model: more isn't always better.
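
As one concrete example of the kind of technique such a survey covers, principal component analysis takes only a few lines of scikit-learn (the dataset and component count here are illustrative choices, not from the article):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    # Reduce 64-pixel digit images to 10 principal components.
    X, _ = load_digits(return_X_y=True)
    pca = PCA(n_components=10)
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                      # (1797, 10)
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained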


Shape-Preserving Dimensionality Reduction : An Algorithm and Measures of Topological Equivalence

arXiv.org Machine Learning

We introduce a linear dimensionality reduction technique that preserves topological features via persistent homology. The method is designed to find a linear projection $L$ that preserves the persistence diagram of a point cloud $\mathbb{X}$ via simulated annealing. The projection $L$ induces a set of canonical simplicial maps from the Rips (or \v{C}ech) filtration of $\mathbb{X}$ to that of $L\mathbb{X}$. In addition to the distance between persistence diagrams, the projection induces a map between filtrations, called a filtration homomorphism. Using the filtration homomorphism, one can measure the difference between the shapes of two filtrations by directly comparing simplicial complexes with respect to quasi-isomorphism $\mu_{\operatorname{quasi-iso}}$ or strong homotopy equivalence $\mu_{\operatorname{equiv}}$. The measures $\mu_{\operatorname{quasi-iso}}$ and $\mu_{\operatorname{equiv}}$ quantify how much of the corresponding simplicial complexes is quasi-isomorphic or strongly homotopy equivalent, respectively. We validate the effectiveness of our framework with simple examples.
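
The abstract maps to a fairly direct, if expensive, loop: propose a projection, compare persistence diagrams, accept or reject. The rough sketch below is not the authors' implementation; it assumes the ripser and persim packages for diagrams and bottleneck distance, uses a made-up dataset, arbitrary step size and cooling schedule, and omits edge-case handling (e.g. empty diagrams).

    import numpy as np
    from ripser import ripser          # assumed: pip install ripser persim
    from persim import bottleneck

    rng = np.random.default_rng(0)

    # Illustrative point cloud: a noisy circle embedded in 5 dimensions.
    theta = rng.uniform(0, 2 * np.pi, 100)
    X = np.zeros((100, 5))
    X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)
    X += 0.05 * rng.normal(size=X.shape)

    target = ripser(X)['dgms'][1]      # H1 persistence diagram of X

    def loss(L):
        # Bottleneck distance between the diagrams of X and LX.
        return bottleneck(target, ripser(X @ L.T)['dgms'][1])

    def random_projection(d_in=5, d_out=2):
        Q, _ = np.linalg.qr(rng.normal(size=(d_in, d_out)))
        return Q.T                     # orthonormal rows: R^5 -> R^2

    # Plain simulated annealing over linear projections.
    L = random_projection()
    cur = loss(L)
    T = 1.0
    for step in range(200):
        Q, _ = np.linalg.qr(L.T + 0.1 * rng.normal(size=(5, 2)))
        cand = Q.T
        c = loss(cand)
        # Metropolis rule: keep improvements, sometimes accept worse moves.
        if c < cur or rng.random() < np.exp((cur - c) / T):
            L, cur = cand, c
        T *= 0.98                      # geometric cooling

    print("final bottleneck distance:", cur)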


Quantifying the Conceptual Error in Dimensionality Reduction

arXiv.org Artificial Intelligence

Dimension reduction of data sets is a standard problem in the realm of machine learning and knowledge reasoning. Reduction procedures affect patterns in, and dependencies on, the data dimensions, and ultimately influence any subsequent decision-making. Therefore, a wide variety of reduction procedures is in use, each pursuing different objectives. A criterion not considered so far is the conceptual continuity of the reduction mapping, i.e., the preservation of the conceptual structure with respect to the original data set. Based on the notion of a scale-measure from formal concept analysis, we present in this work a) the theoretical foundations to detect and quantify conceptual errors in data scalings; b) an experimental investigation of our approach on eleven data sets, each treated with a variant of non-negative matrix factorization.
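
The formal-concept-analysis machinery (scale-measures) has no off-the-shelf implementation, but the reduction step the paper's experiments build on, non-negative matrix factorization, does. A minimal scikit-learn sketch, with dataset and rank chosen only for illustration:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import NMF

    # Non-negative data: pixel intensities of 8x8 digit images.
    X, _ = load_digits(return_X_y=True)

    # Factor X ~ W H with 10 non-negative components; W is the
    # reduced representation, the rows of H are the learned parts.
    nmf = NMF(n_components=10, init='nndsvda', max_iter=500, random_state=0)
    W = nmf.fit_transform(X)
    H = nmf.components_

    print(W.shape, H.shape)  # (1797, 10) (10, 64)
    print("reconstruction error:", nmf.reconstruction_err_)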


Laplacian-Based Dimensionality Reduction Including Spectral Clustering, Laplacian Eigenmap, Locality Preserving Projection, Graph Embedding, and Diffusion Map: Tutorial and Survey

arXiv.org Machine Learning

This is a tutorial and survey paper on nonlinear dimensionality reduction and feature extraction methods based on the Laplacian of the data graph. We first introduce the adjacency matrix, the definition of the Laplacian matrix, and the interpretation of the Laplacian. Then, we cover graph cuts and spectral clustering, which applies clustering in a subspace of the data. Different optimization variants of the Laplacian eigenmap and its out-of-sample extension are explained. Thereafter, we introduce the locality preserving projection and its kernel variant as linear special cases of the Laplacian eigenmap. Versions of graph embedding are then explained, which are generalized versions of the Laplacian eigenmap and the locality preserving projection. Finally, the diffusion map is introduced, a method based on the Laplacian of the data and random walks on the data graph.
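
To make the survey's pipeline concrete (adjacency matrix, Laplacian, bottom eigenvectors), here is a minimal from-scratch Laplacian eigenmap; the neighbor count, kernel width, and dataset are illustrative, and sklearn.manifold.SpectralEmbedding offers a production version of the same method.

    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.linalg import eigh
    from sklearn.datasets import make_s_curve

    def laplacian_eigenmap(X, n_components=2, n_neighbors=10, sigma=1.0):
        # kNN graph with heat-kernel weights, then the bottom nontrivial
        # eigenvectors of the generalized problem L y = lambda D y.
        n = X.shape[0]
        D2 = cdist(X, X, 'sqeuclidean')
        W = np.exp(-D2 / (2 * sigma ** 2))       # heat-kernel adjacency
        idx = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]
        mask = np.zeros_like(W, dtype=bool)
        rows = np.repeat(np.arange(n), n_neighbors)
        mask[rows, idx.ravel()] = True
        W = np.where(mask | mask.T, W, 0.0)      # symmetrized kNN sparsity
        D = np.diag(W.sum(axis=1))               # degree matrix
        L = D - W                                # unnormalized graph Laplacian
        vals, vecs = eigh(L, D)                  # generalized eigenproblem
        return vecs[:, 1:n_components + 1]       # skip the constant eigenvector

    # Embed a 3-D S-shaped manifold into 2-D.
    X, _ = make_s_curve(500, random_state=0)
    Y = laplacian_eigenmap(X)
    print(Y.shape)  # (500, 2)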


A Subspace-based Approach for Dimensionality Reduction and Important Variable Selection

arXiv.org Machine Learning

An analysis of high-dimensional data can offer a detailed description of a system but is often challenged by the curse of dimensionality. General dimensionality reduction techniques can alleviate this difficulty by extracting a few important features, but they are limited by a lack of interpretability and of connection to the actual decision making associated with each physical variable. Important-variable selection techniques, as an alternative, can maintain interpretability, but they often involve a greedy search that is susceptible to missing important interactions. This research proposes a new method that produces subspaces (reduced-dimensional physical spaces) based on a randomized search and forms an ensemble of models over the critical subspaces. When applied to high-dimensional data collected from a composite-metal development process, the proposed method shows its superiority in prediction and important-variable selection.
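
The abstract does not spell out the algorithm, but the general pattern it describes, a randomized search over feature subspaces followed by an ensemble over the best ones, can be sketched as follows; the dataset, subspace size, and base model are all assumptions made for illustration.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X, y = make_regression(n_samples=300, n_features=30,
                           n_informative=5, noise=5.0, random_state=0)

    # Randomized search: score many small random subspaces by cross-validation.
    candidates = []
    for _ in range(200):
        subset = rng.choice(X.shape[1], size=5, replace=False)
        score = cross_val_score(LinearRegression(), X[:, subset], y, cv=3).mean()
        candidates.append((score, subset))

    # Keep the best-scoring subspaces and fit one model per subspace.
    top = sorted(candidates, key=lambda t: t[0], reverse=True)[:10]
    models = [(s, LinearRegression().fit(X[:, s], y)) for _, s in top]

    # Ensemble prediction: average over the subspace models.
    def predict(X_new):
        return np.mean([m.predict(X_new[:, s]) for s, m in models], axis=0)

    # Features recurring across top subspaces are flagged as important.
    counts = np.bincount(np.concatenate([s for _, s in top]),
                         minlength=X.shape[1])
    print("most frequently selected features:", np.argsort(counts)[::-1][:5])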


Hierarchical Subspace Learning for Dimensionality Reduction to Improve Classification Accuracy in Large Data Sets

arXiv.org Machine Learning

Manifold learning is used for dimensionality reduction, with the goal of finding a projection subspace that increases and decreases the inter- and intraclass variances, respectively. However, a bottleneck for subspace learning methods often arises from the high dimensionality of datasets. In this paper, a hierarchical approach is proposed to scale subspace learning methods, with the goal of improving classification in large datasets by a range of 3% to 10%. Different combinations of methods are studied. We assess the proposed method on five publicly available large datasets, for different eigenvalue-based subspace learning methods such as linear discriminant analysis, principal component analysis, generalized discriminant analysis, and reconstruction independent component analysis. To further examine the effect of the proposed method on various classification methods, we fed the generated result to linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor, and random forest classifiers. The resulting classification accuracies are compared to show the effectiveness of the hierarchical approach, with a reported average increase of 5% in classification accuracy.
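
The abstract leaves the exact hierarchy unspecified; one plausible minimal reading, reducing feature blocks separately and then reducing the concatenation, looks like this (PCA stands in for the paper's eigenvalue-based methods, and the dataset and sizes are illustrative):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)

    # Stage 1: split the 64 features into 4 blocks and reduce each block.
    blocks = np.array_split(np.arange(X.shape[1]), 4)
    parts = [PCA(n_components=8).fit_transform(X[:, b]) for b in blocks]

    # Stage 2: reduce the concatenated block embeddings once more.
    X_reduced = PCA(n_components=10).fit_transform(np.hstack(parts))

    # Feed the result to a downstream classifier, as the paper does.
    acc = cross_val_score(KNeighborsClassifier(), X_reduced, y, cv=5).mean()
    print(f"kNN accuracy on hierarchical embedding: {acc:.3f}")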


Understanding dimensionality reduction in machine learning models

#artificialintelligence

Machine learning algorithms have gained fame for being able to ferret out relevant information from datasets with many features, such as tables with dozens of columns and images with millions of pixels. Thanks to advances in cloud computing, you can often run very large machine learning models without noticing how much computational power works behind the scenes. But every new feature that you add to your problem adds to its complexity, making it harder to solve with machine learning algorithms. Data scientists use dimensionality reduction, a set of techniques that remove redundant and irrelevant features from their machine learning models. Dimensionality reduction slashes the costs of machine learning and sometimes makes it possible to solve complicated problems with simpler models. Machine learning models map features to outcomes.

