Collaborating Authors

Dimensionality Reduction

Dimensionality Reduction for Machine Learning


What is High Demensional Data? How does it affect your Machine Learning models? Have you ever wondered why your model isn't meeting your expectations and you have tried hyper-tuning the parameters until the ends of the earth, with no improvements? Understanding your data and your model may be key. Underneath such an immense and complicated hood, you may be concerned that there are few to no ways of gaining more insight into your data, as well as your model.

Isomap Embedding -- An Awesome Approach to Non-linear Dimensionality Reduction


As you can see, Isomap is an Unsupervised Machine Learning technique aimed at Dimensionality Reduction. It differs from a few other techniques in the same category by using a non-linear approach to dimensionality reduction instead of linear mappings used by algorithms such as PCA. We will see how linear vs. non-linear approaches differ in the next section. Isomap is a technique that combines several different algorithms, enabling it to use a non-linear way to reduce dimensions while preserving local structures. Before we look at the example of Isomap and compare it to a linear method of Principal Components Analysis (PCA), let's list the high-level steps that Isomap performs: For our example, let's create a 3D object known as a Swiss roll.

Interactive Dimensionality Reduction for Comparative Analysis Machine Learning

Finding the similarities and differences between groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. This paper presents an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, the interactive visualization interface facilitates interpretation and refinement of the ULCA results. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of this framework.

Dimensionality Reduction using an Autoencoder in Python


Dimensionality is the number of input variables or features for a dataset and dimensionality reduction is the process through which we reduce the number of input variables in a dataset. A lot of input features makes predictive modeling a more challenging task. When dealing with high dimensional data, it is often useful to reduce the dimensionality by projecting the data to a lower dimensional subspace which captures the "essence" of the data. This is called dimensionality reduction. "dimensionality reduction yields a more compact, more easily interpretable representation of the target concept, focusing the user's attention on the most relevant variables."

Unified Framework for Spectral Dimensionality Reduction, Maximum Variance Unfolding, and Kernel Learning By Semidefinite Programming: Tutorial and Survey Machine Learning

This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification can be interpreted as eigenfunction learning or representation of kernel in terms of distance matrix. Then, since the spectral methods are unified as kernel PCA, we say let us learn the best kernel for unfolding the manifold of data to its maximum variance. We first briefly introduce kernel learning by SDP for the transduction task. Then, we explain MVU in detail. Various versions of supervised MVU using nearest neighbors graph, by class-wise unfolding, by Fisher criterion, and by colored MVU are explained. We also explain out-of-sample extension of MVU using eigenfunctions and kernel mapping. Finally, we introduce other variants of MVU including action respecting embedding, relaxed MVU, and landmark MVU for big data.

Guide To Dimensionality Reduction With Recursive Feature Elimination


Therefore, feature elimination in statistics and machine learning is referred to as choosing a subset of relevant features from the dataset to use in further …

Techniques for Dimensionality Reduction


In addition to this, the recent'Big Bang' in large datasets across companies, organisation, and government departments has resulted in a large uptake in data mining techniques. So, what is data mining? Simply put, it's the process of discovering trends and insights in high-dimensionality datasets (those with thousands of columns). On the one hand, the high-dimensionality datasets have enabled organisations to solve complex, real-world problems, such as reducing cancer patient waiting time, predicting protein structure associated with COVID-19, and analysing MEG brain imaging scans. However, on the other hand, large datasets can sometimes contain columns with poor-quality data, which can lower the performance of the model -- more isn't always better.

Shape-Preserving Dimensionality Reduction : An Algorithm and Measures of Topological Equivalence Machine Learning

We introduce a linear dimensionality reduction technique preserving topological features via persistent homology. The method is designed to find linear projection $L$ which preserves the persistent diagram of a point cloud $\mathbb{X}$ via simulated annealing. The projection $L$ induces a set of canonical simplicial maps from the Rips (or \v{C}ech) filtration of $\mathbb{X}$ to that of $L\mathbb{X}$. In addition to the distance between persistent diagrams, the projection induces a map between filtrations, called filtration homomorphism. Using the filtration homomorphism, one can measure the difference between shapes of two filtrations directly comparing simplicial complexes with respect to quasi-isomorphism $\mu_{\operatorname{quasi-iso}}$ or strong homotopy equivalence $\mu_{\operatorname{equiv}}$. These $\mu_{\operatorname{quasi-iso}}$ and $\mu_{\operatorname{equiv}}$ measures how much portion of corresponding simplicial complexes is quasi-isomorphic or homotopy equivalence respectively. We validate the effectiveness of our framework with simple examples.

Quantifying the Conceptual Error in Dimensionality Reduction Artificial Intelligence

Dimension reduction of data sets is a standard problem in the realm of machine learning and knowledge reasoning. They affect patterns in and dependencies on data dimensions and ultimately influence any decision-making processes. Therefore, a wide variety of reduction procedures are in use, each pursuing different objectives. A so far not considered criterion is the conceptual continuity of the reduction mapping, i.e., the preservation of the conceptual structure with respect to the original data set. Based on the notion scale-measure from formal concept analysis we present in this work a) the theoretical foundations to detect and quantify conceptual errors in data scalings; b) an experimental investigation of our approach on eleven data sets that were respectively treated with a variant of non-negative matrix factorization.

A Subspace-based Approach for Dimensionality Reduction and Important Variable Selection Machine Learning

An analysis of high dimensional data can offer a detailed description of a system but is often challenged by the curse of dimensionality. General dimensionality reduction techniques can alleviate such difficulty by extracting a few important features, but they are limited due to the lack of interpretability and connectivity to actual decision making associated with each physical variable. Important variable selection techniques, as an alternative, can maintain the interpretability, but they often involve a greedy search that is susceptible to failure in capturing important interactions. This research proposes a new method that produces subspaces, reduced-dimensional physical spaces, based on a randomized search and forms an ensemble of models for critical subspaces. When applied to high-dimensional data collected from a composite metal development process, the proposed method shows its superiority in prediction and important variable selection.