AITopics | Dimensionality Reduction

Collaborating Authors

Dimensionality Reduction

Dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection (find a subset of the original variables) and feature extraction (transform the data in the high-dimensional space to a space of fewer dimensions). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Supervised Discriminative Sparse PCA with Adaptive Neighbors for Dimensionality Reduction

Shi, Zhenhua, Wu, Dongrui, Huang, Jian, Wang, Yu-Kai, Lin, Chin-Teng

arXiv.org Machine LearningJan-12-2020

Dimensionality reduction is an important operation in information visualization, feature extraction, clustering, regression, and classification, especially for processing noisy high dimensional data. However, most existing approaches preserve either the global or the local structure of the data, but not both. Approaches that preserve only the global data structure, such as principal component analysis (PCA), are usually sensitive to outliers. Approaches that preserve only the local data structure, such as locality preserving projections, are usually unsupervised (and hence cannot use label information) and uses a fixed similarity graph. We propose a novel linear dimensionality reduction approach, supervised discriminative sparse PCA with adaptive neighbors (SDSPCAAN), to integrate neighborhood-free supervised discriminative sparse PCA and projected clustering with adaptive neighbors. As a result, both global and local data structures, as well as the label information, are used for better dimensionality reduction. Classification experiments on nine high-dimensional datasets validated the effectiveness and robustness of our proposed SDSPCAAN.

dataset, matrix, sdspcaan, (9 more...)

arXiv.org Machine Learning

2001.03103

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(8 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)

Add feedback

Using Dimensionality Reduction to Optimize t-SNE

Shah, Rikhav, Silwal, Sandeep

arXiv.org Machine LearningDec-2-2019

t-SNE is a popular tool for embedding multi-dimensional datasets into two or three dimensions. However, it has a large computational cost, especially when the input data has many dimensions. Many use t-SNE to embed the output of a neural network, which is generally of much lower dimension than the original data. This limits the use of t-SNE in unsupervised scenarios. We propose using \textit{random} projections to embed high dimensional datasets into relatively few dimensions, and then using t-SNE to obtain a two dimensional embedding. We show that random projections preserve the desirable clustering achieved by t-SNE, while dramatically reducing the runtime of finding the embedding.

dimensionality reduction, implementation, random projection, (10 more...)

arXiv.org Machine Learning

1912.01098

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.47)

Add feedback

Topological Stability: a New Algorithm for Selecting The Nearest Neighbors in Non-Linear Dimensionality Reduction Techniques

Elhenawy, Mohammed, Masoud, Mahmoud, Glaser, Sebastian, Rakotonirainy, Andry

arXiv.org Machine LearningNov-16-2019

In the machine learning field, dimensionality reduction is an important task. It mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers proposed many new (non-linear) techniques for dimensionality reduction. Most of these techniques are based on the intuition that data lies on or near a complex low-dimensional manifold that is embedded in the high-dimensional space. New techniques for dimensionality reduction aim at identifying and extracting the manifold from the high-dimensional space. Isomap is one of widely-used low-dimensional embedding methods, where geodesic distances on a weighted graph are incorporated with the classical scaling (metric multidimensional scaling). The Isomap chooses the nearest neighbours based on the distance only which causes bridges and topological instability. In this paper, we propose a new algorithm to choose the nearest neighbours to reduce the number of short-circuit errors and hence improves the topological stability. Because at any point on the manifold, that point and its nearest neighbours form a vector subspace and the orthogonal to that subspace is orthogonal to all vectors spans the vector subspace. The prposed algorithmuses the point itself and its two nearest neighbours to find the bases of the subspace and the orthogonal to that subspace which belongs to the orthogonal complementary subspace. The proposed algorithm then adds new points to the two nearest neighbours based on the distance and the angle between each new point and the orthogonal to the subspace. The superior performance of the new algorithm in choosing the nearest neighbours is confirmed through experimental work with several datasets.

algorithm, isomap, subspace, (15 more...)

arXiv.org Machine Learning

doi: 10.13140/RG.2.2.16800.17922

1911.05312

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)

Add feedback

Graph Convolutional Networks Meet with High Dimensionality Reduction

Coskun, Mustafa

arXiv.org Machine LearningNov-7-2019

Recently, Graph Convolutional Networks (GCNs) and their variants have been receiving many research interests for learning graph-related tasks. While the GCNs have been successfully applied to this problem, some caveats inherited from classical deep learning still remain as open research topics in the context of the node classification problem. One such inherited caveat is that GCNs only consider the nodes that are a few propagations away from the labeled nodes to classify them. However, taking only a few propagation steps away nodes into account defeats the purpose of using the graph topological information in the GCNs. To remedy this problem, the-state-of-the-art methods leverage the network diffusion approaches, namely personalized page rank and its variants, to fully account for the graph topology, {\em after} they use the Neural Networks in the GCNs. However, these approaches overlook the fact that the network diffusion methods favour high degree nodes in the graph, resulting in the propagation of labels to unlabeled centralized, hub, nodes. To address this biasing hub nodes problem, in this paper, we propose to utilize a dimensionality reduction technique conjugate with personalized page rank so that we can both take advantage from graph topology and resolve the hub node favouring problem for GCNs. Here, our approach opens a new holistic road for message passing phase of GCNs by suggesting the usage of other proximity matrices instead of well-known Laplacian. Testing on two real-world networks that are commonly used in benchmarking GCNs' performance for the node classification context, we systematically evaluate the performance of the proposed methodology and show that our approach outperforms existing methods for wide ranges of parameter values with very limited deep learning training {\em epochs}.

gcn, matrix, node, (10 more...)

arXiv.org Machine Learning

1911.02928

Country:

North America > United States (0.04)
Asia > Middle East > Republic of Türkiye > Kayseri Province > Kayseri (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.60)

Add feedback

Dimensionality Reduction 101 for Dummies like Me

#artificialintelligenceOct-24-2019, 20:14:47 GMT

Let's starts with the WHY we need to perform Dimensionality Reduction before analyzing data and coming down to some inferences, it is often necessary to visualize the data set, in order to get an idea of it. But, nowadays data sets contain a lot of random variables (also called features) due to which it becomes difficult in visualizing the data set. Sometimes it is even impossible to visualize such high dimensional data as we humans fall astray after we reach a dimension higher than 3. Here is where we come across dimensionality reduction. The process of reducing the number of random variables of the data set under consideration, via obtaining a set of principal variables.

dimensionality reduction 101, information, manifold, (6 more...)

#artificialintelligence

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.92)

Add feedback

A Multi-view Dimensionality Reduction Algorithm Based on Smooth Representation Model

Li, Haohao, Wang, Huibing

arXiv.org Machine LearningOct-12-2019

Over the past few decades, we have witnessed a large family of algorithms that have been designed to provide different solutions to the problem of dimensionality reduction (DR). The DR is an essential tool to excavate the important information from the high-dimensional data by mapping the data to a low-dimensional subspace. Furthermore, for the diversity of varied high-dimensional data, the multi-view features can be utilized for improving the learning performance. However, many DR methods fail to integrating multiple views. Although the features from different views are extracted by different manners, they are utilized to describe the same sample, which implies that they are highly related. Therefore, how to learn the subspace for high-dimensional features by utilizing the consistency and complementary properties of multi-view features is important in the present. In this paper, we propose an effective multi-view dimensionality reduction algorithm named Multi-view Smooth Preserve Projection. Firstly, we construct a single view DR method named Smooth Preserve Projection based on the Smooth Representation model. The proposed method aims to find a subspace for the high-dimensional data, in which the smooth reconstructive weights are preserved as much as possible. Then, we extend it to a multi-view version in which we exploits Hilbert-Schmidt Independence Criterion to jointly learn one common subspace for all views. A plenty of experiments on multi-view datasets show the excellent performance of the proposed method.

dataset, matrix, subspace, (14 more...)

arXiv.org Machine Learning

1910.04439

Country:

Asia > China > Liaoning Province > Dalian (0.05)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)

Add feedback

TriMap: Large-scale Dimensionality Reduction Using Triplets

Amid, Ehsan, Warmuth, Manfred K.

arXiv.org Machine LearningOct-1-2019

B M ORE V ISUALIZATIONS We compare the results of TriMap to LargeVis in Figure 7 and 8. We also provide more visualizations obtained using TriMap in Figure 9. C D ISCUSSION We briefly discuss the results of TriMap and draw a comparison to the other methods. TriMap generally provides better global accuracy compared to the competing methods. It also successfully maintains the continuity of the underlying manifold. This can be seen from the COIL-20 result where certain clusters are located farther away from the remaining clusters. However, the underlying structure for the main cluster resembles the one provided by the other methods. TriMap also preserves the continuous structure in the Fashion MNIST and the TV News datasets. TriMap is also efficient in uncovering the possible outliers in the data. For instance, PCA reveals a large number of outliers in the Tabula Muris and the 360 K Lyrics datasets.

dataset, trimap, triplet, (15 more...)

arXiv.org Machine Learning

1910.00204

Country: North America > United States > California > Santa Cruz County > Santa Cruz (0.14)

Genre: Research Report (0.64)

Industry: Media (0.35)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.41)

Add feedback

Interpretable Discriminative Dimensionality Reduction and Feature Selection on the Manifold

Hosseini, Babak, Hammer, Barbara

arXiv.org Machine LearningSep-19-2019

Dimensionality reduction (DR) on the manifold includes effective methods which project the data from an implicit relational space onto a vectorial space. Regardless of the achievements in this area, these algorithms suffer from the lack of interpretation of the projection dimensions. Therefore, it is often difficult to explain the physical meaning behind the embedding dimensions. In this research, we propose the interpretable kernel DR algorithm (I-KDR) as a new algorithm which maps the data from the feature space to a lower dimensional space where the classes are more condensed with less overlapping. Besides, the algorithm creates the dimensions upon local contributions of the data samples, which makes it easier to interpret them by class labels. Additionally, we efficiently fuse the DR with feature selection task to select the most relevant features of the original space to the discriminative objective. Based on the empirical evidence, I-KDR provides better interpretations for embedding dimensions as well as higher discriminative performance in the embedded space compared to the state-of-the-art and popular DR algorithms.

algorithm, dataset, dimension, (13 more...)

arXiv.org Machine Learning

1909.09218

Country:

Asia > Middle East > Jordan (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)

Add feedback

Laplacian Matrix for Dimensionality Reduction and Clustering

Wiskott, Laurenz, Schönfeld, Fabian

arXiv.org Machine LearningSep-18-2019

Many problems in machine learning can be expressed by means of a graph with nodes representing training samples and edges representing the relationship between samples in terms of similarity, temporal proximity, or label information. Graphs can in turn be represented by matrices. A special example is the Laplacian matrix, which allows us to assign each node a value that varies only little between strongly connected nodes and more between distant nodes. Such an assignment can be used to extract a useful feature representation, find a good embedding of data in a low dimensional space, or perform clustering on the original samples. In these lecture notes we first introduce the Laplacian matrix and then present a small number of algorithms designed around it.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1909.08381

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.41)

Add feedback

Comprehensive Guide to 12 Dimensionality Reduction Techniques

#artificialintelligenceSep-10-2019, 14:37:11 GMT

Have you ever worked on a dataset with more than a thousand features? I have, and let me tell you it's a very challenging task, especially if you don't know where to start! Having a high number of variables is both a boon and a curse. It's great that we have loads of data for analysis, but it is challenging due to size. It's not feasible to analyze each and every variable at a microscopic level. It might take us days or months to perform any meaningful analysis and we'll lose a ton of time and money for our business! Not to mention the amount of computational power this will take. We need a better way to deal with high dimensional data so that we can quickly extract patterns and insights from it. So how do we approach such a dataset?

artificial intelligence, data mining, machine learning, (15 more...)

#artificialintelligence

Technology:

Information Technology > Data Science > Data Mining (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.44)

Add feedback