AITopics | Dimensionality Reduction

Collaborating Authors

Dimensionality Reduction

Dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection (find a subset of the original variables) and feature extraction (transform the data in the high-dimensional space to a space of fewer dimensions). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Branching embedding: A heuristic dimensionality reduction algorithm based on hierarchical clustering

Oku, Makito

arXiv.org Machine LearningMay-6-2018

This paper proposes a new dimensionality reduction algorithm named branching embedding (BE). It converts a dendrogram to a two-dimensional scatter plot, and visualizes the inherent structures of the original high-dimensional data. Since the conversion part is not computationally demanding, the BE algorithm would be beneficial for the case where hierarchical clustering is already performed. Numerical experiments revealed that the outputs of the algorithm moderately preserve the original hierarchical structures.

artificial intelligence, dendrogram, machine learning, (14 more...)

arXiv.org Machine Learning

1805.02161

Country: Asia > Japan (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.63)

Add feedback

Randomized ICA and LDA Dimensionality Reduction Methods for Hyperspectral Image Classification

Jayaprakash, Chippy, Damodaran, Bharath Bhushan, V, Sowmya, Soman, K P

arXiv.org Machine LearningApr-19-2018

Dimensionality reduction is an important step in processing the hyperspectral images (HSI) to overcome the curse of dimensionality problem. Linear dimensionality reduction methods such as Independent component analysis (ICA) and Linear discriminant analysis (LDA) are commonly employed to reduce the dimensionality of HSI. These methods fail to capture non-linear dependency in the HSI data, as data lies in the nonlinear manifold. To handle this, nonlinear transformation techniques based on kernel methods were introduced for dimensionality reduction of HSI. However, the kernel methods involve cubic computational complexity while computing the kernel matrix, and thus its potential cannot be explored when the number of pixels (samples) are large. In literature a fewer number of pixels are randomly selected to partial to overcome this issue, however this sub-optimal strategy might neglect important information in the HSI. In this paper, we propose randomized solutions to the ICA and LDA dimensionality reduction methods using Random Fourier features, and we label them as RFFICA and RFFLDA. Our proposed method overcomes the scalability issue and to handle the non-linearities present in the data more efficiently. Experiments conducted with two real-world hyperspectral datasets demonstrates that our proposed randomized methods outperform the conventional kernel ICA and kernel LDA in terms overall, per-class accuracies and computational time.

artificial intelligence, lda dimensionality reduction method, machine learning, (2 more...)

arXiv.org Machine Learning

1804.07347

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)

Add feedback

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

Chen, Minshuo, Yang, Lin, Wang, Mengdi, Zhao, Tuo

arXiv.org Machine LearningMar-27-2018

Stochastic optimization naturally arises in machine learning. Efficient algorithms with provable guarantees, however, are still largely missing, when the objective function is nonconvex and the data points are dependent. This paper studies this fundamental challenge through a streaming PCA problem for stationary time series data. Specifically, our goal is to estimate the principle component of time series data with respect to the covariance matrix of the stationary distribution. Computationally, we propose a variant of Oja's algorithm combined with downsampling to control the bias of the stochastic gradient caused by the data dependency. Theoretically, we quantify the uncertainty of our proposed stochastic algorithm based on diffusion approximations. This allows us to prove the global convergence in terms of the continuous time limiting solution trajectory and further implies near optimal sample complexity. Numerical experiments are provided to support our analysis.

artificial intelligence, machine learning, stochastic nonconvex optimization, (2 more...)

arXiv.org Machine Learning

1803.02312

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.40)

Add feedback

On Nonlinear Dimensionality Reduction, Linear Smoothing and Autoencoding

Ting, Daniel, Jordan, Michael I.

arXiv.org Machine LearningMar-6-2018

We develop theory for nonlinear dimensionality reduction (NLDR). A number of NLDR methods have been developed, but there is limited understanding of how these methods work and the relationships between them. There is limited basis for using existing NLDR theory for deriving new algorithms. We provide a novel framework for analysis of NLDR via a connection to the statistical theory of linear smoothers. This allows us to both understand existing methods and derive new ones. We use this connection to smoothing to show that asymptotically, existing NLDR methods correspond to discrete approximations of the solutions of sets of differential equations given a boundary condition. In particular, we can characterize many existing methods in terms of just three limiting differential operators and boundary conditions. Our theory also provides a way to assert that one method is preferable to another; indeed, we show Local Tangent Space Alignment is superior within a class of methods that assume a global coordinate chart defines an isometric embedding of the manifold.

artificial intelligence, boundary condition, machine learning, (15 more...)

arXiv.org Machine Learning

1803.02432

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Dimensionality reduction for acoustic vehicle classification with spectral embedding

Sunu, Justin, Percus, Allon G.

arXiv.org Machine LearningFeb-17-2018

Classification and identification of moving vehicles from audio signals is of interest in many applications, ranging from traffic flow management to military target recognition. Classification may involve differentiating vehicles by type, such as jeep, sedan, etc. Identification can involve distinguishing specific vehicles, even within a given vehicle type. Since audio data is small compared to, say, video data, multiple audio sensors can be placed easily and inexpensively. However, there are certain obstacles having to do with both hardware and physics. Certain microphones and recording devices have built-in features, for example, damping/normalizing that may be applied when the recording exceeds a threshold.

artificial intelligence, machine learning, vehicle, (18 more...)

arXiv.org Machine Learning

1705.09869

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.87)
Transportation > Passenger (0.69)
Automobiles & Trucks > Manufacturer (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.41)

Add feedback

Dimensionality Reduction Algorithms: Strengths and Weaknesses

#artificialintelligenceFeb-4-2018, 05:23:29 GMT

Welcome to Part 2 of our tour through modern machine learning algorithms. In this part, we'll cover methods for Dimensionality Reduction, further broken into Feature Selection and Feature Extraction. In general, these tasks are rarely performed in isolation. Instead, they're often preprocessing steps to support other tasks. If you missed Part 1, you can check it out here.

artificial intelligence, data mining, machine learning, (13 more...)

#artificialintelligence

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback

Nonlinear Dimensionality Reduction on Graphs

Shen, Yanning, Traganitis, Panagiotis A., Giannakis, Georgios B.

arXiv.org Machine LearningJan-29-2018

In this era of data deluge, many signal processing and machine learning tasks are faced with high-dimensional datasets, including images, videos, as well as time series generated from social, commercial and brain network interactions. Their efficient processing calls for dimensionality reduction techniques capable of properly compressing the data while preserving task-related characteristics, going beyond pairwise data correlations. The present paper puts forth a nonlinear dimensionality reduction framework that accounts for data lying on known graphs. The novel framework turns out to encompass most of the existing dimensionality reduction methods as special cases, and it is capable of capturing and preserving possibly nonlinear correlations that are ignored by linear methods, as well as taking into account information from multiple graphs. An efficient algorithm admitting closed-form solution is developed and tested on synthetic datasets to corroborate its effectiveness.

data mining, machine learning, manifold, (17 more...)

arXiv.org Machine Learning

1801.0939

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.34)
Health & Medicine > Health Care Technology (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)

Add feedback

Visualizing MNIST: An Exploration of Dimensionality Reduction - colah's blog

#artificialintelligenceJan-4-2018, 23:03:13 GMT

At some fundamental level, no one understands machine learning. It isn't a matter of things being too complicated. Almost everything we do is fundamentally very simple. Unfortunately, an innate human handicap interferes with us understanding these simple things. Humans evolved to reason fluidly about two and three dimensions. With some effort, we may think in four dimensions. Machine learning often demands we work with thousands of dimensions – or tens of thousands, or millions! Even very simple things become hard to understand when you do them in very high numbers of dimensions. Reasoning directly about these high dimensional spaces is just short of hopeless.

artificial intelligence, machine learning, visualization, (17 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.41)

Add feedback

Practical Hash Functions for Similarity Estimation and Dimensionality Reduction

Dahlgaard, Søren, Knudsen, Mathias, Thorup, Mikkel

Neural Information Processing SystemsDec-31-2017

Hashing is a basic tool for dimensionality reduction employed in several aspects of machine learning. However, the perfomance analysis is often carried out under the abstract assumption that a truly random unit cost hash function is used, without concern for which concrete hash function is employed. The concrete hash function may work fine on sufficiently random input. The question is if it can be trusted in the real world when faced with more structured input. In this paper we focus on two prominent applications of hashing, namely similarity estimation with the one permutation hashing (OPH) scheme of Li et al. [NIPS'12] and feature hashing (FH) of Weinberger et al. [ICML'09], both of which have found numerous applications, i.e. in approximate near-neighbour search with LSH and large-scale classification with SVM. We consider the recent mixed tabulation hash function of Dahlgaard et al. [FOCS'15] which was proved theoretically to perform like a truly random hash function in many applications, including the above OPH. Here we first show improved concentration bounds for FH with truly random hashing and then argue that mixed tabulation performs similar when the input vectors are sparse. Our main contribution, however, is an experimental comparison of different hashing schemes when used inside FH, OPH, and LSH. We find that mixed tabulation hashing is almost as fast as the classic multiply-mod-prime scheme ax+b mod p. Mutiply-mod-prime is guaranteed to work well on sufficiently random data, but we demonstrate that in the above applications, it can lead to bias and poor concentration on both real-world and synthetic data. We also compare with the very popular MurmurHash3, which has no proven guarantees. Mixed tabulation and MurmurHash3 both perform similar to truly random hashing in our experiments. However, mixed tabulation was 40% faster than MurmurHash3, and it has the proven guarantee of good performance on all possible input making it more reliable.

data mining, hash function, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback

Practical Hash Functions for Similarity Estimation and Dimensionality Reduction

Dahlgaard, Søren, Knudsen, Mathias Bæk Tejs, Thorup, Mikkel

arXiv.org Machine LearningNov-23-2017

Hashing is a basic tool for dimensionality reduction employed in several aspects of machine learning. However, the perfomance analysis is often carried out under the abstract assumption that a truly random unit cost hash function is used, without concern for which concrete hash function is employed. The concrete hash function may work fine on sufficiently random input. The question is if it can be trusted in the real world when faced with more structured input. In this paper we focus on two prominent applications of hashing, namely similarity estimation with the one permutation hashing (OPH) scheme of Li et al. [NIPS'12] and feature hashing (FH) of Weinberger et al. [ICML'09], both of which have found numerous applications, i.e. in approximate near-neighbour search with LSH and large-scale classification with SVM. We consider mixed tabulation hashing of Dahlgaard et al.[FOCS'15] which was proved to perform like a truly random hash function in many applications, including OPH. Here we first show improved concentration bounds for FH with truly random hashing and then argue that mixed tabulation performs similar for sparse input. Our main contribution, however, is an experimental comparison of different hashing schemes when used inside FH, OPH, and LSH. We find that mixed tabulation hashing is almost as fast as the multiply-mod-prime scheme ax+b mod p. Mutiply-mod-prime is guaranteed to work well on sufficiently random data, but we demonstrate that in the above applications, it can lead to bias and poor concentration on both real-world and synthetic data. We also compare with the popular MurmurHash3, which has no proven guarantees. Mixed tabulation and MurmurHash3 both perform similar to truly random hashing in our experiments. However, mixed tabulation is 40% faster than MurmurHash3, and it has the proven guarantee of good performance on all possible input.

data mining, hash function, machine learning, (18 more...)

arXiv.org Machine Learning

1711.08797

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback