 Learning in High Dimensional Spaces


Universality laws for randomized dimension reduction, with applications

arXiv.org Machine Learning

Dimension reduction is the process of embedding high-dimensional data into a lower dimensional space to facilitate its analysis. In the Euclidean setting, one fundamental technique for dimension reduction is to apply a random linear map to the data. This dimension reduction procedure succeeds when it preserves certain geometric features of the set. The question is how large the embedding dimension must be to ensure that randomized dimension reduction succeeds with high probability. This paper studies a natural family of randomized dimension reduction maps and a large class of data sets. It proves that there is a phase transition in the success probability of the dimension reduction map as the embedding dimension increases. For a given data set, the location of the phase transition is the same for all maps in this family. Furthermore, each map has the same stability properties, as quantified through the restricted minimum singular value. These results can be viewed as new universality laws in high-dimensional stochastic geometry. Universality laws for randomized dimension reduction have many applications in applied mathematics, signal processing, and statistics. They yield design principles for numerical linear algebra algorithms, for compressed sensing measurement ensembles, and for random linear codes. Furthermore, these results have implications for the performance of statistical estimation methods under a large class of random experimental designs.
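As a rough illustration of "apply a random linear map to the data", the sketch below projects points from 1000 dimensions down to 200 with a Gaussian random matrix and checks how well pairwise distances survive. This is only the simplest Gaussian case, not the paper's family of maps or its phase-transition analysis; the function names, dimensions, and seeds are assumptions chosen for the example.

```python
import math
import random


def random_projection(points, target_dim, seed=0):
    """Map each point to target_dim coordinates with a Gaussian random matrix.

    Entries are N(0, 1) scaled by 1/sqrt(target_dim), so squared lengths
    are preserved in expectation. (Illustrative helper, not from the paper.)
    """
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(w * x for w, x in zip(row, point)) for row in matrix]
        for point in points
    ]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Synthetic data: 20 Gaussian points in 1000 dimensions (an assumption).
data_rng = random.Random(1)
points = [[data_rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(20)]
projected = random_projection(points, target_dim=200)

# Ratio of projected to original distance for every pair of points.
ratios = [
    distance(projected[i], projected[j]) / distance(points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
]
print(f"distance ratios range from {min(ratios):.3f} to {max(ratios):.3f}")
```

Ratios close to 1 mean the geometry of the point set is approximately preserved; shrinking `target_dim` widens the spread, which is the trade-off the embedding-dimension question is about.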


Hyperspheres & the curse of dimensionality

#artificialintelligence

I previously talked about the curse of dimensionality (more than 2 years ago) related to Machine Learning. Here I wanted to discuss it in more depth and dive into the mathematics of it. High dimensions might sound like Physics' string theory where our universe is made of more than 4 dimensions. This isn't what we are talking about here. The curse of dimensionality is related to what happens when a model deals with a data space with dimensions in the hundreds or thousands.


Must-Know: What is the curse of dimensionality?

#artificialintelligence

Editor's note: This post was originally included as an answer to a question posed in our 17 More Must-Know Data Science Interview Questions and Answers series earlier this year. The answer was thorough enough that it was deemed to deserve its own dedicated post. "As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially." Let's take an example below. Figure 1 (a) shows 10 data points in one dimension i.e. there is only one feature in the data set.


Representation of big data by dimension reduction

arXiv.org Machine Learning

Suppose the data consist of a set $S$ of points $x_j$, $1 \leq j \leq J$, distributed in a bounded domain $D \subset \mathbb{R}^N$, where $N$ and $J$ are large numbers. In this paper an algorithm is proposed for checking whether there exists a manifold $\mathbb{M}$ of low dimension near which many of the points of $S$ lie and finding such $\mathbb{M}$ if it exists. There are many dimension reduction algorithms, both linear and non-linear. Our algorithm is simple to implement and has some advantages compared with the known algorithms. If there is a manifold of low dimension near which most of the data points lie, the proposed algorithm will find it. Some numerical results are presented illustrating the algorithm and analyzing its performance compared to the classical PCA (principal component analysis) and Isomap.


Curse of dimensionality - Wikipedia, the free encyclopedia

#artificialintelligence

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic optimization.[1][2] There are multiple phenomena referred to by this name in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining, and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance.
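The sparsity claim can be made concrete with a small experiment: split the unit cube into 10 bins per axis and count how many of the resulting cells contain at least one of 1000 uniformly sampled points. The sample size, bin count, and function name below are assumptions picked for illustration, not taken from the article.

```python
import random


def occupied_fraction(num_points, dim, bins_per_axis=10, seed=0):
    """Fraction of grid cells in the unit cube hit by at least one sample.

    The cube [0, 1)^dim is split into bins_per_axis ** dim equal cells.
    (Hypothetical helper written for this example.)
    """
    rng = random.Random(seed)
    cells = set()
    for _ in range(num_points):
        # rng.random() lies in [0, 1), so each index is in 0..bins_per_axis-1.
        cell = tuple(int(rng.random() * bins_per_axis) for _ in range(dim))
        cells.add(cell)
    return len(cells) / bins_per_axis ** dim


fractions = {dim: occupied_fraction(1000, dim) for dim in (1, 2, 3, 5)}
for dim, fraction in fractions.items():
    print(f"dim={dim}: {fraction:.4%} of cells contain data")
```

With the number of samples held fixed, the number of cells grows as 10^dim, so in five dimensions at most 1% of cells can contain any data at all.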


Dimension Reduction and Intuitive Feature Engineering for Machine Learning

#artificialintelligence

In the previous parts of this series, we looked at an overview of some popular tricks for feature engineering, and examined those tricks in greater detail. In this part, we continue our closer examination of these approaches with a deeper dive into the final techniques described in Part 1. The examples discussed in this article can be reproduced with the source code and datasets available here. As an analyst, you savor the scenario in which you have a lot of data. But, with a lot of data comes the added complexity of analyzing and making better sense of that data.


[Report] Information leverage in interconnected ecosystems: Overcoming the curse of dimensionality

Science

Ecology concerns the behavior of complex, dynamic, interconnected systems of populations, communities, and ecosystems over time. Yet ecological time series can be relatively short, owing to practical limits on study duration. Ye and Sugihara introduce an analytical approach called multiview embedding, which harnesses the complexity of short, noisy time series that are common in ecology and other disciplines such as economics. Using examples from published data sets, they show how this approach enhances the tractability of complex data from multiple interacting components and offers a way forward in ecological forecasting.


About the Curse of Dimensionality

@machinelearnbot

In this article, we will discuss the so-called 'Curse of Dimensionality', and explain why it is important when designing a classifier. In the following sections I will provide an intuitive explanation of this concept, illustrated by a clear example of overfitting due to the curse of dimensionality. Consider an example in which we have a set of images, each of which depicts either a cat or a dog. We would like to create a classifier that is able to distinguish dogs from cats automatically. To do so, we first need to think about a descriptor for each object class that can be expressed by numbers, such that a mathematical algorithm, i.e. a classifier, can use these numbers to recognize the object.


Conceptualizing Curse of Dimensionality with Parallel Coordinates

AAAI Conferences

We report on a novel use of parallel coordinates as a pedagogical tool for illustrating the non-intuitive properties of high dimensional spaces with special emphasis on the phenomenon of Curse of Dimensionality. Also, we have collated what we believe to be a representative sample of diverse approaches that exist in literature to conceptualize the Curse of Dimensionality. We envisage that the paper will have pedagogical value in structuring the way Curse of Dimensionality is presented in classrooms and associated lab sessions.