AITopics | Learning in High Dimensional Spaces

Learning in High Dimensional Spaces

High-dimensional spaces frequently occur in mathematics and the sciences. They may be parameter spaces or configuration spaces such as in Lagrangian or Hamiltonian mechanics; these are abstract spaces, independent of the physical space we live in. (Wikipedia)

Building Models for Biopathway Dynamics Using Intrinsic Dimensionality Analysis

Wysocka, Emilia M., Dzutsev, Valeriy, Bandyopadhyay, Tirthankar, Condon, Laura, Garg, Sahil

arXiv.org Machine Learning, Apr-29-2018

An important task in many, if not all, scientific domains is efficient knowledge integration, testing and codification. It is often solved by constructing models in a controllable computational environment. Even so, the throughput of in-silico simulation-based observations becomes similarly intractable for thorough analysis. This is especially the case in molecular biology, which served as the subject of this study. In this project, we aimed to test some approaches developed to deal with the curse of dimensionality. Among these, we found dimension reduction techniques especially appealing. They can be used to identify irrelevant variability and help to understand critical processes underlying high-dimensional datasets. Additionally, we subjected our data sets to nonlinear time series analysis, as those are well-established methods for comparing results. To investigate the usefulness of dimension reduction methods, we decided to base our study on a concrete sample set. The example was taken from the domain of systems biology and concerns the dynamic evolution of sub-cellular signaling. In particular, the dataset relates to the yeast pheromone pathway and is studied in-silico with a stochastic model. The model reconstructs signal propagation stimulated by a mating pheromone. In the paper, we elaborate on the reasons for the multidimensional analysis problem in the context of molecular signaling; next, we introduce the model of choice, simulation details and the resulting time series dynamics. A description of the methods used, followed by a discussion of the results and their biological interpretation, concludes the paper.

artificial intelligence, machine learning, mutual information, (18 more...)

arXiv.org Machine Learning

1804.11005

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)

Continuum directions for supervised dimension reduction

Jung, Sungkyu

arXiv.org Machine Learning, Mar-21-2018

Dimension reduction of multivariate data supervised by auxiliary information is considered. A sequence of basis vectors for dimension reduction is obtained as minimizers of a novel criterion. The proposed method is akin to continuum regression, and the resulting basis is called continuum directions. In the presence of binary supervision data, these directions continuously bridge the principal component, mean difference and linear discriminant directions, thus ranging from unsupervised to fully supervised dimension reduction. High-dimensional asymptotic studies of continuum directions for binary supervision reveal several interesting facts. The conditions under which the sample continuum directions are inconsistent, but their classification performance is good, are specified. While the proposed method can be directly used for binary and multi-category classification, its generalizations to incorporate any form of auxiliary data are also presented. The proposed method enjoys fast computation, and the performance is better than or on par with more computer-intensive alternatives.

Keywords: continuum regression, dimension reduction, linear discriminant analysis, high-dimension, low-sample-size (HDLSS), maximum data piling, principal component analysis

2000 MSC: 60K35

1. Introduction

In modern complex data, it is increasingly common that multiple data sets are available. Two types of data are collected on the same set of subjects: a data set of primary interest X and an auxiliary data set Y. The goal of supervised dimension reduction is to delineate the major signals in X that depend on Y. Relevant application areas include genomics (genetic studies collect both gene expression and SNP data--Li et al. (2016)), finance data (stocks as X in relation to characteristics Y of each stock: size, value, momentum and volatility--Connor et al. (2012)), and batch effect adjustments (Lee et al., 2014). A number of works deal with the multi-source data situation. Lock et al. (2013) developed JIVE to separate joint variation from individual variations. Large-scale correlation studies can identify millions of pairwise associations between two data sets via multiple canonical correlation analysis (Witten and Tibshirani, 2009). These methods, however, do not provide supervised dimension reduction of a particular data set X, since all data sets assume an equal role. In contrast, reduced-rank regression (RRR, Izenman, 1975; Tso, 1981) and envelope models (Cook et al., 2010) provide sufficient dimension reduction (Cook and Ni, 2005) for regression problems. See Cook et al. (2013) for connections between envelopes and partial least squares regression.

artificial intelligence, continuum direction, machine learning, (17 more...)

arXiv.org Machine Learning

1606.05988

Country:

Oceania > New Zealand (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Scalable Algorithms for Learning High-Dimensional Linear Mixed Models

Tan, Zilong, Roche, Kimberly, Zhou, Xiang, Mukherjee, Sayan

arXiv.org Machine Learning, Mar-12-2018

Linear mixed models (LMMs) are used extensively to model dependencies among observations in linear regression, across many application areas. Parameter estimation for LMMs can be computationally prohibitive on big data. State-of-the-art learning algorithms require computational complexity which depends at least linearly on the dimension p of the covariates, and often use heuristics that do not offer theoretical guarantees. We present scalable algorithms for learning high-dimensional LMMs with sublinear computational complexity dependence on p. Key to our approach are novel dual estimators which use only kernel functions of the data, and fast computational techniques based on the subsampled randomized Hadamard transform. We provide theoretical guarantees for our learning algorithms, demonstrating the robustness of parameter estimation. Finally, we complement the theory with experiments on large synthetic and real data.
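
The abstract names the subsampled randomized Hadamard transform (SRHT) as one of its computational building blocks. The following is a minimal, standard-library sketch of the generic transform only, not of the authors' LMM estimators. The function names `fwht` and `srht`, the seed handling, and the scaling convention are assumptions made for illustration.

```python
import math
import random


def fwht(vec):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine blocks of size h pairwise.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def srht(vec, k, seed=0):
    """Sketch a length-n vector down to k coordinates.

    Computes sqrt(n/k) * S * (H / sqrt(n)) * D * vec, where D flips signs at
    random, H is the Hadamard matrix and S keeps k rows chosen uniformly
    without replacement (hypothetical convention for this sketch).
    """
    n = len(vec)
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    mixed = fwht([s * v for s, v in zip(signs, vec)])
    rows = rng.sample(range(n), k)
    scale = 1.0 / math.sqrt(k)  # sqrt(n/k) times 1/sqrt(n)
    return [scale * mixed[r] for r in rows]
```

With `k` equal to the input length the map is orthogonal, so the squared norm is preserved exactly; for smaller `k` it is preserved only approximately, which is the trade-off that makes the transform useful for fast sketching.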

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1803.04431

Country:

North America > United States > Michigan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.70)

Extreme Dimension Reduction for Handling Covariate Shift

Wang, Fulton, Rudin, Cynthia

arXiv.org Machine Learning, Mar-12-2018

In the covariate shift learning scenario, the training and test covariate distributions differ, so that a predictor's average losses over the training and test distributions also differ. In this work, we explore the potential of extreme dimension reduction, i.e., reduction to very low dimensions, for improving the performance of importance weighting methods for handling covariate shift, which fail in high dimensions due to potentially high train/test covariate divergence and the inability to accurately estimate the requisite density ratios. We first formulate and solve a problem that optimizes, over linear subspaces, a combination of their predictive utility and the train/test divergence within them. Applying it to simulated and real data, we show that extreme dimension reduction helps sometimes but not always, due to a bias introduced by dimension reduction.
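
To illustrate why importance weighting degrades as the dimension grows, here is a small sketch under strong simplifying assumptions: training covariates follow a standard Gaussian, test covariates follow the same Gaussian shifted by a constant in every coordinate, and the density ratio is known in closed form rather than estimated. The function names and the use of the Kish effective sample size as a diagnostic are illustrative choices, not the method of the paper.

```python
import math
import random


def density_ratio(x, shift):
    """w(x) = p_test(x) / p_train(x) for train ~ N(0, I), test ~ N(shift * 1, I)."""
    d = len(x)
    return math.exp(shift * sum(x) - 0.5 * d * shift * shift)


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def ess_for_dimension(d, n=2000, shift=0.5, seed=0):
    """Draw n training points in d dimensions and measure how evenly the weights spread."""
    rng = random.Random(seed)
    weights = [
        density_ratio([rng.gauss(0.0, 1.0) for _ in range(d)], shift)
        for _ in range(n)
    ]
    return effective_sample_size(weights)
```

Comparing `ess_for_dimension(1)` with `ess_for_dimension(20)` shows the effect: with the same per-coordinate shift, a few training points carry almost all of the weight in the higher-dimensional case.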

artificial intelligence, effective sample size, machine learning, (16 more...)

arXiv.org Machine Learning

1711.10938

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

[R] UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction • r/MachineLearning

@machinelearnbot, Feb-13-2018, 13:32:37 GMT

Can I ask you a dumb question? I was thinking about dimensionality reduction the other day and an idea occurred to me: why not just use an autoencoder NN that squeezes the input data into d dimensions (d = 2, 3, ...), with an appropriate loss function to mimic either PCA or t-SNE, or maybe even UMAP? This would produce a scalable, incremental (approximate) algorithm that easily supports parallelisation. Besides being slower than a pure C/C++ implementation, do you see something wrong with it?
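
As a rough illustration of the PCA case only, the sketch below trains a tied-weight linear autoencoder with a one-dimensional bottleneck by plain gradient descent on the reconstruction error. The function name, learning rate, step count and starting point are all assumptions. It also assumes the data are already centered, and it says nothing about how well the idea carries over to t-SNE or UMAP losses.

```python
def train_linear_autoencoder(data, steps=500, lr=0.1):
    """Fit x -> z = w.x -> x_hat = z * w by minimizing mean ||x - x_hat||^2.

    For centered data the minimizer is a unit vector along the first
    principal direction (up to sign).
    """
    dim = len(data[0])
    w = [1.0] + [0.0] * (dim - 1)  # arbitrary non-zero starting point
    for _ in range(steps):
        grad = [0.0] * dim
        for x in data:
            z = sum(wi * xi for wi, xi in zip(w, x))      # encode to d = 1
            r = [xi - z * wi for xi, wi in zip(x, w)]     # reconstruction residual
            rw = sum(ri * wi for ri, wi in zip(r, w))
            for k in range(dim):
                # Gradient of ||r||^2 with respect to w[k].
                grad[k] += (-2.0 * rw * x[k] - 2.0 * z * r[k]) / len(data)
        w = [wi - lr * gk for wi, gk in zip(w, grad)]
    return w
```

On points that lie along a single direction, the learned `w` converges to that direction, which matches what PCA would return for the first component.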

artificial intelligence, machine learning, social media, (3 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.40)

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

McInnes, Leland, Healy, John

arXiv.org Machine Learning, Feb-9-2018

Dimension reduction seeks to produce a low dimensional representation of high dimensional data that preserves relevant structure (relevance often being application dependent). Dimension reduction is an important problem in data science, both for visualization and as a potential pre-processing step for machine learning. As a fundamental technique for both visualization and preprocessing, dimension reduction is being applied in a broadening range of fields and on ever-increasing sizes of datasets. It is thus desirable to have an algorithm that is both scalable to massive data and able to cope with the diversity of data available. Dimension reduction algorithms tend to fall into two categories: those that seek to preserve the distance structure within the data, and those that favor the preservation of local distances over global distance.

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Machine Learning

1802.03426

Country:

Pacific Ocean > North Pacific Ocean > East China Sea > Yellow Sea (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Rocky Mountains (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)

Beginners Guide To Learn Dimension Reduction Techniques

@machinelearnbot, Feb-8-2018, 19:38:55 GMT

This powerful quote by William Shakespeare applies well to techniques used in data science and analytics. Allow me to prove it using a short story. In May 2015, we conducted a Data Hackathon (a data science competition) in Delhi-NCR, India. We gave participants a challenge based on the Human Activity Recognition Using Smartphones Data Set. The data set had 561 variables for training a model to identify human activity in the test data set.

artificial intelligence, dimension, machine learning, (11 more...)

@machinelearnbot

Country: Asia > India (0.25)

Genre: Contests & Prizes (0.56)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.46)

Dimension Reduction Using Active Manifolds

Bridges, Robert A., Felder, Chris, Hoff, Chelsey

arXiv.org Machine Learning, Feb-7-2018

Scientists and engineers rely on accurate mathematical models to quantify the objects of their studies, which are often high-dimensional. Unfortunately, high-dimensional models are inherently difficult, especially when observations are sparse or expensive to determine. One way to address this problem is to approximate the original model with fewer input dimensions. Our project goal was to recover a function f that takes n inputs and returns one output, where n is potentially large. For any given n-tuple, we assume that we can observe a sample of the gradient and output of the function but it is computationally expensive to do so. This project was inspired by an approach known as Active Subspaces, which works by linearly projecting to a linear subspace where the function changes most on average. Our research gives mathematical developments informing a novel algorithm for this problem. Our approach, Active Manifolds, increases accuracy by seeking nonlinear analogues that approximate the function. The benefits of our approach are the elimination of unprincipled parameter choices, guaranteed accessible visualization, and improved estimation accuracy.
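
The Active Subspaces baseline mentioned in the abstract can be sketched as follows: average the outer products of sampled gradients and take the dominant eigenvector as the direction along which f changes most on average. This is a sketch of that linear baseline only, not of the Active Manifolds algorithm. The helper `toy_gradient` and its quadratic test function are hypothetical, and power iteration stands in for a full eigendecomposition.

```python
import math


def gradient_covariance(gradients):
    """Average outer product C = (1/N) * sum_i g_i g_i^T of sampled gradients."""
    n = len(gradients[0])
    c = [[0.0] * n for _ in range(n)]
    for g in gradients:
        for i in range(n):
            for j in range(n):
                c[i][j] += g[i] * g[j] / len(gradients)
    return c


def leading_eigenvector(matrix, iterations=200):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def toy_gradient(x, a):
    """Gradient of the hypothetical test function f(x) = (a . x)^2."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    return [2.0 * dot * ai for ai in a]
```

For the toy function, every gradient is a multiple of `a`, so the recovered direction is `a` normalized to unit length; projecting inputs onto it reduces the n-dimensional problem to a one-dimensional one.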

artificial intelligence, machine learning, manifold, (12 more...)

arXiv.org Machine Learning

1802.04178

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.43)

Approximation of Functions over Manifolds: A Moving Least-Squares Approach

Sober, Barak, Aizenbud, Yariv, Levin, David

arXiv.org Machine Learning, Jan-23-2018

We present an algorithm for approximating a function defined over a $d$-dimensional manifold utilizing only noisy function values at locations sampled from the manifold with noise. To produce the approximation we do not require any knowledge regarding the manifold other than its dimension $d$. The approximation scheme is based upon the Manifold Moving Least-Squares (MMLS). The proposed algorithm is resistant to noise in both the domain and function values. Furthermore, the approximant is shown to be smooth and of approximation order of $\mathcal{O}(h^{m+1})$ for non-noisy data, where $h$ is the mesh size with respect to the manifold domain, and $m$ is the degree of a local polynomial approximation utilized in our algorithm. In addition, the proposed algorithm is linear in time with respect to the ambient-space's dimension. Thus, in case of extremely large ambient space dimension, we are able to avoid the curse of dimensionality without having to perform non-linear dimension reduction, which introduces distortions to the manifold data. Using numerical experiments, we compare the presented method to state-of-the-art algorithms for regression over manifolds and show its potential.
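
For readers unfamiliar with moving least squares, the sketch below shows the classical one-dimensional version with a local linear fit (m = 1) and Gaussian weights. It is not the manifold variant (MMLS) of the paper: the domain here is simply the real line, and the function name, weight function and `bandwidth` parameter are assumptions for illustration.

```python
import math


def mls_estimate(xs, ys, query, bandwidth):
    """Moving least-squares value at `query` from a weighted local linear fit.

    Minimizes sum_i w_i * (y_i - a - b * (x_i - query))^2 with
    w_i = exp(-((x_i - query) / bandwidth)^2) and returns the intercept a.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        u = x - query
        w = math.exp(-((u / bandwidth) ** 2))
        s0 += w
        s1 += w * u
        s2 += w * u * u
        t0 += w * y
        t1 += w * u * y
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-300:
        raise ValueError("not enough nearby samples for a local linear fit")
    # Solve the 2x2 weighted normal equations for the intercept.
    return (s2 * t0 - s1 * t1) / det
```

Because the fit is recomputed at every query point, the approximant varies smoothly with the query, and a local polynomial of degree m reproduces polynomials up to that degree exactly.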

approximation, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1711.00765

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Finland > Central Finland > Jyväskylä (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.54)

Wisdom of the crowd from unsupervised dimension reduction

Wang, Lingfei, Michoel, Tom

arXiv.org Machine Learning, Nov-28-2017

Wisdom of the crowd, the collective intelligence derived from responses of multiple human or machine individuals to the same questions, can be more accurate than each individual, and improve social decision-making and prediction accuracy. This can also integrate multiple programs or datasets, each as an individual, for the same predictive questions. Crowd wisdom estimates each individual's independent error level arising from their limited knowledge, and finds the crowd consensus that minimizes the overall error. However, previous studies have merely built isolated, problem-specific models with limited generalizability, and mainly for binary (yes/no) responses. Here we show with simulation and real-world data that the crowd wisdom problem is analogous to one-dimensional unsupervised dimension reduction in machine learning. This provides a natural class of crowd wisdom solutions, such as principal component analysis and Isomap, which can handle binary and also continuous responses, like confidence levels, and consequently can be more accurate than existing solutions. They can even outperform supervised-learning-based collective intelligence that is calibrated on historical performance of individuals, e.g. penalized linear regression and random forest. This study unifies crowd wisdom and unsupervised dimension reduction, and thereupon introduces a broad range of highly-performing and widely-applicable crowd wisdom methods. As the costs for data acquisition and processing rapidly decrease, this study will promote and guide crowd wisdom applications in the social and natural sciences, including data fusion, meta-analysis, crowd-sourcing, and committee decision making.
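
The analogy with one-dimensional unsupervised dimension reduction can be sketched with principal component analysis: treat each individual as a feature and each question as a sample, then use the first principal component score as the consensus. The layout of `responses`, the function name and the sign convention (orienting the component so that individuals agree with it on balance) are assumptions of this sketch rather than details taken from the paper.

```python
import math


def consensus_scores(responses, iterations=200):
    """First principal component score per question.

    `responses[q][j]` is the answer of individual j to question q
    (binary or continuous, e.g. a confidence level).
    """
    n_questions = len(responses)
    n_individuals = len(responses[0])
    means = [sum(row[j] for row in responses) / n_questions
             for j in range(n_individuals)]
    centered = [[row[j] - means[j] for j in range(n_individuals)]
                for row in responses]
    cov = [[sum(r[i] * r[j] for r in centered) / n_questions
            for j in range(n_individuals)]
           for i in range(n_individuals)]
    # Power iteration for the leading eigenvector (weights over individuals).
    v = [1.0 / math.sqrt(n_individuals)] * n_individuals
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n_individuals))
             for i in range(n_individuals)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    if sum(v) < 0:  # assumed convention: most individuals agree with the consensus
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(n_individuals)) for r in centered]
```

A positive score can be read as a consensus "yes" and a negative score as a consensus "no"; individuals whose answers track the crowd more closely receive larger weights in the component.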

artificial intelligence, crowd wisdom, machine learning, (15 more...)

arXiv.org Machine Learning

1711.11034

Country: Europe > United Kingdom > Scotland > Midlothian (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Health & Medicine > Therapeutic Area (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.86)
