AITopics | Dimensionality Reduction

Collaborating Authors

Dimensionality Reduction

Dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection (find a subset of the original variables) and feature extraction (transform the data in the high-dimensional space to a space of fewer dimensions). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Doubly Non-Central Beta Matrix Factorization for Stable Dimensionality Reduction of Bounded Support Matrix Data

Albert, Anjali N., Flaherty, Patrick, Schein, Aaron

arXiv.org Artificial IntelligenceOct-24-2024

We consider the problem of developing interpretable and computationally efficient matrix decomposition methods for matrices whose entries have bounded support. Such matrices are found in large-scale DNA methylation studies and many other settings. Our approach decomposes the data matrix into a Tucker representation wherein the number of columns in the constituent factor matrices is not constrained. We derive a computationally efficient sampling algorithm to solve for the Tucker decomposition. We evaluate the performance of our method using three criteria: predictability, computability, and stability. Empirical results show that our method has similar performance as other state-of-the-art approaches in terms of held-out prediction and computational complexity, but has significantly better performance in terms of stability to changes in hyper-parameters. The improved stability results in higher confidence in the results in applications where the constituent factors are used to generate and test scientific hypotheses such as DNA methylation analysis of cancer samples.

artificial intelligence, decomposition, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.18425

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.42)

Add feedback

Simultaneous Dimensionality Reduction for Extracting Useful Representations of Large Empirical Multimodal Datasets

Abdelaleem, Eslam

arXiv.org Artificial IntelligenceOct-23-2024

The quest for simplification in physics drives the exploration of concise mathematical representations for complex systems. This Dissertation focuses on the concept of dimensionality reduction as a means to obtain low-dimensional descriptions from high-dimensional data, facilitating comprehension and analysis. We address the challenges posed by real-world data that defy conventional assumptions, such as complex interactions within neural systems or high-dimensional dynamical systems. Leveraging insights from both theoretical physics and machine learning, this work unifies diverse reduction methods under a comprehensive framework, the Deep Variational Multivariate Information Bottleneck. This framework enables the design of tailored reduction algorithms based on specific research questions. We explore and assert the efficacy of simultaneous reduction approaches over their independent reduction counterparts, demonstrating their superiority in capturing covariation between multiple modalities, while requiring less data. We also introduced novel techniques, such as the Deep Variational Symmetric Information Bottleneck, for general nonlinear simultaneous reduction. We show that the same principle of simultaneous reduction is the key to efficient estimation of mutual information. We show that our new method is able to discover the coordinates of high-dimensional observations of dynamical systems. Through analytical investigations and empirical validations, we shed light on the intricacies of dimensionality reduction methods, paving the way for enhanced data analysis across various domains. We underscore the potential of these methodologies to extract meaningful insights from complex datasets, driving advancements in fundamental research and applied sciences. As these methods evolve, they promise to deepen our understanding of complex systems and inform more effective data analysis strategies.

data mining, machine learning, simultaneous dimensionality reduction, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.13140/RG.2.2.13714.88004

2410.19867

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Hawaii (0.04)
Asia > Middle East > Jordan (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Banking & Finance (1.00)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)

Add feedback

GleanVec: Accelerating vector search with minimalist nonlinear dimensionality reduction

Tepper, Mariano, Bhati, Ishwar Singh, Aguerrebere, Cecilia, Willke, Ted

arXiv.org Artificial IntelligenceOct-14-2024

Embedding models can generate high-dimensional vectors whose similarity reflects semantic affinities. Thus, accurately and timely retrieving those vectors in a large collection that are similar to a given query has become a critical component of a wide range of applications. In particular, cross-modal retrieval (e.g., where a text query is used to find images) is gaining momentum rapidly. Here, it is challenging to achieve high accuracy as the queries often have different statistical distributions than the database vectors. Moreover, the high vector dimensionality puts these search systems under compute and memory pressure, leading to subpar performance. In this work, we present new linear and nonlinear methods for dimensionality reduction to accelerate high-dimensional vector search while maintaining accuracy in settings with in-distribution (ID) and out-of-distribution (OOD) queries. The linear LeanVec-Sphering outperforms other linear methods, trains faster, comes with no hyperparameters, and allows to set the target dimensionality more flexibly. The nonlinear Generalized LeanVec (GleanVec) uses a piecewise linear scheme to further improve the search accuracy while remaining computationally nimble. Initial experimental results show that LeanVec-Sphering and GleanVec push the state of the art for vector search.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.22347

Genre:

Research Report (0.70)
Overview (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

A Normative Theory of Adaptive Dimensionality Reduction in Neural Networks

Neural Information Processing SystemsOct-11-2024, 15:46:17 GMT

To make sense of the world our brains must analyze high-dimensional datasets streamed by our sensory organs. Because such analysis begins with dimensionality reduction, modelling early sensory processing requires biologically plausible online dimensionality reduction algorithms. Recently, we derived such an algorithm, termed similarity matching, from a Multidimensional Scaling (MDS) objective function. However, in the existing algorithm, the number of output dimensions is set a priori by the number of output neurons and cannot be changed. Because the number of informative dimensions in sensory inputs is variable there is a need for adaptive dimensionality reduction.

adaptive dimensionality reduction, algorithm, covariance matrix, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.74)

Add feedback

Dimensionality Reduction for Wasserstein Barycenter

Neural Information Processing SystemsOct-11-2024, 13:33:49 GMT

The Wasserstein barycenter is a geometric construct which captures the notion of centrality among probability distributions, and which has found many applications in machine learning. However, most algorithms for finding even an approximate barycenter suffer an exponential dependence on the dimension d of the underlying space of the distributions. In order to cope with this curse of dimensionality,'' we study dimensionality reduction techniques for the Wasserstein barycenter problem. When the barycenter is restricted to support of size n, we show that randomized dimensionality reduction can be used to map the problem to a space of dimension O(\log n) independent of both d and k, and that \emph{any} solution found in the reduced dimension will have its cost preserved up to arbitrary small error in the original space. We provide matching upper and lower bounds on the size of the reduced dimension, showing that our methods are optimal up to constant factors.

dimensionality reduction, wasserstein barycenter, wasserstein barycenter problem

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.94)

Add feedback

Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction

Neural Information Processing SystemsOct-10-2024, 21:11:43 GMT

Nonlinear dimensionality reduction of high-dimensional data is challenging as the low-dimensional embedding will necessarily contain distortions, and it can be hard to determine which distortions are the most important to avoid. When annotation of data into known relevant classes is available, it can be used to guide the embedding to avoid distortions that worsen class separation. The supervised mapping method introduced in the present paper, called ClassNeRV, proposes an original stress function that takes class annotation into account and evaluates embedding quality both in terms of false neighbors and missed neighbors. ClassNeRV shares the theoretical framework of a family of methods descended from Stochastic Neighbor Embedding (SNE). Our approach has a key advantage over previous ones: in the literature supervised methods often emphasize class separation at the price of distorting the data neighbors' structure; conversely, unsupervised methods provide better preservation of structure at the price of often mixing classes.

preserve class and neighbor, steering distortion, supervised dimensionality reduction, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.65)

Add feedback

Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels

Neural Information Processing SystemsOct-10-2024, 19:55:28 GMT

We revisit the classic randomized sketch of a tensor product of q vectors x_i\in\mathbb{R} n . However, in their analysis C_{\Omega} 2 can be as large as \Theta(n {2q}), even for a set \Omega of O(1) vectors x . We give a new analysis of this sketch, providing nearly optimal bounds. For the important case of q 2 and \delta 1/\poly(n), this shows that m \Theta(\epsilon {-2} \log(n) \epsilon {-1} \log 2(n)), demonstrating that the \epsilon {-2} and \log 2(n) terms do not multiply each other. In a number of applications, one has \Omega \poly(n) and in this case our bounds are optimal up to a constant factor.

omega, sketching low degree polynomial kernel, tight dimensionality reduction, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.40)

Add feedback

Dimensionality reduction: theoretical perspective on practical measures

Neural Information Processing SystemsOct-10-2024, 12:59:28 GMT

Dimensionality reduction plays a central role in real-world applications for Machine Learning, among many fields. In particular, metric dimensionality reduction where data from a general metric is mapped into low dimensional space, is often used as a first step before applying machine learning algorithms. In almost all these applications the quality of the embedding is measured by various average case criteria. Metric dimensionality reduction has also been studied in Math and TCS, within the extremely fruitful and influential field of metric embedding. Yet, the vast majority of theoretical research has been devoted to analyzing the worst case behavior of embeddings and therefore has little relevance to practical settings.

criteria, dimensionality reduction, reduction, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)

Add feedback

Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent

Neural Information Processing SystemsOct-10-2024, 06:43:20 GMT

We consider a natural model of online preference aggregation, where sets of preferred items R1, R2, ..., Rt, ..., along with a demand for kt items in each Rt, appear online. Without prior knowledge of (Rt, kt), the learner maintains a ranking \pit aiming that at least kt items from Rt appear high in \pi_t. This is a fundamental problem in preference aggregation with applications to e.g., ordering product or news items in web pages based on user scrolling and click patterns. The widely studied Generalized Min-Sum-Set-Cover (GMSSC) problem serves as a formal model for the setting above. GMSSC is NP-hard and the standard application of no-regret online learning algorithms is computationally inefficient, because they operate in the space of rankings.

dimensionality reduction, efficient online learning, gradient descent, (5 more...)

Neural Information Processing Systems

Industry: Education > Educational Setting > Online (0.65)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.45)

Add feedback

Solving Interpretable Kernel Dimensionality Reduction

Neural Information Processing SystemsOct-10-2024, 04:16:21 GMT

Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace \textit{before} the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition.

interpretable kernel dimensionality reduction, ism, objective, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.63)

Add feedback