Dimensionality Reduction
Doubly Non-Central Beta Matrix Factorization for Stable Dimensionality Reduction of Bounded Support Matrix Data
Albert, Anjali N., Flaherty, Patrick, Schein, Aaron
We consider the problem of developing interpretable and computationally efficient matrix decomposition methods for matrices whose entries have bounded support. Such matrices are found in large-scale DNA methylation studies and many other settings. Our approach decomposes the data matrix into a Tucker representation wherein the number of columns in the constituent factor matrices is not constrained. We derive a computationally efficient sampling algorithm to solve for the Tucker decomposition. We evaluate the performance of our method using three criteria: predictability, computability, and stability. Empirical results show that our method has similar performance as other state-of-the-art approaches in terms of held-out prediction and computational complexity, but has significantly better performance in terms of stability to changes in hyper-parameters. The improved stability results in higher confidence in the results in applications where the constituent factors are used to generate and test scientific hypotheses such as DNA methylation analysis of cancer samples.
Simultaneous Dimensionality Reduction for Extracting Useful Representations of Large Empirical Multimodal Datasets
The quest for simplification in physics drives the exploration of concise mathematical representations for complex systems. This Dissertation focuses on the concept of dimensionality reduction as a means to obtain low-dimensional descriptions from high-dimensional data, facilitating comprehension and analysis. We address the challenges posed by real-world data that defy conventional assumptions, such as complex interactions within neural systems or high-dimensional dynamical systems. Leveraging insights from both theoretical physics and machine learning, this work unifies diverse reduction methods under a comprehensive framework, the Deep Variational Multivariate Information Bottleneck. This framework enables the design of tailored reduction algorithms based on specific research questions. We explore and assert the efficacy of simultaneous reduction approaches over their independent reduction counterparts, demonstrating their superiority in capturing covariation between multiple modalities, while requiring less data. We also introduced novel techniques, such as the Deep Variational Symmetric Information Bottleneck, for general nonlinear simultaneous reduction. We show that the same principle of simultaneous reduction is the key to efficient estimation of mutual information. We show that our new method is able to discover the coordinates of high-dimensional observations of dynamical systems. Through analytical investigations and empirical validations, we shed light on the intricacies of dimensionality reduction methods, paving the way for enhanced data analysis across various domains. We underscore the potential of these methodologies to extract meaningful insights from complex datasets, driving advancements in fundamental research and applied sciences. As these methods evolve, they promise to deepen our understanding of complex systems and inform more effective data analysis strategies.
GleanVec: Accelerating vector search with minimalist nonlinear dimensionality reduction
Tepper, Mariano, Bhati, Ishwar Singh, Aguerrebere, Cecilia, Willke, Ted
Embedding models can generate high-dimensional vectors whose similarity reflects semantic affinities. Thus, accurately and timely retrieving those vectors in a large collection that are similar to a given query has become a critical component of a wide range of applications. In particular, cross-modal retrieval (e.g., where a text query is used to find images) is gaining momentum rapidly. Here, it is challenging to achieve high accuracy as the queries often have different statistical distributions than the database vectors. Moreover, the high vector dimensionality puts these search systems under compute and memory pressure, leading to subpar performance. In this work, we present new linear and nonlinear methods for dimensionality reduction to accelerate high-dimensional vector search while maintaining accuracy in settings with in-distribution (ID) and out-of-distribution (OOD) queries. The linear LeanVec-Sphering outperforms other linear methods, trains faster, comes with no hyperparameters, and allows to set the target dimensionality more flexibly. The nonlinear Generalized LeanVec (GleanVec) uses a piecewise linear scheme to further improve the search accuracy while remaining computationally nimble. Initial experimental results show that LeanVec-Sphering and GleanVec push the state of the art for vector search.
A Normative Theory of Adaptive Dimensionality Reduction in Neural Networks
To make sense of the world our brains must analyze high-dimensional datasets streamed by our sensory organs. Because such analysis begins with dimensionality reduction, modelling early sensory processing requires biologically plausible online dimensionality reduction algorithms. Recently, we derived such an algorithm, termed similarity matching, from a Multidimensional Scaling (MDS) objective function. However, in the existing algorithm, the number of output dimensions is set a priori by the number of output neurons and cannot be changed. Because the number of informative dimensions in sensory inputs is variable there is a need for adaptive dimensionality reduction.
Dimensionality Reduction for Wasserstein Barycenter
The Wasserstein barycenter is a geometric construct which captures the notion of centrality among probability distributions, and which has found many applications in machine learning. However, most algorithms for finding even an approximate barycenter suffer an exponential dependence on the dimension d of the underlying space of the distributions. In order to cope with this curse of dimensionality,'' we study dimensionality reduction techniques for the Wasserstein barycenter problem. When the barycenter is restricted to support of size n, we show that randomized dimensionality reduction can be used to map the problem to a space of dimension O(\log n) independent of both d and k, and that \emph{any} solution found in the reduced dimension will have its cost preserved up to arbitrary small error in the original space. We provide matching upper and lower bounds on the size of the reduced dimension, showing that our methods are optimal up to constant factors.
Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction
Nonlinear dimensionality reduction of high-dimensional data is challenging as the low-dimensional embedding will necessarily contain distortions, and it can be hard to determine which distortions are the most important to avoid. When annotation of data into known relevant classes is available, it can be used to guide the embedding to avoid distortions that worsen class separation. The supervised mapping method introduced in the present paper, called ClassNeRV, proposes an original stress function that takes class annotation into account and evaluates embedding quality both in terms of false neighbors and missed neighbors. ClassNeRV shares the theoretical framework of a family of methods descended from Stochastic Neighbor Embedding (SNE). Our approach has a key advantage over previous ones: in the literature supervised methods often emphasize class separation at the price of distorting the data neighbors' structure; conversely, unsupervised methods provide better preservation of structure at the price of often mixing classes.
Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels
We revisit the classic randomized sketch of a tensor product of q vectors x_i\in\mathbb{R} n . However, in their analysis C_{\Omega} 2 can be as large as \Theta(n {2q}), even for a set \Omega of O(1) vectors x . We give a new analysis of this sketch, providing nearly optimal bounds. For the important case of q 2 and \delta 1/\poly(n), this shows that m \Theta(\epsilon {-2} \log(n) \epsilon {-1} \log 2(n)), demonstrating that the \epsilon {-2} and \log 2(n) terms do not multiply each other. In a number of applications, one has \Omega \poly(n) and in this case our bounds are optimal up to a constant factor.
Dimensionality reduction: theoretical perspective on practical measures
Dimensionality reduction plays a central role in real-world applications for Machine Learning, among many fields. In particular, metric dimensionality reduction where data from a general metric is mapped into low dimensional space, is often used as a first step before applying machine learning algorithms. In almost all these applications the quality of the embedding is measured by various average case criteria. Metric dimensionality reduction has also been studied in Math and TCS, within the extremely fruitful and influential field of metric embedding. Yet, the vast majority of theoretical research has been devoted to analyzing the worst case behavior of embeddings and therefore has little relevance to practical settings.
Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent
We consider a natural model of online preference aggregation, where sets of preferred items R1, R2, ..., Rt, ..., along with a demand for kt items in each Rt, appear online. Without prior knowledge of (Rt, kt), the learner maintains a ranking \pit aiming that at least kt items from Rt appear high in \pi_t. This is a fundamental problem in preference aggregation with applications to e.g., ordering product or news items in web pages based on user scrolling and click patterns. The widely studied Generalized Min-Sum-Set-Cover (GMSSC) problem serves as a formal model for the setting above. GMSSC is NP-hard and the standard application of no-regret online learning algorithms is computationally inefficient, because they operate in the space of rankings.
Solving Interpretable Kernel Dimensionality Reduction
Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace \textit{before} the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition.