
A Survey of Dimension Estimation Methods

Binnie, James A. D., Dłotko, Paweł, Harvey, John, Malinowski, Jakub, Yim, Ka Man

arXiv.org Artificial Intelligence

It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the data, and hence the complexity of the dataset at hand. A great variety of dimension estimators have been developed to find the intrinsic dimension of the data but there is little guidance on how to reliably use these estimators. This survey reviews a wide range of dimension estimation methods, categorising them by the geometric information they exploit: tangential estimators which detect a local affine structure; parametric estimators which rely on dimension-dependent probability distributions; and estimators which use topological or metric invariants. The paper evaluates the performance of these methods, as well as investigating varying responses to curvature and noise. Key issues addressed include robustness to hyperparameter selection, sample size requirements, accuracy in high dimensions, precision, and performance on non-linear geometries. In identifying the best hyperparameters for benchmark datasets, overfitting is frequent, indicating that many estimators may not generalise well beyond the datasets on which they have been tested.
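The first category the survey names, tangential estimators, can be illustrated with a minimal local-PCA sketch. This is an illustrative estimator in the spirit of that category, not any specific method the survey reviews; the neighbourhood size `k` and variance threshold are hypothetical hyperparameters of exactly the kind the survey warns are hard to choose reliably.

```python
import numpy as np

def local_pca_dimension(points, k=20, var_threshold=0.95):
    """Tangential ID estimate: fit a local affine subspace around each
    point via PCA on its k nearest neighbours, count the components
    needed to explain var_threshold of the local variance, and return
    the median count over all points."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    dims = []
    for i in range(len(points)):
        nbrs = points[np.argsort(d2[i])[1:k + 1]]   # skip the point itself
        centred = nbrs - nbrs.mean(axis=0)
        s = np.linalg.svd(centred, compute_uv=False) ** 2
        explained = np.cumsum(s) / s.sum()
        dims.append(int(np.searchsorted(explained, var_threshold) + 1))
    return int(np.median(dims))

# A gently noisy 2-d plane embedded in R^5 should come out as dimension 2.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]    # orthonormal 5x2 frame
data = rng.normal(size=(500, 2)) @ basis.T + 0.01 * rng.normal(size=(500, 5))
print(local_pca_dimension(data))
```

Taking the median over per-point estimates gives some robustness to boundary points and noise, but the answer still depends on `k` and the threshold, which is the hyperparameter-sensitivity issue the survey investigates.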


Multi-Scale Node Embeddings for Graph Modeling and Generation

Milocco, Riccardo, Jansen, Fabian, Garlaschelli, Diego

arXiv.org Artificial Intelligence

Lying at the interface between Network Science and Machine Learning, node embedding algorithms take a graph as input and encode its structure onto output vectors that represent nodes in an abstract geometric space, enabling various vector-based downstream tasks such as network modelling, data compression, link prediction, and community detection. Two apparently unrelated limitations affect these algorithms. On one hand, it is not clear what the basic operation defining vector spaces, i.e. the vector sum, corresponds to in terms of the original nodes in the network. On the other hand, while the same input network can be represented at multiple levels of resolution by coarse-graining the constituent nodes into arbitrary block-nodes, the relationship between node embeddings obtained at different hierarchical levels is not understood. Here, building on recent results in network renormalization theory, we address these two limitations at once and define a multiscale node embedding method that, upon arbitrary coarse-grainings, ensures statistical consistency of the embedding vector of a block-node with the sum of the embedding vectors of its constituent nodes. We illustrate the power of this approach on two economic networks that can be naturally represented at multiple resolution levels: namely, the international trade between (sets of) countries and the input-output flows among (sets of) industries in the Netherlands. We confirm the statistical consistency between networks retrieved from coarse-grained node vectors and networks retrieved from sums of fine-grained node vectors, a result that cannot be achieved by alternative methods. Several key network properties, including a large number of triangles, are successfully replicated already from embeddings of very low dimensionality, allowing for the generation of faithful replicas of the original networks at arbitrary resolution levels.
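The additive consistency property described above has an elementary consequence worth noting: under any dot-product link score, the score between two block-nodes built by summing constituent vectors equals the sum of the pairwise constituent scores, by bilinearity. A toy check, with random vectors standing in for hypothetical node embeddings (the paper's actual construction rests on network renormalization theory, not shown here):

```python
import numpy as np

# (sum_i x_i) . (sum_j x_j) = sum_{i,j} x_i . x_j  -- bilinearity of the
# dot product makes summed block-node vectors consistent with any
# dot-product-based link score.
rng = np.random.default_rng(1)
x = rng.normal(size=(6, 3))            # six fine-grained node vectors
A, B = [0, 1], [2, 3, 4]               # two disjoint block-nodes
block_score = x[A].sum(axis=0) @ x[B].sum(axis=0)
pair_scores = sum(x[i] @ x[j] for i in A for j in B)
print(np.isclose(block_score, pair_scores))   # True by bilinearity
```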


Deep reinforcement learning for weakly coupled MDPs with continuous actions

Robledo, Francisco, Ayesta, Urtzi, Avrachenkov, Konstantin

arXiv.org Artificial Intelligence

This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm specifically designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints dependent on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which utilizes differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greedily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlights LPCA's robustness and efficiency in managing resource allocation while maximizing rewards.
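The decoupling idea behind the Lagrange relaxation can be caricatured in a static one-step setting. The rewards, costs, and multiplier scan below are hypothetical stand-ins; the actual algorithm learns Q-values with neural networks over full MDPs. The point of the sketch is only the mechanism: each subproblem independently maximises reward minus a multiplier times its resource cost, and the multiplier is raised until the joint constraint is met.

```python
import numpy as np

def lagrangian_decouple(rewards, costs, budget, lams=np.linspace(0, 10, 1001)):
    """Scan multipliers lam; at each, every subproblem independently picks
    the action maximising reward - lam * cost (the decoupled problems);
    return the smallest scanned lam whose combined resource use fits
    the budget, together with the chosen action indices."""
    for lam in lams:
        picks = [int(np.argmax(r - lam * c)) for r, c in zip(rewards, costs)]
        if sum(c[p] for p, c in zip(picks, costs)) <= budget:
            return lam, picks
    raise ValueError("no feasible multiplier in the scanned range")

grid = np.linspace(0.0, 1.0, 11)                 # discretised action levels
rewards = [np.sqrt(grid), 2 * np.sqrt(grid)]     # concave per-problem rewards
costs = [grid, grid]                             # linear resource usage
lam, picks = lagrangian_decouple(rewards, costs, budget=1.0)
print(lam, [grid[p] for p in picks])
```

With the multiplier at zero both subproblems grab the maximal action and overshoot the budget; raising it shifts each decoupled optimum toward cheaper actions until the shared constraint holds, which is the same trade-off the learned multiplier mediates in LPCA.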


Shining light on data: Geometric data analysis through quantum dynamics

Kumar, Akshat, Sarovar, Mohan

arXiv.org Artificial Intelligence

Experimental sciences have come to depend heavily on our ability to organize and interpret high-dimensional datasets. Natural laws, conservation principles, and inter-dependencies among observed variables yield geometric structure, with fewer degrees of freedom, on the dataset. We introduce the frameworks of semiclassical and microlocal analysis to data analysis and develop a novel, yet natural uncertainty principle for extracting fine-scale features of this geometric structure in data, crucially dependent on data-driven approximations to quantum mechanical processes underlying geometric optics. This leads to the first tractable algorithm for approximation of wave dynamics and geodesics on data manifolds with rigorous probabilistic convergence rates under the manifold hypothesis. We demonstrate our algorithm on real-world datasets, including an analysis of population mobility information during the COVID-19 pandemic to achieve four-fold improvement in dimensionality reduction over existing state-of-the-art and reveal anomalous behavior exhibited by less than 1.2% of the entire dataset. Our work initiates the study of data-driven quantum dynamics for analyzing datasets, and we outline several future directions for research.


Low-Rank Representations Towards Classification Problem of Complex Networks

Çelik, Murat, Taşdemir, Ali Baran, Özkahya, Lale

arXiv.org Artificial Intelligence

Complex networks representing social interactions, brain activities, and molecular structures have been studied widely to understand and predict their characteristics as graphs. Models and algorithms for these networks are used in real-life applications, such as search engines and recommender systems. In general, such networks are modelled by constructing a low-dimensional Euclidean embedding of the vertices of the network, where proximity of the vertices in the Euclidean space hints at the likelihood of an edge (link). In this work, we study the performance of such low-rank representations of real-life networks on a network classification problem.
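The modelling assumption here, that proximity in a low-rank embedding hints at link likelihood, can be illustrated with a spectral toy example (a generic rank-truncated adjacency reconstruction, not the paper's classification pipeline):

```python
import numpy as np

def low_rank_scores(adj, rank):
    """Embed nodes via the top-`rank` eigenpairs (by magnitude) of the
    adjacency matrix and score node pairs by the reconstructed entries."""
    vals, vecs = np.linalg.eigh(adj)
    top = np.argsort(np.abs(vals))[-rank:]
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T

# Two 3-node communities; the edge (0, 1) inside the first is missing.
edges = [(0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
S = low_rank_scores(A, rank=2)
# The missing within-community pair scores higher than a cross-community one.
print(S[0, 1] > S[0, 3])   # True
```

The rank-2 reconstruction assigns a positive score to the unobserved within-community pair because both endpoints load on the same eigenvector, which is exactly the geometric proximity the abstract appeals to.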


Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis

Amsaleg, Laurent, Chelly, Oussama, Houle, Michael E., Kawarabayashi, Ken-ichi, Radovanović, Miloš, Treeratanajaru, Weeris

arXiv.org Artificial Intelligence

Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, since their convergence generally requires sample sizes (that is, neighborhood sizes) on the order of hundreds of points, existing ID estimation methods may have only limited usefulness for applications in which the data consists of many natural groups of small size. In this paper, we propose a local ID estimation strategy stable even for 'tight' localities consisting of as few as 20 sample points. The estimator applies MLE techniques over all available pairwise distances among the members of the sample, based on a recent extreme-value-theoretic model of intrinsic dimensionality, the Local Intrinsic Dimension (LID). Our experimental results show that our proposed estimation technique can achieve notably smaller variance, while maintaining comparable levels of bias, at much smaller sample sizes than state-of-the-art estimators.
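For context, the classic neighbourhood-distance MLE that this line of work builds on (the Levina-Bickel estimator, which uses only distances from each point to its neighbours, unlike the all-pairwise-distance variant proposed in this paper) can be sketched in a few lines:

```python
import numpy as np

def mle_id(points, k=20):
    """Levina-Bickel maximum-likelihood ID estimate: for each point,
    invert the mean log-ratio of its k-th neighbour distance to the
    closer neighbour distances, then average over points."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    d = np.sqrt(np.sort(d2, axis=1)[:, 1:k + 1])    # drop self-distance
    est = (k - 1) / np.log(d[:, -1:] / d[:, :-1]).sum(axis=1)
    return float(est.mean())

rng = np.random.default_rng(0)
data = rng.random((1000, 3))      # uniform sample from the unit 3-cube
print(round(mle_id(data), 1))     # close to 3
```

The estimator's reliance on k-nearest-neighbour distances alone is what drives its variance at small neighbourhood sizes; exploiting all pairwise distances within a locality, as the paper proposes, is aimed at precisely that weakness.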


Manifold learning via quantum dynamics

Kumar, Akshat, Sarovar, Mohan

arXiv.org Machine Learning

We introduce an algorithm for computing geodesics on sampled manifolds that relies on simulation of quantum dynamics on a graph embedding of the sampled data. Our approach exploits classic results in semiclassical analysis and the quantum-classical correspondence, and forms a basis for techniques to learn the manifold from which a dataset is sampled, and subsequently for nonlinear dimensionality reduction of high-dimensional datasets. We illustrate the new algorithm with data sampled from model manifolds and also by a clustering demonstration based on COVID-19 mobility data. Finally, our method reveals interesting connections between the discretization provided by data sampling and quantization.


Dimensionality Reduction for Binary Data through the Projection of Natural Parameters

Landgraf, Andrew J., Lee, Yoonkyung

arXiv.org Machine Learning

Principal component analysis (PCA) for binary data, known as logistic PCA, has become a popular approach to dimensionality reduction of binary data. It is motivated as an extension of ordinary PCA by means of a matrix factorization, akin to the singular value decomposition, that maximizes the Bernoulli log-likelihood. We propose a new formulation of logistic PCA which extends Pearson's formulation of a low dimensional data representation with minimum error to binary data. Our formulation does not require a matrix factorization, as previous methods do, but instead looks for projections of the natural parameters from the saturated model. Due to this difference, the number of parameters does not grow with the number of observations and the principal component scores on new data can be computed with simple matrix multiplication. We derive explicit solutions for data matrices of special structure and provide computationally efficient algorithms for solving for the principal component loadings. Through simulation experiments and an analysis of medical diagnoses data, we compare our formulation of logistic PCA to the previous formulation as well as ordinary PCA to demonstrate its benefits.
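The scoring step the abstract highlights, new-data scores via a single matrix multiplication of projected natural parameters, can be sketched as follows. The saturated-model approximation `m * (2X - 1)` and the cap `m` follow the projection idea described above, but the loading direction here is hand-picked for a rank-one toy, not fitted by the paper's deviance-minimising algorithms.

```python
import numpy as np

def logistic_pca_scores(X, U, mu, m=4.0):
    """Scores for binary data X: approximate the saturated model's natural
    parameters by m * (2X - 1), centre by mu, and project onto the
    loadings U -- one matrix multiplication, no per-row optimisation."""
    return (m * (2.0 * X - 1.0) - mu) @ U

def reconstruct_probs(scores, U, mu):
    """Map projected natural parameters back through the logistic link."""
    return 1.0 / (1.0 + np.exp(-(mu + scores @ U.T)))

# Rank-one toy data: rows are all-ones or all-zeros; a single loading
# direction suffices to recover the pattern.
X = np.array([[1, 1, 1, 1, 1], [0, 0, 0, 0, 0], [1, 1, 1, 1, 1]], float)
U = np.ones((5, 1)) / np.sqrt(5.0)    # hypothetical unit loading vector
mu = np.zeros(5)
probs = reconstruct_probs(logistic_pca_scores(X, U, mu), U, mu)
print(np.array_equal(np.round(probs), X))   # True
```

Because scoring is a fixed linear map, new observations never require refitting, which is the practical advantage over factorization-based formulations whose row scores are free parameters.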