
Collaborating Authors

 Rieck, Bastian


Principal Curvatures Estimation with Applications to Single Cell Data

arXiv.org Artificial Intelligence

The rapidly growing field of single-cell transcriptomic sequencing (scRNAseq) presents challenges for data analysis due to its massive datasets. A common approach in manifold learning is to hypothesize that a dataset lies on a lower-dimensional manifold, which makes it possible to study the geometry of point clouds by extracting meaningful descriptors such as curvature. In this work, we present Adaptive Local PCA (AdaL-PCA), a data-driven method for accurately estimating various notions of intrinsic curvature on data manifolds, in particular principal curvatures for surfaces. The model relies on local PCA to estimate tangent spaces. Evaluating AdaL-PCA on sampled surfaces yields state-of-the-art results. Combined with a PHATE embedding, the model applied to single-cell RNA sequencing data allows us to identify key variations in cellular differentiation.
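The abstract describes tangent-space estimation via local PCA; the following is a minimal sketch of that single step, not the authors' implementation. The neighborhood size k and the example point cloud are illustrative assumptions.

```python
import numpy as np

def local_pca_tangent(X, i, k=20, d=2):
    """Estimate a d-dimensional tangent space at point X[i]
    from the PCA of its k nearest neighbors."""
    # Find the k nearest neighbors of X[i] (brute force for clarity;
    # the point itself is included in its own neighborhood).
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = X[np.argsort(dists)[:k]]
    # Center the neighborhood and compute its principal directions.
    centered = nbrs - nbrs.mean(axis=0)
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # The top d right-singular vectors span the estimated tangent space;
    # the remaining ones span the estimated normal space.
    return Vt[:d], Vt[d:]

# Example: points sampled from the unit sphere embedded in R^3.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
tangent, normal = local_pca_tangent(X, i=0)
```

An adaptive variant such as AdaL-PCA would presumably choose k per point from the singular-value spectrum, since the decay of the singular values indicates how planar the neighborhood is.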


No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets

arXiv.org Machine Learning

Benchmark datasets have proved pivotal to the success of graph learning, and good benchmark datasets are crucial to guide the development of the field. Recent research has highlighted problems with graph-learning datasets and benchmarking practices -- revealing, for example, that methods that ignore the graph structure can outperform graph-based approaches on popular benchmark datasets. Such findings raise two questions: (1) What makes a good graph-learning dataset, and (2) how can we evaluate dataset quality in graph learning? Our work addresses these questions. As the classic evaluation setup uses datasets to evaluate models, it does not apply to dataset evaluation. Hence, we start from first principles. Observing that graph-learning datasets uniquely combine two modes -- the graph structure and the node features -- we introduce RINGS, a flexible and extensible mode-perturbation framework to assess the quality of graph-learning datasets based on dataset ablations -- i.e., by quantifying differences between the original dataset and its perturbed representations. Within this framework, we propose two measures -- performance separability and mode complementarity -- as evaluation tools, each assessing, from a distinct angle, the capacity of a graph dataset to benchmark the power and efficacy of graph-learning methods. We demonstrate the utility of our framework for graph-learning dataset evaluation in an extensive set of experiments and derive actionable recommendations for improving the evaluation of graph-learning methods. Our work opens new research directions in data-centric graph learning, and it constitutes a first step toward the systematic evaluation of evaluations.
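To make the idea of a mode perturbation concrete, here is a minimal sketch of two dataset ablations of the kind described above; the perturbation names and the toy data layout are illustrative assumptions, not the RINGS API.

```python
import numpy as np

def perturb(features, edges, mode, rng):
    """Return a perturbed copy of a graph dataset.

    features: (n, d) node-feature matrix; edges: list of (u, v) pairs.
    'shuffle-features' breaks the coupling between features and structure;
    'random-graph' replaces the structure while keeping the features.
    """
    if mode == "shuffle-features":
        return features[rng.permutation(len(features))], edges
    if mode == "random-graph":
        n = len(features)
        random_edges = [tuple(rng.choice(n, 2, replace=False))
                        for _ in range(len(edges))]
        return features, random_edges
    raise ValueError(mode)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 3))
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
feats2, edges2 = perturb(feats, edges, "shuffle-features", rng)
```

Comparing model performance on the original and the perturbed datasets is then the kind of ablation-based signal the framework quantifies.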


Graph Classification Gaussian Processes via Hodgelet Spectral Features

arXiv.org Machine Learning

The problem of classifying graphs is ubiquitous in machine learning. While it is standard to apply graph neural networks or graph kernel methods, Gaussian processes can be employed by transforming spatial features from the graph domain into spectral features in the Euclidean domain and using them as inputs to classical kernels. However, this approach currently only takes into account features on vertices, whereas some graph datasets also support features on edges. In this work, we present a Gaussian process-based classification algorithm that can leverage vertex features, edge features, or both. Furthermore, we take advantage of the Hodge decomposition to better capture the intricate richness of vertex and edge features, which can be beneficial across diverse tasks.
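As a hedged illustration of the vertex-only case described above, the sketch below projects vertex features onto the spectral domain of the graph Laplacian, producing Euclidean features that a classical kernel (e.g. an RBF) can consume. It omits edge features and the Hodge decomposition entirely, and it is not the authors' code.

```python
import numpy as np

def vertex_spectral_features(A, F, m=8):
    """Project vertex features F onto the first m Laplacian eigenvectors,
    mapping graph-domain signals to Euclidean spectral coefficients."""
    L = np.diag(A.sum(axis=1)) - A        # combinatorial graph Laplacian
    _, eigvecs = np.linalg.eigh(L)        # eigenpairs in ascending order
    return eigvecs[:, :m].T @ F           # (m, d) spectral coefficients

# Toy example: a 4-cycle with scalar vertex features.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
F = np.array([[1.0], [2.0], [3.0], [4.0]])
phi = vertex_spectral_features(A, F, m=2)
```

Handling edge features would analogously use the Hodge (edge) Laplacian, which is where the decomposition mentioned in the abstract enters.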


Topology meets Machine Learning: An Introduction using the Euler Characteristic Transform

arXiv.org Artificial Intelligence

Machine learning is shaping up to be the transformative technology of our times: Many of us have played with (and marveled at) models like ChatGPT, new breakthroughs in applications like healthcare research are announced on an almost daily basis, and new avenues for integrating these tools into scientific research are opening up, with some mathematicians already using large language models as proof assistants. This article aims to lift the veil and dispel some myths about machine learning; along the way, it will also show how machine learning itself can benefit from mathematical concepts. Indeed, from the outside, machine learning might look like a homogeneous entity, but in fact, the field is fractured and highly diverse. While the main thrust of the field arises from undeniable engineering advances, with bigger and better models, there is also a strong community of applied mathematicians. Alongside the classical drivers of machine-learning architectures, i.e., linear algebra and statistics, topology has recently started to provide novel insights into the foundations of machine learning: Point-set topology, harnessing concepts like neighborhoods, can be used to extend existing algorithms from graphs to cell complexes [4]. Algebraic topology, making use of effective invariants like homology, improves the results of models for volume reconstruction [13]. Finally, differential topology, providing tools to study smooth properties of data, results in efficient methods for analyzing embedded (simplicial) complexes [6]. These (and many more) methods have now found a home in the nascent field of topological deep learning [8]. Before diving into concrete examples, let us first take a step back and discuss machine learning as such.


Detecting and Approximating Redundant Computational Blocks in Neural Networks

arXiv.org Artificial Intelligence

Deep neural networks often learn similar internal representations, both across different models and within their own layers. While inter-network similarities have enabled techniques such as model stitching and merging, intra-network similarities present new opportunities for designing more efficient architectures. In this paper, we investigate the emergence of these internal similarities across different layers in diverse neural architectures, showing that similarity patterns emerge independently of the dataset used. We introduce a simple metric, Block Redundancy, to detect redundant blocks, providing a foundation for future architectural optimization methods. Building on this, we propose Redundant Blocks Approximation (RBA), a general framework that identifies and approximates one or more redundant computational blocks using simpler transformations. We show that the transformation $\mathcal{T}$ between two representations can be efficiently computed in closed form and suffices to replace the redundant blocks in the network. RBA reduces model parameters and time complexity while maintaining good performance. We validate our method on classification tasks in the vision domain using a variety of pretrained foundation models and datasets.
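The abstract does not specify $\mathcal{T}$ or the Block Redundancy metric; the sketch below shows one plausible closed-form choice, the least-squares linear map between a block's input and output activations, together with an illustrative redundancy proxy. Both are assumptions for exposition, not the paper's definitions.

```python
import numpy as np

def closed_form_map(X, Y):
    """Least-squares estimate of a linear map T with X @ T ~= Y, where
    X and Y are (n, d) input/output activation matrices of a block.
    A plausible closed form is the pseudo-inverse solution T = X^+ Y."""
    T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return T

def block_redundancy(X, Y):
    """Illustrative redundancy proxy (not the paper's metric): the
    fraction of output variance explained by the linear map."""
    T = closed_form_map(X, Y)
    residual = np.linalg.norm(X @ T - Y) ** 2
    return 1.0 - residual / np.linalg.norm(Y - Y.mean(0)) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))
Y = X @ rng.normal(size=(64, 64)) + 0.01 * rng.normal(size=(256, 64))
print(block_redundancy(X, Y))  # close to 1 => block is nearly linear
```

A block scoring near 1 under such a proxy could be replaced by the cheap map T with little loss, which is the spirit of RBA.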


Generative Topology for Shape Synthesis

arXiv.org Artificial Intelligence

The Euler Characteristic Transform (ECT) is a powerful invariant for assessing the geometrical and topological characteristics of a large variety of objects, including graphs and embedded simplicial complexes. Although the ECT is invertible in theory, no explicit inversion algorithm for general data sets exists. In this paper, we address this gap and demonstrate that it is possible to learn the inversion, permitting us to develop a novel framework for shape generation tasks on point clouds. Our model achieves high quality in reconstruction and generation tasks, affords efficient latent-space interpolation, and is orders of magnitude faster than existing methods. Understanding shapes requires understanding their geometrical and topological properties in tandem. Given the large variety of representations of such data, ranging from point clouds over graphs to simplicial complexes, a general framework for handling such inputs is beneficial. The Euler Characteristic Transform (ECT) provides such a framework based on the idea of studying a shape from multiple directions -- sampled from a sphere of appropriate dimensionality -- and at multiple scales. In fact, the ECT is an injective map, serving as a unique characterisation of a shape (Ghrist et al., 2018; Turner et al., 2014). Somewhat surprisingly, this even holds when using only a finite number of directions (Curry et al., 2022). Hence, while it is known that the ECT can be inverted, i.e. that it is possible to reconstruct input data from an ECT, algorithms are currently known only for special cases such as planar graphs (Fasy et al., 2018).
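For concreteness, here is a minimal sketch of the forward ECT on an embedded simplicial complex, the transform the model learns to invert. The discretization of directions and thresholds is an illustrative choice, not the authors' implementation.

```python
import numpy as np

def ect(vertices, simplices, directions, thresholds):
    """Discretized Euler Characteristic Transform of an embedded
    simplicial complex: for each direction v and threshold t, the Euler
    characteristic of the subcomplex of simplices born at height <= t,
    where a simplex is born at the maximum height of its vertices."""
    out = np.zeros((len(directions), len(thresholds)))
    for i, v in enumerate(directions):
        heights = vertices @ v
        for simplex in simplices:
            birth = heights[list(simplex)].max()
            dim = len(simplex) - 1
            # A k-simplex contributes (-1)^k once it is present.
            out[i] += np.where(thresholds >= birth, (-1) ** dim, 0)
    return out

# Toy example: a filled triangle in the plane.
verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
simps = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
dirs = np.array([[1.0, 0.0], [0.0, 1.0]])
ts = np.linspace(-1.0, 1.5, 6)
print(ect(verts, simps, dirs, ts))  # last column is chi = 1
```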


MANTRA: The Manifold Triangulations Assemblage

arXiv.org Artificial Intelligence

The rising interest in leveraging higher-order interactions present in complex systems has led to a surge in more expressive models exploiting high-order structures in the data, especially in topological deep learning (TDL), which designs neural networks on high-order domains such as simplicial complexes. However, progress in this field is hindered by the scarcity of datasets for benchmarking these architectures. To address this gap, we introduce MANTRA, the first large-scale, diverse, and intrinsically high-order dataset for benchmarking high-order models, comprising over 43,000 and 249,000 triangulations of surfaces and three-dimensional manifolds, respectively. With MANTRA, we assess several graph- and simplicial complex-based models on three topological classification tasks. We demonstrate that while simplicial complex-based neural networks generally outperform their graph-based counterparts in capturing simple topological invariants, they also struggle on these tasks, suggesting a rethinking of TDL approaches. Thus, MANTRA serves as a benchmark for assessing and advancing topological methods, leading the way toward more effective high-order models. Success in machine learning is commonly measured by a model's ability to solve tasks on benchmark datasets. While researchers typically devote a large amount of time to building their models, less time is devoted to data and its curation. As a consequence, graph learning is facing issues with reproducibility and wrong assumptions, which serve as obstructions to progress. An example of this was recently observed in the analysis of long-range features: additional hyperparameter tuning resolves performance differences between message-passing (MP) graph neural networks on the one side and graph transformers on the other (Tönshoff et al., 2023). In a similar vein, earlier work pointed out the relevance of strong baselines, highlighting the fact that structural information is not exploited equally by all models (Errica et al., 2020). Recently, new analyses even showed that for some benchmark datasets, as well as their associated tasks, graph information may be detrimental to overall predictive performance (Bechler-Speicher et al., 2024).
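As an illustration of the kind of topological classification target such a dataset supports, the snippet below computes the Euler characteristic of a triangulation given as a list of top-dimensional simplices; this input format is an assumption for exposition and may differ from the dataset's actual API.

```python
from itertools import combinations

def euler_characteristic(top_simplices):
    """Euler characteristic of a triangulation given by its
    top-dimensional simplices (vertex-index tuples): the alternating
    sum over all faces of all dimensions."""
    faces = set()
    for simplex in top_simplices:
        for k in range(1, len(simplex) + 1):
            faces.update(combinations(sorted(simplex), k))
    return sum((-1) ** (len(f) - 1) for f in faces)

# The boundary of a tetrahedron triangulates the 2-sphere: chi = 2.
sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(euler_characteristic(sphere))  # 2
```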


Diss-l-ECT: Dissecting Graph Data with local Euler Characteristic Transforms

arXiv.org Artificial Intelligence

The Euler Characteristic Transform (ECT) is an efficiently computable geometrical-topological invariant that characterizes the global shape of data. In this paper, we introduce the Local Euler Characteristic Transform ($\ell$-ECT), a novel extension of the ECT designed to enhance expressivity and interpretability in graph representation learning. Unlike traditional Graph Neural Networks (GNNs), which may lose critical local details through aggregation, the $\ell$-ECT provides a lossless representation of local neighborhoods. This addresses key limitations of GNNs by preserving nuanced local structures while maintaining global interpretability. Moreover, we construct a rotation-invariant metric based on $\ell$-ECTs for the spatial alignment of data spaces. Our method outperforms standard GNNs on a variety of node classification tasks, particularly on graphs with high heterophily.
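A minimal sketch of the idea, under the assumption that "local neighborhoods" are taken to be k-hop subgraphs embedded via node positions; the function names and the restriction to a 1-dimensional complex (vertices and edges only) are illustrative simplifications, not the paper's construction.

```python
import numpy as np

def k_hop(adj, node, k):
    """Vertex set of the k-hop neighborhood of `node` (adjacency dict)."""
    frontier, seen = {node}, {node}
    for _ in range(k):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return seen

def local_ect(adj, pos, node, directions, thresholds, k=1):
    """ECT of the k-hop subgraph around `node`, treated as a
    1-dimensional complex embedded via the node positions `pos`."""
    nodes = sorted(k_hop(adj, node, k))
    edges = [(u, w) for u in nodes for w in adj[u] if w in nodes and u < w]
    out = np.zeros((len(directions), len(thresholds)))
    for i, v in enumerate(directions):
        h = {u: pos[u] @ v for u in nodes}
        for u in nodes:                    # vertices contribute +1
            out[i] += thresholds >= h[u]
        for u, w in edges:                 # edges contribute -1
            out[i] -= thresholds >= max(h[u], h[w])
    return out

adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
pos = {i: np.array(p) for i, p in enumerate([(0, 0), (1, 0), (0, 1), (1, 1)])}
dirs = np.array([[1.0, 0.0]])
print(local_ect(adj, pos, 0, dirs, np.linspace(-1, 2, 4), k=1))
```

Collecting one such transform per node yields the per-neighborhood descriptors that replace lossy aggregation.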


The Manifold Density Function: An Intrinsic Method for the Validation of Manifold Learning

arXiv.org Artificial Intelligence

We introduce the manifold density function, an intrinsic method for validating manifold learning techniques. Our approach adapts and extends Ripley's $K$-function and quantifies, in an unsupervised setting, the extent to which the output of a manifold learning algorithm captures the structure of a latent manifold. The manifold density function generalizes to broad classes of Riemannian manifolds. In particular, we extend it to general two-manifolds using the Gauss-Bonnet theorem, and we demonstrate that the manifold density function for hypersurfaces is well approximated using the first Laplacian eigenvalue. We prove desirable convergence and robustness properties.
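The manifold density function itself is not reproduced here; since it adapts Ripley's $K$-function, the naive estimator of the classical Euclidean $K$-function below shows the underlying quantity. It ignores edge corrections and the manifold setting entirely and is purely illustrative.

```python
import numpy as np

def ripley_k(X, radii, volume):
    """Naive estimate of Ripley's K-function for a point cloud X in a
    window of the given volume: for each radius r, the average number of
    other points within distance r of a point, divided by the intensity
    lambda = n / volume."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-pairs
    counts = (d[None] <= radii[:, None, None]).sum(axis=(1, 2)) / n
    return counts * volume / n

# Uniform points in the unit square: K(r) should be close to pi * r^2.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
r = np.array([0.05, 0.1, 0.2])
print(ripley_k(X, r, volume=1.0), np.pi * r ** 2)
```

Deviations of the empirical curve from the reference value indicate clustering or regularity, which is the kind of signal the manifold density function lifts to the Riemannian setting.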


Position Paper: Challenges and Opportunities in Topological Deep Learning

arXiv.org Machine Learning

Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. Traditional machine learning often assumes that the observed data of interest are supported on a linear vector space and can be described by a set of feature vectors. However, there is growing awareness that, in many cases, this viewpoint is insufficient to describe much of the data arising in the real world. For example, molecules may be described more appropriately by graphs than by feature vectors. Other examples include three-dimensional objects represented by meshes, as encountered in computer graphics and geometry processing, or data supported on top of a complex social network of interrelated actors. Hence, there has been an increased interest in importing concepts from geometry and topology into the usual machine learning pipelines to gain further insights.