AITopics

2510.17548

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Barberi, Leonardo Aldo Alejandro, De Cave, Linda Maria

Topological Data Analysis for Unsupervised Anomaly Detection and Customer Segmentation on Banking Data

arXiv.org Artificial IntelligenceAug-21-2025

This paper introduces advanced techniques of Topological Data Analysis (TDA) for unsupervised anomaly detection and customer segmentation in banking data. Using the Mapper algorithm and persistent homology, we develop unsupervised procedures that uncover meaningful patterns in customers' banking data by exploiting topological information. The framework we present in this paper yields actionable insights that combine the abstract mathematical subject of topology with real-life use cases that are useful in industry.

data mining, machine learning, transfer 0, (14 more...)

2508.14136

Country: Europe (1.00)

Genre: Research Report (1.00)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.58)

Yan, Xinyuan, Sevastjanova, Rita, van der Ben, Sinie, El-Assady, Mennatallah, Wang, Bei

Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents

arXiv.org Artificial IntelligenceJul-25-2025

Large language models (LLMs) produce high-dimensional embeddings that capture rich semantic and syntactic relationships between words, sentences, and concepts. Investigating the topological structures of LLM embedding spaces via mapper graphs enables us to understand their underlying structures. Specifically, a mapper graph summarizes the topological structure of the embedding space, where each node represents a topological neighborhood (containing a cluster of embeddings), and an edge connects two nodes if their corresponding neighborhoods overlap. However, manually exploring these embedding spaces to uncover encoded linguistic properties requires considerable human effort. To address this challenge, we introduce a framework for semi-automatic annotation of these embedding properties. To organize the exploration process, we first define a taxonomy of explorable elements within a mapper graph such as nodes, edges, paths, components, and trajectories. The annotation of these elements is executed through two types of customizable LLM-based agents that employ perturbation techniques for scalable and automated analysis. These agents help to explore and explain the characteristics of mapper elements and verify the robustness of the generated explanations. We instantiate the framework within a visual analytics workspace and demonstrate its effectiveness through case studies. In particular, we replicate findings from prior research on BERT's embedding properties across various layers of its architecture and provide further observations into the linguistic properties of topological neighborhoods.

explanation, large language model, machine learning, (19 more...)

2507.18607

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningDec-16-2024

A Mapper Algorithm with implicit intervals and its optimization

Tao, Yuyang, Ge, Shufei

The Mapper algorithm is an essential tool for visualizing complex, high dimensional data in topology data analysis (TDA) and has been widely used in biomedical research. It outputs a combinatorial graph whose structure implies the shape of the data. However,the need for manual parameter tuning and fixed intervals, along with fixed overlapping ratios may impede the performance of the standard Mapper algorithm. Variants of the standard Mapper algorithms have been developed to address these limitations, yet most of them still require manual tuning of parameters. Additionally, many of these variants, including the standard version found in the literature, were built within a deterministic framework and overlooked the uncertainty inherent in the data. To relax these limitations, in this work, we introduce a novel framework that implicitly represents intervals through a hidden assignment matrix, enabling automatic parameter optimization via stochastic gradient descent. In this work, we develop a soft Mapper framework based on a Gaussian mixture model(GMM) for flexible and implicit interval construction. We further illustrate the robustness of the soft Mapper algorithm by introducing the Mapper graph mode as a point estimation for the output graph. Moreover, a stochastic gradient descent algorithm with a specific topological loss function is proposed for optimizing parameters in the model. Both simulation and application studies demonstrate its effectiveness in capturing the underlying topological structures. In addition, the application to an RNA expression dataset obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB) successfully identifies a distinct subgroup of Alzheimer's Disease.

algorithm, artificial intelligence, machine learning, (17 more...)

2412.11631

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Oregon (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.88)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Ruscitti, Kaleb D., McInnes, Leland

Improving Mapper's Robustness by Varying Resolution According to Lens-Space Density

arXiv.org Machine LearningOct-4-2024

We propose an improvement to the Mapper algorithm that removes the assumption of a single resolution scale across semantic space, and improves the robustness of the results under change of parameters. This eases parameter selection, especially for datasets with highly variable local density in the Morse function $f$ used for Mapper. This is achieved by incorporating this density into the choice of cover for Mapper. Furthermore, we prove that for covers with some natural hypotheses, the graph output by Mapper still converges in bottleneck distance to the Reeb graph of the Rips complex of the data, but captures more topological features than when using the usual Mapper cover. Finally, we discuss implementation details, and include the results of computational experiments. We also provide an accompanying reference implementation.

density-based mapper, graph, mapper, (14 more...)

2410.03862

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > Denmark > Central Jutland > Aarhus (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Bungula, Wako, Darcy, Isabel

Bi-Filtration and Stability of TDA Mapper for Point Cloud Data

arXiv.org Machine LearningSep-25-2024

Carlsson, Singh and Memoli's TDA mapper takes a point cloud dataset and outputs a graph that depends on several parameter choices. Dey, Memoli, and Wang developed Multiscale Mapper for abstract topological spaces so that parameter choices can be analyzed via persistent homology. However, when applied to actual data, one does not always obtain filtrations of mapper graphs. DBSCAN, one of the most common clustering algorithms used in the TDA mapper software, has two parameters, \textbf{$\epsilon$} and \textbf{MinPts}. If \textbf{MinPts = 1} then DBSCAN is equivalent to single linkage clustering with cutting height \textbf{$\epsilon$}. We show that if DBSCAN clustering is used with \textbf{MinPts $>$ 2}, a filtration of mapper graphs may not exist except in the absence of free-border points; but such filtrations exist if DBSCAN clustering is used with \textbf{MinPts = 1} or \textbf{2} as the cover size increases, \textbf{$\epsilon$} increases, and/or \textbf{MinPts} decreases. However, the 1-dimensional filtration is unstable. If one adds noise to a data set so that each data point has been perturbed by a distance at most \textbf{$\delta$}, the persistent homology of the mapper graph of the perturbed data set can be significantly different from that of the original data set. We show that we can obtain stability by increasing both the cover size and \textbf{$\epsilon$} at the same time. In particular, we show that the bi-filtrations of the homology groups with respect to cover size and $\epsilon$ between these two datasets are \textbf{2$\delta$}-interleaved.

cluster cover, filtration, minp ts, (13 more...)

2409.1736

Country:

North America > United States > Wisconsin > La Crosse County > La Crosse (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Iowa (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Oulhaj, Ziyad, Carrière, Mathieu, Michel, Bertrand

Differentiable Mapper For Topological Optimization Of Data Representation

arXiv.org Artificial IntelligenceFeb-20-2024

Unsupervised data representation and visualization using tools from topology is an active and growing field of Topological Data Analysis (TDA) and data science. Its most prominent line of work is based on the so-called Mapper graph, which is a combinatorial graph whose topological structures (connected components, branches, loops) are in correspondence with those of the data itself. While highly generic and applicable, its use has been hampered so far by the manual tuning of its many parameters-among these, a crucial one is the so-called filter: it is a continuous function whose variations on the data set are the main ingredient for both building the Mapper representation and assessing the presence and sizes of its topological structures. However, while a few parameter tuning methods have already been investigated for the other Mapper parameters (i.e., resolution, gain, clustering), there is currently no method for tuning the filter itself. In this work, we build on a recently proposed optimization framework incorporating topology to provide the first filter optimization scheme for Mapper graphs. In order to achieve this, we propose a relaxed and more general version of the Mapper graph, whose convergence properties are investigated. Finally, we demonstrate the usefulness of our approach by optimizing Mapper graph representations on several datasets, and showcasing the superiority of the optimized representation over arbitrary ones.

cover assignment scheme, filter function, mapper graph, (12 more...)

2402.12854

Country:

Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.05)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

arXiv.org Artificial IntelligenceJan-19-2024

A distribution-guided Mapper algorithm

Tao, Yuyang, Ge, Shufei

Motivation: The Mapper algorithm is an essential tool to explore shape of data in topology data analysis. With a dataset as an input, the Mapper algorithm outputs a graph representing the topological features of the whole dataset. This graph is often regarded as an approximation of a reeb graph of data. The classic Mapper algorithm uses fixed interval lengths and overlapping ratios, which might fail to reveal subtle features of data, especially when the underlying structure is complex. Results: In this work, we introduce a distribution guided Mapper algorithm named D-Mapper, that utilizes the property of the probability model and data intrinsic characteristics to generate density guided covers and provides enhanced topological features. Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones. Moreover, we introduce a metric accounting for both the quality of overlap clustering and extended persistence homology to measure the performance of Mapper type algorithm. Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios. We also apply the D-Mapper to a SARS-COV-2 coronavirus RNA sequences dataset to explore the topological structure of different virus variants. The results indicate that the D-Mapper algorithm can reveal both vertical and horizontal evolution processes of the viruses. Availability: Our package is available at https://github.com/ShufeiGe/D-Mapper.

algorithm, d-mapper, mapper algorithm, (15 more...)

2401.12237

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.83)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.68)

arXiv.org Machine LearningSep-12-2023

$G$-Mapper: Learning a Cover in the Mapper Construction

Alvarado, Enrique, Belton, Robin, Fischer, Emily, Lee, Kang-Ju, Palande, Sourabh, Percival, Sarah, Purvine, Emilie

The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. The Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. The paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on $G$-means clustering which searches for the optimal number of clusters in $k$-means by conducting iteratively the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model in order to choose carefully the cover based on the distribution of a given data. Experiments for synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets.

artificial intelligence, data mining, machine learning, (17 more...)

2309.06634

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.72)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.95)

arXiv.org Artificial IntelligenceNov-30-2022

Experimental Observations of the Topology of Convolutional Neural Network Activations

Purvine, Emilie, Brown, Davis, Jefferson, Brett, Joslyn, Cliff, Praggastis, Brenda, Rathore, Archit, Shapiro, Madelyn, Wang, Bei, Zhou, Youjia

Topological data analysis (TDA) is a branch of computational mathematics, bridging algebraic topology and data science, that provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture, resulting in high-dimensional, difficult-to-interpret internal representations of input data. As DNNs become more ubiquitous across multiple sectors of our society, there is increasing recognition that mathematical methods are needed to aid analysts, researchers, and practitioners in understanding and interpreting how these models' internal representations relate to the final classification. In this paper, we apply cutting edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification. We use two common TDA approaches to explore several methods for modeling hidden-layer activations as high-dimensional point clouds, and provide experimental evidence that these point clouds capture valuable structural information about the model's process. First, we demonstrate that a distance metric based on persistent homology can be used to quantify meaningful differences between layers, and we discuss these distances in the broader context of existing representational similarity metrics for neural network interpretability. Second, we show that a mapper graph can provide semantic insight into how these models organize hierarchical class knowledge at each layer. These observations demonstrate that TDA is a useful tool to help deep learning practitioners unlock the hidden structures of their models.

artificial intelligence, deep learning, machine learning, (18 more...)

2212.00222

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Utah (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)