Coskunuzer, Baris
TopOC: Topological Deep Learning for Ovarian and Breast Cancer Diagnosis
Fatema, Saba, Nuwagira, Brighton, Chakraborty, Sayoni, Gedik, Reyhan, Coskunuzer, Baris
Microscopic examination of slides prepared from tissue samples is the primary tool for detecting and classifying cancerous lesions, a process that is time-consuming and requires the expertise of experienced pathologists. Recent advances in deep learning methods hold significant potential to enhance medical diagnostics and treatment planning by improving accuracy, reproducibility, and speed, thereby reducing clinicians' workloads and turnaround times. However, the necessity for vast amounts of labeled data to train these models remains a major obstacle to the development of effective clinical decision support systems. In this paper, we propose the integration of topological deep learning methods to enhance the accuracy and robustness of existing histopathological image analysis models. Topological data analysis (TDA) offers a unique approach by extracting essential information through the evaluation of topological patterns across different color channels. While deep learning methods capture local information from images, TDA features provide complementary global features. Our experiments on publicly available histopathological datasets demonstrate that the inclusion of topological features significantly improves the differentiation of tumor types in ovarian and breast cancers.
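A minimal Python sketch of the general idea, not the paper's exact TopOC pipeline: summarize each RGB channel of a histopathology image by the Betti-0 counts of its sublevel sets (one simple topological descriptor), producing a global feature vector; the threshold grid and channel-wise treatment below are illustrative assumptions.

    # Illustrative only: per-channel sublevel-set Betti-0 curves as global
    # topological features for an image; the paper's actual filtrations and
    # vectorizations may differ.
    import numpy as np
    from scipy import ndimage

    def betti0_curve(channel, thresholds):
        """Count connected components of the sublevel sets {pixel <= t}."""
        counts = []
        for t in thresholds:
            _, num_components = ndimage.label(channel <= t)
            counts.append(num_components)
        return np.array(counts)

    def topological_features(rgb_image, n_thresholds=16):
        """Concatenate the Betti-0 curves of the three color channels."""
        thresholds = np.linspace(0, 255, n_thresholds)
        return np.concatenate([betti0_curve(rgb_image[..., c].astype(float), thresholds)
                               for c in range(3)])

    image = np.random.randint(0, 256, size=(224, 224, 3))   # placeholder image
    print(topological_features(image).shape)                # (48,) global features

Such a vector would be concatenated with the deep features of an image model before the final classifier, which is one way the local/global complementarity described above can be realized.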
PROXI: Challenging the GNNs for Link Prediction
Tola, Astrit, Myrick, Jack, Coskunuzer, Baris
Over the past decade, Graph Neural Networks (GNNs) have transformed graph representation learning. In the widely adopted message-passing GNN framework, nodes iteratively refine their representations by aggregating information from neighboring nodes. While GNNs excel in various domains, recent theoretical studies have raised concerns about their capabilities. GNNs aim to address a variety of graph-related tasks with the same node representations; however, this one-size-fits-all approach proves suboptimal for diverse tasks. Motivated by these observations, we conduct empirical tests to compare the performance of current GNN models with more conventional and direct methods on link prediction tasks. Introducing our model, PROXI, which leverages proximity information of node pairs in both graph and attribute spaces, we find that standard machine learning (ML) models perform competitively, even outperforming cutting-edge GNN models, when applied to these proximity metrics derived from node neighborhoods and attributes. This holds true across both homophilic and heterophilic networks, as well as small and large benchmark datasets, including those from the Open Graph Benchmark (OGB). Moreover, we show that augmenting traditional GNNs with PROXI significantly boosts their link prediction performance. Our empirical findings corroborate the aforementioned theoretical observations and suggest that current GNN models have ample room for improvement before reaching their full potential.
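A minimal sketch of the kind of proximity-based pipeline tested here, assuming a hand-picked feature set (common neighbors, Jaccard, Adamic-Adar, degree product, optional attribute cosine) that may differ from PROXI's exact features:

    import math
    import networkx as nx
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    def pair_features(G, u, v, X=None):
        """Structural (and optional attribute-space) proximity of a node pair."""
        nu, nv = set(G[u]), set(G[v])
        common = nu & nv
        jaccard = len(common) / len(nu | nv) if nu | nv else 0.0
        adamic_adar = sum(1.0 / math.log(G.degree(w)) for w in common if G.degree(w) > 1)
        feats = [len(common), jaccard, adamic_adar, G.degree(u) * G.degree(v)]
        if X is not None:   # attribute-space proximity
            feats.append(float(np.dot(X[u], X[v]) /
                               (np.linalg.norm(X[u]) * np.linalg.norm(X[v]) + 1e-9)))
        return feats

    # Train a standard ML model on positive edges and sampled non-edges
    # (in practice, held-out test edges are removed from G beforehand).
    G = nx.karate_club_graph()
    pos = list(G.edges())
    neg = list(nx.non_edges(G))[:len(pos)]
    Xf = [pair_features(G, u, v) for u, v in pos + neg]
    y = [1] * len(pos) + [0] * len(neg)
    clf = GradientBoostingClassifier().fit(Xf, y)
    print(clf.predict_proba([pair_features(G, 0, 33)])[0, 1])   # link score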
TopER: Topological Embeddings in Graph Representation Learning
Tola, Astrit, Taiwo, Funmilola Mary, Akcora, Cuneyt Gurcan, Coskunuzer, Baris
Graph embeddings play a critical role in graph representation learning, allowing machine learning models to explore and interpret graph-structured data. However, existing methods often rely on opaque, high-dimensional embeddings, limiting interpretability and practical visualization. In this work, we introduce Topological Evolution Rate (TopER), a novel, low-dimensional embedding approach grounded in topological data analysis. TopER simplifies a key topological approach, Persistent Homology, by calculating the evolution rate of graph substructures, resulting in intuitive and interpretable visualizations of graph data. This approach not only enhances the exploration of graph datasets but also delivers competitive performance in graph clustering and classification tasks. Our TopER-based models achieve or surpass state-of-the-art results across molecular, biological, and social network datasets in tasks such as classification, clustering, and visualization.
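A minimal sketch of one reading of the TopER construction, assuming a degree-based sublevel filtration purely for illustration: record the vertex and edge counts of the growing subgraphs and fit a line, so that the (intercept, slope) pair serves as a low-dimensional graph embedding.

    import networkx as nx
    import numpy as np

    def toper_embedding(G):
        """Fit a line to (#vertices, #edges) along a degree sublevel filtration."""
        degrees = dict(G.degree())
        counts = []
        for t in sorted(set(degrees.values())):
            H = G.subgraph([v for v, d in degrees.items() if d <= t])
            counts.append((H.number_of_nodes(), H.number_of_edges()))
        x, y = np.array(counts).T
        slope, intercept = np.polyfit(x, y, 1)   # least-squares line fit
        return intercept, slope                  # one 2D point per graph

    print(toper_embedding(nx.karate_club_graph()))

Each graph in a dataset then becomes a single 2D point, which is one way the visualization and interpretability claims above can be realized in practice.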
ClassContrast: Bridging the Spatial and Contextual Gaps for Node Representations
Uddin, Md Joshem, Tola, Astrit, Sikand, Varin, Akcora, Cuneyt Gurcan, Coskunuzer, Baris
Graph Neural Networks (GNNs) have revolutionized the domain of graph representation learning by utilizing neighborhood aggregation schemes in many popular architectures, such as message passing graph neural networks (MPGNNs). This scheme involves iteratively computing a node's representation vector by aggregating and transforming the representation vectors of its adjacent nodes. Despite their success, MPGNNs face significant issues, such as oversquashing, oversmoothing, and underreaching, which hamper their effectiveness. Additionally, the reliance of MPGNNs on the homophily assumption, where edges typically connect nodes with similar labels and features, limits their performance in heterophilic contexts, where connected nodes often differ significantly. This necessitates the development of models that can operate effectively in both homophilic and heterophilic settings. In this paper, we propose a novel approach, ClassContrast, grounded in Energy Landscape Theory from Chemical Physics, to overcome these limitations. ClassContrast combines spatial and contextual information, leveraging a physics-inspired energy landscape to model node embeddings that are both discriminative and robust across homophilic and heterophilic settings. Our approach introduces contrast-based homophily matrices to enhance the understanding of class interactions and tendencies. Through extensive experiments, we demonstrate that ClassContrast outperforms traditional GNNs in node classification and link prediction tasks, proving its effectiveness and versatility in diverse real-world scenarios.
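The following is only a heavily simplified illustration of combining "spatial" information (class composition of a node's labeled neighbors) with "contextual" information (the node's own features), not the energy-landscape construction ClassContrast actually uses; all choices below are assumptions.

    import networkx as nx
    import numpy as np

    def spatial_contextual_embedding(G, features, train_labels, n_classes):
        """features: node -> vector; train_labels: labels of training nodes only."""
        embeddings = {}
        for v in G:
            spatial = np.zeros(n_classes)
            for u in G[v]:
                if u in train_labels:          # only labeled (training) neighbors
                    spatial[train_labels[u]] += 1
            if spatial.sum() > 0:
                spatial /= spatial.sum()       # class distribution around v
            embeddings[v] = np.concatenate([spatial, features[v]])
        return embeddings

    G = nx.karate_club_graph()
    features = {v: np.random.rand(4) for v in G}    # placeholder node features
    train_labels = {v: (0 if G.nodes[v]["club"] == "Mr. Hi" else 1) for v in list(G)[:20]}
    emb = spatial_contextual_embedding(G, features, train_labels, n_classes=2)
    print(emb[0].shape)   # spatial + contextual parts, fed to a downstream classifier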
EMP: Effective Multidimensional Persistence for Graph Representation Learning
Segovia-Dominguez, Ignacio, Chen, Yuzhou, Akcora, Cuneyt G., Zhen, Zhiwei, Kantarcioglu, Murat, Gel, Yulia R., Coskunuzer, Baris
Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks, from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which furnishes an exclusive topological imprint of data by tracing the evolution of latent structures as a scale parameter changes. Present PH tools are confined to analyzing data through a single filter parameter. However, many scenarios necessitate the consideration of multiple relevant parameters to attain finer insights into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework, which empowers the exploration of data by simultaneously varying multiple scale parameters. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary, and seamlessly extends established single-parameter PH summaries into multidimensional counterparts such as EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent the data's multidimensional aspects as matrices and arrays, aligning effectively with diverse ML models. We provide theoretical guarantees and stability proofs for EMP summaries, and we demonstrate EMP's utility and effectiveness in graph classification tasks. The results reveal that EMP enhances various single PH descriptors, outperforming cutting-edge methods on multiple benchmark datasets.
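A minimal sketch of the slicing idea behind such multidimensional summaries, with both filtration functions (a node weight and the degree) chosen purely for illustration: each slice of the second parameter contributes one single-parameter summary (here a Betti-0 curve), and the slices are stacked into a matrix.

    import networkx as nx
    import numpy as np

    def emp_surface(G, node_weight, weight_grid, degree_grid):
        """Rows: slices of the second parameter; columns: Betti-0 along a degree filtration."""
        degrees = dict(G.degree())
        surface = np.zeros((len(weight_grid), len(degree_grid)))
        for i, w in enumerate(weight_grid):
            H = G.subgraph([v for v in G if node_weight[v] <= w])
            for j, d in enumerate(degree_grid):
                S = H.subgraph([v for v in H if degrees[v] <= d])
                surface[i, j] = nx.number_connected_components(S)
        return surface

    G = nx.karate_club_graph()
    weights = {v: np.random.rand() for v in G}           # placeholder node function
    print(emp_surface(G, weights, np.linspace(0, 1, 5), range(0, 18, 3)).shape)  # (5, 6)

The resulting matrix is the kind of fixed-size array an ML model can consume directly; EMP builds its Landscapes, Silhouettes, Images, and Surfaces in this spirit from genuine persistence summaries rather than the plain component counts used here.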
Time-Aware Knowledge Representations of Dynamic Objects with Multidimensional Persistence
Coskunuzer, Baris, Segovia-Dominguez, Ignacio, Chen, Yuzhou, Gel, Yulia R.
Learning time-evolving objects such as multivariate time series and dynamic networks requires the development of novel knowledge representation mechanisms and neural network architectures that can capture the implicit time-dependent information contained in the data. Such information is typically not directly observed but plays a key role in the learning task performance. In turn, the lack of a time dimension in knowledge encoding mechanisms for time-dependent data leads to frequent model updates, poor learning performance, and, as a result, subpar decision-making. Here we propose a new time-aware knowledge representation mechanism that focuses on implicit time-dependent topological information along multiple geometric dimensions. In particular, we propose a new approach, named \textit{Temporal MultiPersistence} (TMP), which produces multidimensional topological fingerprints of the data by using existing single-parameter topological summaries. The main idea behind TMP is to merge two of the newest directions in topological representation learning: multi-persistence, which simultaneously describes data shape evolution along multiple key parameters, and zigzag persistence, which enables us to extract the most salient data shape information over time. We derive theoretical guarantees for TMP vectorizations and show their utility in forecasting on benchmark traffic flow, Ethereum blockchain, and electrocardiogram datasets, demonstrating competitive performance, especially in scenarios with limited data records. In addition, our TMP method improves the computational efficiency of state-of-the-art multipersistence summaries by up to 59.5 times.
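A minimal sketch of a time-aware topological fingerprint under strong simplifications: one single-parameter summary per snapshot (a Betti-0 curve over an assumed degree filtration), stacked along the time axis; the zigzag construction that TMP actually relies on is omitted.

    import networkx as nx
    import numpy as np

    def betti0_curve(G, degree_grid):
        degrees = dict(G.degree())
        return [nx.number_connected_components(G.subgraph([v for v in G if degrees[v] <= d]))
                for d in degree_grid]

    def temporal_fingerprint(snapshots, degree_grid):
        """snapshots: list of nx.Graph over time -> array of shape (time, thresholds)."""
        return np.array([betti0_curve(G, degree_grid) for G in snapshots])

    snapshots = [nx.gnp_random_graph(50, 0.05, seed=t) for t in range(10)]
    print(temporal_fingerprint(snapshots, range(8)).shape)   # (10, 8) fingerprint

Such a fixed-size array can then be fed to a downstream forecasting model alongside the raw observations.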
Chainlet Orbits: Topological Address Embedding for the Bitcoin Blockchain
Azad, Poupak, Coskunuzer, Baris, Kantarcioglu, Murat, Akcora, Cuneyt Gurcan
The rise of cryptocurrencies like Bitcoin, which enable transactions with a degree of pseudonymity, has led to a surge in various illicit activities, including ransomware payments and transactions on darknet markets. These illegal activities often utilize Bitcoin as the preferred payment method. However, current tools for detecting illicit behavior either rely on a few heuristics and laborious data collection processes or employ computationally inefficient graph neural network (GNN) models that are challenging to interpret. To overcome the computational and interpretability limitations of existing techniques, we introduce an effective solution called Chainlet Orbits. This approach embeds Bitcoin addresses by leveraging their topological characteristics in transactions. By employing our innovative address embedding, we investigate e-crime in Bitcoin networks by focusing on distinctive substructures that arise from illicit behavior. The results of our node classification experiments demonstrate superior performance compared to state-of-the-art methods, including both topological and GNN-based approaches. Moreover, our approach enables the use of interpretable and explainable machine learning models on the Bitcoin transaction network, in as little as 15 minutes for most days.
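The following is only a stand-in illustration of embedding an address by simple statistics of its local transaction neighborhood; it is not the chainlet-orbit construction itself, and the specific statistics are assumptions.

    import networkx as nx
    import numpy as np

    def address_embedding(G, address):
        """G: directed graph whose nodes are addresses and transactions."""
        preds, succs = set(G.predecessors(address)), set(G.successors(address))
        ego = G.subgraph(preds | succs | {address})
        return np.array([
            len(preds),                                     # transactions paying the address
            len(succs),                                     # transactions spending from it
            ego.number_of_edges(),                          # density of the local structure
            sum(1 for t in succs if G.out_degree(t) > 1),   # fan-out spends
        ])

    # Tiny synthetic example: two transactions (t1, t2) and three addresses.
    G = nx.DiGraph([("a1", "t1"), ("t1", "a2"), ("t1", "a3"), ("a2", "t2"), ("t2", "a3")])
    print(address_embedding(G, "a2"))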
Reduction Algorithms for Persistence Diagrams of Networks: CoralTDA and PrunIT
Akcora, Cuneyt Gurcan, Kantarcioglu, Murat, Gel, Yulia R., Coskunuzer, Baris
Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data that is inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks. Indeed, most modern networks such as citation, blockchain, and online social networks often have hundreds of thousands of vertices, making the application of existing TDA methods infeasible. To address this major TDA limitation, we develop two new, remarkably simple but effective algorithms to compute the exact persistence diagrams of large graphs. First, we prove that the $(k+1)$-core of a graph $\mathcal{G}$ suffices to compute its $k^{th}$ persistence diagram, $PD_k(\mathcal{G})$. Second, we introduce a pruning algorithm for graphs to compute their persistence diagrams by removing the dominated vertices. Our experiments on large networks show that our novel approach can achieve computational gains of up to 95%. The developed framework provides the first bridge between graph theory and TDA, with applications in machine learning on large complex networks. Our implementation is available at https://github.com/cakcora/PersistentHomologyWithCoralPrunit
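A minimal sketch of the two graph reductions performed before any persistence computation; the domination test below (closed-neighborhood containment) is one common notion and may differ in detail from PrunIT's exact condition.

    import networkx as nx

    def coral_reduction(G, k):
        """Keep only the (k+1)-core, which the paper shows suffices to compute PD_k(G)."""
        return nx.k_core(G, k + 1)

    def prune_dominated(G):
        """Iteratively remove vertices whose closed neighborhood lies inside a neighbor's."""
        H = G.copy()
        changed = True
        while changed:
            changed = False
            for u in list(H.nodes()):
                Nu = set(H[u]) | {u}
                if any(Nu <= set(H[v]) | {v} for v in H[u]):
                    H.remove_node(u)
                    changed = True
                    break
        return H

    G = nx.karate_club_graph()
    print(len(coral_reduction(G, 1)), len(prune_dominated(G)))
    # The reduced graph is then passed unchanged to any persistence pipeline.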
Topological Relational Learning on Graphs
Chen, Yuzhou, Coskunuzer, Baris, Gel, Yulia R.
Graph neural networks (GNNs) have emerged as a powerful tool for graph classification and representation learning. However, GNNs tend to suffer from over-smoothing problems and are vulnerable to graph perturbations. To address these challenges, we propose a novel topological neural framework, topological relational inference (TRI), which allows for integrating higher-order graph information into GNNs and for systematically learning a local graph structure. The key idea is to rewire the original graph by using the persistent homology of the small neighborhoods of nodes and then to incorporate the extracted topological summaries as side information into the local algorithm. As a result, the new framework enables us to harness both the conventional information on the graph structure and information on the graph's higher-order topological properties. We derive theoretical stability guarantees for the new local topological representation and discuss their implications for the graph's algebraic connectivity. The experimental results on node classification tasks demonstrate that the new TRI-GNN outperforms all 14 state-of-the-art baselines on 6 out of 7 graphs and exhibits higher robustness to perturbations, yielding up to 10\% better performance under noisy scenarios.
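A minimal sketch of extracting a topological summary from each node's small neighborhood; TRI uses the persistent homology of these neighborhoods both to rewire the graph and as side information, whereas this illustration only computes the Betti numbers of 1-hop ego networks as per-node features.

    import networkx as nx
    import numpy as np

    def local_betti_features(G, radius=1):
        """Per-node topological summary of its ego network: (beta_0, beta_1)."""
        feats = {}
        for v in G:
            ego = nx.ego_graph(G, v, radius=radius)
            b0 = nx.number_connected_components(ego)
            # First Betti number of a graph: |E| - |V| + #components (independent cycles).
            b1 = ego.number_of_edges() - ego.number_of_nodes() + b0
            feats[v] = np.array([b0, b1], dtype=float)
        return feats

    G = nx.karate_club_graph()
    print(local_betti_features(G)[0])   # topological side features for node 0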