Trofimov, Ilya
RTD-Lite: Scalable Topological Analysis for Comparing Weighted Graphs in Learning Tasks
Tulchinskii, Eduard, Voronkova, Daria, Trofimov, Ilya, Burnaev, Evgeny, Barannikov, Serguei
Topological methods for comparing weighted graphs are valuable in various learning tasks but often suffer from computational inefficiency on large datasets. We introduce RTD-Lite, a scalable algorithm that efficiently compares the topological features, specifically the connectivity or cluster structures at arbitrary scales, of two weighted graphs with a one-to-one correspondence between vertices. Using minimal spanning trees in auxiliary graphs, RTD-Lite captures topological discrepancies with $O(n^2)$ time and memory complexity. This efficiency enables its application in tasks like dimensionality reduction and neural network training. Experiments on synthetic and real-world datasets demonstrate that RTD-Lite effectively identifies topological differences while significantly reducing computation time compared to existing methods. Moreover, integrating RTD-Lite into neural network training as a loss function component enhances the preservation of topological structures in learned representations. Our code is publicly available at https://github.com/ArGintum/RTD-Lite
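As a rough illustration of the idea stated above (minimum spanning trees of auxiliary graphs capturing cluster-merge structure at all scales), here is a minimal Python sketch. The pairing of merge scales by sorting MST edge weights is a simplification for illustration, not the paper's exact procedure; `rtd_lite_sketch` and its details are my own naming and assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_edge_weights(weights: np.ndarray) -> np.ndarray:
    """Sorted edge weights of the MST of a dense weighted graph."""
    mst = minimum_spanning_tree(weights).toarray()
    return np.sort(mst[mst > 0])

def rtd_lite_sketch(w_a: np.ndarray, w_b: np.ndarray) -> float:
    """Compare cluster-merge scales of two graphs on the same vertex set.

    w_a, w_b: symmetric (n, n) weight matrices, vertices in correspondence.
    The auxiliary 'min' graph merges clusters as soon as either input graph
    does; the score accumulates how much later graph B merges than the
    auxiliary graph does.
    """
    w_min = np.minimum(w_a, w_b)        # auxiliary graph of element-wise minima
    e_min = mst_edge_weights(w_min)     # n-1 merge scales of the min graph
    e_b = mst_edge_weights(w_b)         # n-1 merge scales of graph B
    return float(np.sum(e_b - e_min))   # non-negative, since w_min <= w_b

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
y = x + 0.1 * rng.normal(size=(50, 3))    # a slightly perturbed copy
dist = lambda p: np.linalg.norm(p[:, None] - p[None, :], axis=-1)
print(rtd_lite_sketch(dist(x), dist(y)))  # small for similar graphs
```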
Scalar Function Topology Divergence: Comparing Topology of 3D Objects
Trofimov, Ilya, Voronkova, Daria, Tulchinskii, Eduard, Burnaev, Evgeny, Barannikov, Serguei
We propose a new topological tool for computer vision: Scalar Function Topology Divergence (SFTD), which measures the dissimilarity of multi-scale topology between sublevel sets of two functions having a common domain. The functions can be defined on an undirected graph or on Euclidean space of any dimensionality. Most existing methods for comparing topology are based on the Wasserstein distance between persistence barcodes, and they do not take into account the localization of topological features. In contrast, minimization of SFTD ensures that the corresponding topological features of the scalar functions are located in the same places. The proposed tool provides useful visualizations depicting the areas where the functions are topologically dissimilar. We provide applications of the proposed method to 3D computer vision. In particular, experiments demonstrate that SFTD improves the reconstruction of cellular 3D shapes from 2D fluorescence microscopy images and helps to identify topological errors in 3D segmentation.
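To make "multi-scale topology of sublevel sets" concrete, the toy sketch below counts connected components of {f <= t} over a sweep of thresholds for two functions on a shared 2D grid. It tracks only component counts (H0) and, unlike SFTD, ignores where the features are located, which is precisely the limitation the paper addresses; all function choices here are illustrative.

```python
import numpy as np
from scipy.ndimage import label

def sublevel_component_counts(f: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Number of connected components of the sublevel set {f <= t} per threshold."""
    return np.array([label(f <= t)[1] for t in thresholds])

xx, yy = np.meshgrid(np.linspace(-2, 2, 128), np.linspace(-2, 2, 128))
f = np.sin(3 * xx) * np.cos(3 * yy)               # a function with many basins
g = f + 0.3 * np.exp(-((xx - 1) ** 2 + yy ** 2))  # perturb one region

ts = np.linspace(-1, 1, 50)
diff = np.abs(sublevel_component_counts(f, ts) - sublevel_component_counts(g, ts))
print("total H0 count discrepancy over scales:", diff.sum())
```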
SeqNAS: Neural Architecture Search for Event Sequence Classification
Udovichenko, Igor, Shvetsov, Egor, Divitsky, Denis, Osin, Dmitry, Trofimov, Ilya, Glushenko, Anatoly, Sukharev, Ivan, Berestenev, Dmitry, Burnaev, Evgeny
Neural Architecture Search (NAS) methods are widely used in various industries to obtain high-quality, task-specific solutions with minimal human intervention. Event sequences find widespread use in various industrial applications, including churn prediction, customer segmentation, fraud detection, and fault diagnosis, among others. Such data consist of categorical and real-valued components with irregular timestamps. Despite the usefulness of NAS methods, previous approaches have only been applied to other domains: images, texts, or time series. Our work addresses this limitation by introducing a novel NAS algorithm, SeqNAS, specifically designed for event sequence classification. We develop a simple yet expressive search space that leverages commonly used building blocks for event sequence classification, including multi-head self-attention, convolutions, and recurrent cells. To perform the search, we adopt sequential Bayesian Optimization and utilize previously trained models as an ensemble of teachers to augment knowledge distillation. As a result, we demonstrate that our method surpasses state-of-the-art NAS methods and popular architectures suitable for sequence classification, and holds great potential for various industrial applications.
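The distillation ingredient described above (previously trained models acting as an ensemble of teachers) can be sketched as follows; the temperature, mixing weight, and function names are illustrative assumptions, not SeqNAS's actual implementation.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, targets,
                               temperature=2.0, alpha=0.5):
    """Hard-label loss plus a match to the teachers' averaged soft predictions."""
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)                           # average the ensemble's distributions
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  teacher_probs, reduction="batchmean") * temperature ** 2
    return alpha * ce + (1 - alpha) * kd

student = torch.randn(8, 10, requires_grad=True)      # fake student logits
teachers = [torch.randn(8, 10) for _ in range(3)]     # three frozen teachers
loss = ensemble_distillation_loss(student, teachers, torch.randint(0, 10, (8,)))
loss.backward()
```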
Disentanglement Learning via Topology
Balabin, Nikita, Voronkova, Daria, Trofimov, Ilya, Burnaev, Evgeny, Barannikov, Serguei
We propose TopDis (Topological Disentanglement), a method for learning disentangled representations by adding a multi-scale topological loss term. Disentanglement is a crucial property of data representations, essential for the explainability and robustness of deep learning models and a step towards high-level cognition. The state-of-the-art method based on VAE minimizes the total correlation of the joint distribution of latent variables. We take a different perspective on disentanglement by analyzing the topological properties of data manifolds. In particular, we optimize the topological similarity of data manifold traversals. To the best of our knowledge, our paper is the first to propose a differentiable topological loss for disentanglement. Our experiments show that the proposed topological loss improves disentanglement scores such as MIG, FactorVAE score, SAP score, and DCI disentanglement score with respect to state-of-the-art results. Our method works in an unsupervised manner, permitting its application to problems without labeled factors of variation. Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.
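A sketch of the objective structure this suggests: decode a batch of latent codes and a traversed (shifted) copy, then penalize a divergence between the two decoded point clouds. The `topo_proxy` below, matching sorted pairwise-distance profiles, is a crude differentiable stand-in of my own, not the paper's multi-scale topological loss.

```python
import torch

def pairwise_dist(x):                       # (B, D) -> (B, B)
    return torch.cdist(x, x)

def topo_proxy(cloud_a, cloud_b):
    """Crude differentiable stand-in: compare sorted distance profiles."""
    da = torch.sort(pairwise_dist(cloud_a).flatten()).values
    db = torch.sort(pairwise_dist(cloud_b).flatten()).values
    return ((da - db) ** 2).mean()

decoder = torch.nn.Linear(2, 8)             # stand-in decoder
z = torch.randn(64, 2)
shift = torch.zeros_like(z)
shift[:, 0] = 0.5                           # traverse latent axis 0
loss_topo = topo_proxy(decoder(z), decoder(z + shift))
loss_topo.backward()                        # usable as an auxiliary loss term
```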
Learning Topology-Preserving Data Representations
Trofimov, Ilya, Cherniavskii, Daniil, Tulchinskii, Eduard, Balabin, Nikita, Burnaev, Evgeny, Barannikov, Serguei
We propose a method for learning topology-preserving data representations (dimensionality reduction). The method aims to provide topological similarity between the data manifold and its latent representation by enforcing similarity in topological features (clusters, loops, 2D voids, etc.) and their localization. The core of the method is the minimization of the Representation Topology Divergence (RTD) between the original high-dimensional data and its low-dimensional representation in latent space. RTD minimization provides closeness in topological features with strong theoretical guarantees. We develop a scheme for RTD differentiation and apply it as a loss term for the autoencoder. The proposed method, "RTD-AE", better preserves the global structure and topology of the data manifold than state-of-the-art competitors, as measured by linear correlation, triplet distance ranking accuracy, and the Wasserstein distance between persistence barcodes.
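Schematically, the training objective combines reconstruction with a weighted divergence between the input batch and its latent codes. The `simple_div` term below is a placeholder of my own; the paper differentiates RTD itself, which this sketch does not reproduce.

```python
import torch

def dist_profile(x):
    d = torch.cdist(x, x)
    return d / d.max()                      # remove global scale

def simple_div(x, z):
    """Crude, differentiable stand-in for the RTD loss term."""
    return ((dist_profile(x) - dist_profile(z)) ** 2).mean()

enc = torch.nn.Linear(20, 2)                # toy encoder
dec = torch.nn.Linear(2, 20)                # toy decoder
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

x = torch.randn(128, 20)
for _ in range(100):
    z = enc(x)
    loss = torch.nn.functional.mse_loss(dec(z), x) + 0.1 * simple_div(x, z)
    opt.zero_grad()
    loss.backward()
    opt.step()
```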
Representation Topology Divergence: A Method for Comparing Neural Network Representations
Barannikov, Serguei, Trofimov, Ilya, Balabin, Nikita, Burnaev, Evgeny
Comparison of data representations is a complex multi-aspect problem that has not yet enjoyed a complete solution. We propose a method for comparing two data representations. We introduce the Representation Topology Divergence (RTD), which measures the dissimilarity in multi-scale topology between two point clouds of equal size with a one-to-one correspondence between points. The point clouds are allowed to lie in different ambient spaces. RTD is one of the few TDA-based practical methods applicable to real machine learning datasets. Experiments show that the proposed RTD agrees with the intuitive assessment of data representation similarity and is sensitive to its topological structure. We apply RTD to gain insights into neural network representations in the computer vision and NLP domains for various problems: training dynamics analysis, data distribution shift, transfer learning, ensemble learning, and disentanglement assessment.
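The setting RTD operates in can be made concrete in a few lines: two representations of the same n objects, possibly of different dimensionality, reduced to comparable normalized distance matrices (computing the barcodes themselves is beyond this snippet, and the normalization choice is an assumption of mine).

```python
import numpy as np

def normalized_distances(z: np.ndarray) -> np.ndarray:
    """Pairwise distances divided by their mean, removing the global scale."""
    d = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
    return d / d.mean()

rng = np.random.default_rng(1)
z1 = rng.normal(size=(100, 512))            # e.g. embeddings from layer 1
z2 = rng.normal(size=(100, 64))             # same 100 objects, another layer
d1, d2 = normalized_distances(z1), normalized_distances(z2)
print(d1.shape, d2.shape)                   # both (100, 100), now comparable
```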
Manifold Topology Divergence: a Framework for Comparing Data Manifolds
Barannikov, Serguei, Trofimov, Ilya, Sotnikov, Grigorii, Trimbach, Ekaterina, Korotin, Alexander, Filippov, Alexander, Burnaev, Evgeny
We develop a framework for comparing data manifolds, aimed in particular at the evaluation of deep generative models. We describe a novel tool, Cross-Barcode(P,Q), that, given a pair of distributions in a high-dimensional space, tracks multi-scale spatial discrepancies in topology between the manifolds on which the distributions are concentrated. Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence) and apply it to assess the performance of deep generative models in various domains (images, 3D shapes, time series) and on different datasets: MNIST, Fashion MNIST, SVHN, CIFAR10, FFHQ, chest X-ray images, market stock data, and ShapeNet. We demonstrate that the MTop-Divergence accurately detects various degrees of mode dropping, intra-mode collapse, mode invention, and image disturbance. Our algorithm scales well (essentially linearly) with the dimension of the ambient high-dimensional space. It is one of the first TDA-based practical methodologies that can be applied universally to datasets of different sizes and dimensions, including those on which the most recent GANs in the visual domain are trained. The proposed method is domain-agnostic and does not rely on pre-trained networks.
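For intuition, here is a toy H0-only proxy for Cross-Barcode(P,Q): glue the point cloud Q to a single point and read the 0-dimensional merge scales off a minimum spanning tree. The gluing construction is my reading of the idea and only covers dimension zero; the paper's tool tracks higher-dimensional features as well.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def h0_cross_score(p: np.ndarray, q: np.ndarray) -> float:
    """Toy H0 proxy: cost of connecting every point of P to Q-or-other-P."""
    dpq = cdist(p, q).min(axis=1)           # each p connects to glued Q node
    dpp = cdist(p, p)
    n = len(p)
    d = np.zeros((n + 1, n + 1))
    d[:n, :n] = dpp                         # P-P distances
    d[:n, n] = d[n, :n] = dpq               # P to the glued Q super-node
    mst = minimum_spanning_tree(d).toarray()
    return float(mst.sum())                 # sum of merge scales (toy score)

rng = np.random.default_rng(2)
real = rng.normal(size=(200, 2))
good = rng.normal(size=(200, 2))                         # covers the support
collapsed = np.tile(rng.normal(size=(1, 2)), (200, 1))   # mode collapse
print(h0_cross_score(real, good), h0_cross_score(real, collapsed))
```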
Topological obstructions in neural networks learning
Barannikov, Serguei, Sotnikov, Grigorii, Trofimov, Ilya, Korotin, Alexander, Burnaev, Evgeny
We apply methods of topological data analysis to loss functions to gain insights into the learning of deep neural networks and their generalization properties. We study global properties of the loss function's gradient flow. We use topological data analysis of the loss function and its Morse complex to relate local behaviour along gradient trajectories with global properties of the loss surface. We define the neural network's Topological Obstructions score ("TO-score") with the help of robust topological invariants (barcodes of the loss function) that quantify the "badness" of local minima for gradient-based optimization. We have carried out several experiments computing these invariants for small neural networks and for fully connected, convolutional, and ResNet-like neural networks on different datasets: MNIST, Fashion MNIST, CIFAR10, SVHN. Our two principal observations are: 1) the neural network's barcode and TO-score decrease with the increase of the neural network's depth and width; 2) there is an intriguing connection between the length of the minima's segments in the barcode and the minima's generalization error.
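The notion of a loss function's barcode quantifying the "badness" of minima is easiest to see in one dimension: each local minimum is born at its own value and dies when its basin merges with a deeper one, so longer bars indicate minima that are harder to escape. The union-find sketch below computes this 0-dimensional sublevel-set barcode for a sampled 1D curve; it is a toy illustration, not the paper's Morse-complex computation.

```python
import numpy as np

def sublevel_barcode_1d(f: np.ndarray):
    """0-dim sublevel-set barcode of a sampled 1D function, via union-find."""
    order = np.argsort(f)                   # sweep grid points bottom-up
    parent = {}                             # union-find over active indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    bars, births = [], {}                   # births: root -> birth value
    for i in order:
        parent[i], births[i] = i, f[i]
        for j in (i - 1, i + 1):            # merge with active neighbors
            if j in parent:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # elder rule: the younger (higher-birth) component dies
                    young, old = (ri, rj) if births[ri] > births[rj] else (rj, ri)
                    bars.append((births[young], f[i]))
                    births.pop(young)
                    parent[young] = old
    bars.append((min(births.values()), np.inf))  # global minimum never dies
    return bars

x = np.linspace(0, 4 * np.pi, 400)
loss_curve = np.sin(x) + 0.1 * x            # a curve with several local minima
print(sublevel_barcode_1d(loss_curve))
```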
Multi-fidelity Neural Architecture Search with Knowledge Distillation
Trofimov, Ilya, Klyuchnikov, Nikita, Salnikov, Mikhail, Filippov, Alexander, Burnaev, Evgeny
Evaluations of neural architectures are very time-consuming. One of the possible ways to mitigate this issue is to use low-fidelity evaluations, namely training on a part of a dataset, for fewer epochs, with fewer channels, etc. In this paper, we propose to improve low-fidelity evaluations of neural architectures by using knowledge distillation. Knowledge distillation adds a term to the loss function that forces the network to mimic a teacher network. We carry out experiments on CIFAR-100 and ImageNet and study various knowledge distillation methods. We show that training on a small part of a dataset with such a modified loss function leads to a better selection of neural architectures than training with a logistic loss. The proposed low-fidelity evaluations were incorporated into a multi-fidelity search algorithm that outperformed the search based on high-fidelity evaluations only (training on a full dataset).
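The "mimic a teacher network" term, in its classic soft-label form (Hinton et al.), looks as follows; the temperature and mixing weight are illustrative, and this is just one of the distillation variants such a study could compare.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Cross-entropy on hard labels plus a tempered match to the teacher."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T   # keep gradient scale
    return (1 - alpha) * hard + alpha * soft

s = torch.randn(8, 10, requires_grad=True)   # fake student logits
t = torch.randn(8, 10)                       # fake frozen-teacher logits
kd_loss(s, t, torch.randint(0, 10, (8,))).backward()
```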
NAS-Bench-NLP: Neural Architecture Search Benchmark for Natural Language Processing
Klyuchnikov, Nikita, Trofimov, Ilya, Artemova, Ekaterina, Salnikov, Mikhail, Fedorov, Maxim, Burnaev, Evgeny
Neural Architecture Search (NAS) is a promising and rapidly evolving research area. Training a large number of neural networks requires an exceptional amount of computational power, which makes NAS unreachable for those researchers who have limited or no access to high-performance clusters and supercomputers. A few benchmarks with precomputed performances of neural architectures have recently been introduced to overcome this problem and ensure more reproducible experiments. However, these benchmarks cover only the computer vision domain and, thus, are built from image datasets and convolution-derived architectures. In this work, we step outside the computer vision domain by leveraging the language modeling task, which is the core of natural language processing (NLP). Our main contribution is as follows: we have provided a search space of recurrent neural networks on text datasets and trained 14k architectures within it; we have conducted both intrinsic and extrinsic evaluation of the trained models using datasets for semantic relatedness and language understanding evaluation; finally, we have tested several NAS algorithms to demonstrate how the precomputed results can be utilized. We believe that our results have high potential for use by both the NAS and NLP communities.
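The way a precomputed benchmark is consumed is worth spelling out: a NAS algorithm queries stored metrics instead of training anything. The dictionary lookup below is a hypothetical stand-in with made-up numbers, not NAS-Bench-NLP's actual API.

```python
import random

# Hypothetical table: architecture encoding -> precomputed test perplexity.
# Entries and numbers are invented for illustration only.
benchmark = {
    ("lstm", 2, 256): 97.3,
    ("gru", 3, 512): 102.1,
    ("custom_cell", 1, 128): 110.5,
}

def random_search(n_queries: int):
    """Baseline NAS: sample architectures and keep the best stored score."""
    best = None
    for _ in range(n_queries):
        arch = random.choice(list(benchmark))
        score = benchmark[arch]              # instant lookup, no GPU training
        if best is None or score < best[1]:
            best = (arch, score)
    return best

print(random_search(10))
```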