Umeda, Yuhei
Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives
Ramasubramanian, Shrinivas, Rangwani, Harsh, Takemori, Sho, Samanta, Kunal, Umeda, Yuhei, Radhakrishnan, Venkatesh Babu
The rise in internet usage has led to the generation of massive amounts of data, resulting in the adoption of various supervised and semi-supervised machine learning algorithms, which can effectively utilize the colossal amount of data to train models. However, before deploying these models in the real world, these must be strictly evaluated on performance measures like worst-case recall and satisfy constraints such as fairness. We find that current state-of-the-art empirical techniques offer sub-optimal performance on these practical, non-decomposable performance objectives. To bridge the gap, we propose SelMix, a selective mixup-based inexpensive fine-tuning technique for pre-trained models, to optimize for the desired objective. The core idea of our framework is to determine a sampling distribution to perform a mixup of features between samples from particular classes such that it optimizes the given objective. We comprehensively evaluate our technique against the existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that proposed SelMix fine-tuning significantly improves the performance for various practical non-decomposable objectives across benchmarks. The rise of deep networks has shown great promise by reaching near-perfect performance across computer vision tasks (He et al., 2022; Kolesnikov et al., 2020; Kirillov et al., 2023; Girdhar et al., 2023). It has led to their widespread deployment for practical applications, some of which have critical consequences (Castelvecchi, 2020). Hence, these deployed models must perform robustly across the entire data distribution and not just the majority part. These failure cases are often overlooked when considering only accuracy as our primary performance metric. Therefore, more practical metrics like Recall H-Mean (Sun et al., 2006), Worst-Case (Min) Recall (Narasimhan & Menon, 2021; Mohri et al., 2019), etc., should be used for evaluation.
Toward Unlimited Self-Learning MCMC with Parallel Adaptive Annealing
Ichikawa, Yuma, Nakagawa, Akira, Masayuki, Hiromoto, Umeda, Yuhei
Its efficiency strongly depends on the choice of the proposal. Self-learning Monte Carlo (SLMC) methods are recently proposed to accelerate Markov chain Among recent advances in machine learning, a general Monte Carlo (MCMC) methods using a machine method called the self-learning Monte Carlo (SLMC) learning model. With latent generative models, method (Liu et al., 2017) was introduced to accelerate SLMC methods realize efficient Monte Carlo updates MCMC simulations by an automated proposal with a machine with less autocorrelation. However, SLMC learning model and has been applied to various problems methods are difficult to directly apply to multimodal (Xu et al., 2017; Shen et al., 2018). In particular, distributions for which training data are a latent generative model realizes efficient global update difficult to obtain. To solve the limitation, we propose through the obtained information-rich latent representation "parallel adaptive annealing," which makes (Huang & Wang, 2017; Albergo et al., 2019; Monroe & SLMC methods directly apply to multimodal distributions Shen, 2022; Tanaka & Tomiya, 2017). Although powerful, with a gradually trained proposal while the performance of SLMC simulation strongly depends annealing target distribution. Parallel adaptive annealing on an automated proposal with machine learning models is based on (i) sequential learning with and the quality of training data to train the proposal. For annealing to inherit and update the model parameters, example, it is challenging to directly use SLMC for multimodal (ii) adaptive annealing to automatically detect distributions because obtaining accurate training data under-learning, and (iii) parallel annealing to covering all modes is difficult.
Fast and Multi-aspect Mining of Complex Time-stamped Event Streams
Nakamura, Kota, Matsubara, Yasuko, Kawabata, Koki, Umeda, Yuhei, Wada, Yuichiro, Sakurai, Yasushi
Given a huge, online stream of time-evolving events with multiple attributes, such as online shopping logs: (item, price, brand, time), and local mobility activities: (pick-up and drop-off locations, time), how can we summarize large, dynamic high-order tensor streams? How can we see any hidden patterns, rules, and anomalies? Our answer is to focus on two types of patterns, i.e., ''regimes'' and ''components'', for which we present CubeScope, an efficient and effective method over high-order tensor streams. Specifically, it identifies any sudden discontinuity and recognizes distinct dynamical patterns, ''regimes'' (e.g., weekday/weekend/holiday patterns). In each regime, it also performs multi-way summarization for all attributes (e.g., item, price, brand, and time) and discovers hidden ''components'' representing latent groups (e.g., item/brand groups) and their relationship. Thanks to its concise but effective summarization, CubeScope can also detect the sudden appearance of anomalies and identify the types of anomalies that occur in practice. Our proposed method has the following properties: (a) Effective: it captures dynamical multi-aspect patterns, i.e., regimes and components, and statistically summarizes all the events; (b) General: it is practical for successful application to data compression, pattern discovery, and anomaly detection on various types of tensor streams; (c) Scalable: our algorithm does not depend on the length of the data stream and its dimensionality. Extensive experiments on real datasets demonstrate that CubeScope finds meaningful patterns and anomalies correctly, and consistently outperforms the state-of-the-art methods as regards accuracy and execution speed.
Cost-Sensitive Self-Training for Optimizing Non-Decomposable Metrics
Rangwani, Harsh, Ramasubramanian, Shrinivas, Takemori, Sho, Takashi, Kato, Umeda, Yuhei, Radhakrishnan, Venkatesh Babu
Self-training based semi-supervised learning algorithms have enabled the learning of highly accurate deep neural networks, using only a fraction of labeled data. However, the majority of work on self-training has focused on the objective of improving accuracy, whereas practical machine learning systems can have complex goals (e.g. maximizing the minimum of recall across classes, etc.) that are non-decomposable in nature. In this work, we introduce the Cost-Sensitive Self-Training (CSST) framework which generalizes the self-training-based methods for optimizing non-decomposable metrics. We prove that our framework can better optimize the desired non-decomposable metric utilizing unlabeled data, under similar data distribution assumptions made for the analysis of self-training. Using the proposed CSST framework, we obtain practical self-training methods (for both vision and NLP tasks) for optimizing different non-decomposable metrics using deep neural networks. Our results demonstrate that CSST achieves an improvement over the state-of-the-art in majority of the cases across datasets and objectives.
Topological Uncertainty: Monitoring trained neural networks through persistence of activation graphs
Lacombe, Théo, Ike, Yuichi, Carriere, Mathieu, Chazal, Frédéric, Glisse, Marc, Umeda, Yuhei
Although neural networks are capable of reaching astonishing performances on a wide variety of contexts, properly training networks on complicated tasks requires expertise and can be expensive from a computational perspective. In industrial applications, data coming from an open-world setting might widely differ from the benchmark datasets on which a network was trained. Being able to monitor the presence of such variations without retraining the network is of crucial importance. In this article, we develop a method to monitor trained neural networks based on the topological properties of their activation graphs. To each new observation, we assign a Topological Uncertainty, a score that aims to assess the reliability of the predictions by investigating the whole network instead of its final layer only, as typically done by practitioners. Our approach entirely works at a post-training level and does not require any assumption on the network architecture, optimization scheme, nor the use of data augmentation or auxiliary datasets; and can be faithfully applied on a large range of network architectures and data types. We showcase experimentally the potential of Topological Uncertainty in the context of trained network selection, Out-Of-Distribution detection, and shift-detection, both on synthetic and real datasets of images and graphs.
ATOL: Automatic Topologically-Oriented Learning
Royer, Martin, Chazal, Frédéric, Ike, Yuichi, Umeda, Yuhei
There are abundant cases for using Topological Data Analysis (TDA) in a learning context, but robust topological information commonly comes in the form of a set of persistence diagrams, objects that by nature are uneasy to affix to a generic machine learning framework. We introduce a vectorisation method for diagrams that allows to collect information from topological descriptors into a format fit for machine learning tools. Based on a few observations, the method is learned and tailored to discriminate the various important plane regions a diagram is set into. With this tool one can automatically augment any sort of machine learning problem with access to a TDA method, enhance performances, construct features reflecting underlying changes in topological behaviour. The proposed methodology comes with only high level tuning parameters such as the encoding budget for topological features. We provide an open-access, ready-to-use implementation and notebook. We showcase the strengths and versatility of our approach on a number of applications. From emulous and modern graph collections to a highly topological synthetic dynamical orbits data, we prove that the method matches or beats the state-of-the-art in encoding persistence diagrams to solve hard problems. We then apply our method in the context of an industrial, difficult time-series regression problem and show the approach to be relevant.
Topological Data Analysis for Arrhythmia Detection through Modular Neural Networks
Dindin, Meryll, Umeda, Yuhei, Chazal, Frederic
This paper presents an innovative and generic deep learning approach to monitor heart conditions from ECG signals.We focus our attention on both the detection and classification of abnormal heartbeats, known as arrhythmia. We strongly insist on generalization throughout the construction of a deep-learning model that turns out to be effective for new unseen patient. The novelty of our approach relies on the use of topological data analysis as basis of our multichannel architecture, to diminish the bias due to individual differences. We show that our structure reaches the performances of the state-of-the-art methods regarding arrhythmia detection and classification.
A General Neural Network Architecture for Persistence Diagrams and Graph Classification
Carrière, Mathieu, Chazal, Frédéric, Ike, Yuichi, Lacombe, Théo, Royer, Martin, Umeda, Yuhei
Graph classification is a difficult problem that has drawn a lot of attention from the machine learning community over the past few years. This is mainly due to the fact that, contrarily to Euclidean vectors, the inherent complexity of graph structures can be quite hard to encode and handle for traditional classifiers. Even though kernels have been proposed in the literature, the increase in the dataset sizes has greatly limited the use of kernel methods since computation and storage of kernel matrices has become impracticable. In this article, we propose to use extended persistence diagrams to efficiently encode graph structure. More precisely, we show that using the so-called heat kernel signatures for the computation of these extended persistence diagrams allows one to quickly and efficiently summarize the graph structure. Then, we build on the recent development of neural networks for point clouds to define an architecture for (extended) persistence diagrams which is modular and easy-to-use. Finally, we demonstrate the usefulness of our approach by validating our architecture on several graph datasets, on which the obtained results are comparable to the state-of-the-art for graph classification.