Togo, Ren
Continual Self-supervised Learning Considering Medical Domain Knowledge in Chest CT Images
Tasai, Ren, Li, Guang, Togo, Ren, Tang, Minghui, Yoshimura, Takaaki, Sugimori, Hiroyuki, Hirata, Kenji, Ogawa, Takahiro, Kudo, Kohsuke, Haseyama, Miki
We propose a novel continual self-supervised learning (CSSL) method considering medical domain knowledge in chest CT images. Our approach addresses the challenge of sequential learning by effectively capturing the relationship between previously learned knowledge and new information at different stages. By incorporating an enhanced DER into CSSL and maintaining both diversity and representativeness within the rehearsal buffer of DER, the risk of data interference during pretraining is reduced, enabling the model to learn richer and more robust feature representations. In addition, we incorporate a mixup strategy and feature distillation to further enhance the model's ability to learn meaningful representations. We validate our method using chest CT images obtained under two different imaging conditions, demonstrating superior performance compared to state-of-the-art methods.
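To make the mixup strategy named in the abstract above concrete, the following is a minimal sketch of generic mixup: a convex combination of a new-stage image and a rehearsal-buffer image, with the mixing weight drawn from a Beta distribution. The function names, the flattened-list image representation, and the default `alpha` are assumptions for illustration; the abstract does not specify how the authors apply mixup.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha) (alpha is an assumed default)."""
    return rng.betavariate(alpha, alpha)


def mixup(x_new, x_old, lam):
    """Return lam * x_new + (1 - lam) * x_old for two flattened images of equal length."""
    if len(x_new) != len(x_old):
        raise ValueError("inputs must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_new, x_old)]


if __name__ == "__main__":
    new_image = [1.0, 0.0, 0.5]     # hypothetical sample from the current stage
    buffer_image = [0.0, 1.0, 0.5]  # hypothetical sample from the rehearsal buffer
    print(mixup(new_image, buffer_image, sample_lambda()))
```

With `lam` close to 1 the mixed sample resembles the new-stage image, and with `lam` close to 0 it resembles the buffered one.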
Generative Dataset Distillation Based on Self-knowledge Distillation
Li, Longzhen, Li, Guang, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
Dataset distillation is an effective technique for reducing the cost and complexity of model training while maintaining performance by compressing large datasets into smaller, more efficient versions. In this paper, we present a novel generative dataset distillation method that can improve the accuracy of aligning prediction logits. Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data, ...
Generative dataset distillation aims to condense the information from large-scale datasets into a generative model rather than a static dataset [16, 17]. Unlike traditional dataset distillation methods, which produce a smaller fixed dataset, generative dataset distillation trains a model capable of generating effective synthetic data on the fly [18]. This approach has been shown to offer better cross-architecture performance compared to traditional methods, while also providing greater flexibility in the data it generates. The generative dataset distillation process typically consists of two steps.
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering
Zhu, He, Togo, Ren, Ogawa, Takahiro, Haseyama, Miki
Conventional medical artificial intelligence (AI) models face barriers in clinical application and ethical issues owing to their inability to handle the privacy-sensitive characteristics of medical data. We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models, addressing privacy and reliability challenges in the medical domain. Our method introduces learnable prompts into a Transformer architecture to efficiently train it on diverse medical datasets without massive computational costs. We then introduce a reliable client VQA model that incorporates Dempster-Shafer evidence theory to quantify uncertainty in predictions, enhancing the model's reliability. Furthermore, we propose a novel inter-client communication mechanism that uses maximum likelihood estimation to balance accuracy and uncertainty, fostering efficient integration of insights across clients.
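One common way to turn Dempster-Shafer evidence theory into a per-prediction uncertainty score is the subjective-logic formulation used in evidential deep learning: non-negative per-class evidence is converted into belief masses plus one leftover uncertainty mass, and the masses sum to one. The sketch below assumes that formulation; the abstract above does not state which formulation the authors use, and the function name is hypothetical.

```python
def belief_and_uncertainty(evidence):
    """Map non-negative per-class evidence to belief masses and an uncertainty mass.

    Assumed subjective-logic form: with K classes and evidence e_k,
    S = sum(e_k + 1), belief b_k = e_k / S, uncertainty u = K / S,
    so that sum(b_k) + u == 1.
    """
    k = len(evidence)
    if k == 0:
        raise ValueError("need at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    strength = sum(evidence) + k  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength
    return beliefs, uncertainty


if __name__ == "__main__":
    # Hypothetical evidence for a three-answer question.
    print(belief_and_uncertainty([5.0, 0.0, 0.0]))
    # No evidence at all gives maximal uncertainty.
    print(belief_and_uncertainty([0.0, 0.0, 0.0]))
```

Under this formulation, a client with little evidence for any answer reports an uncertainty near 1, which is the kind of signal a reliability-aware aggregation step could use.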
Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition
Gan, Yaozong, Li, Guang, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
Recent multimodal large language models (MLLMs) such as GPT-4o and GPT-4V have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce a cross-domain few-shot in-context learning method based on the MLLM. To enhance the MLLM's fine-grained recognition of traffic signs, the proposed method generates corresponding description texts using template traffic signs. These description texts contain key information about the shape, color, and composition of traffic signs, which can stimulate the ability of the MLLM to perceive fine-grained traffic sign categories. By using the description texts, our method reduces the cross-domain differences between template and real traffic signs. Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels. We perform comprehensive evaluations on the German traffic sign recognition benchmark dataset, the Belgium traffic sign dataset, and two real-world datasets taken from Japan. The experimental results show that our method significantly enhances the TSR performance.
Generative Dataset Distillation: Balancing Global Structure and Local Details
Li, Longzhen, Li, Guang, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
In this paper, we propose a new dataset distillation method that balances global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the dataset required for training models. Conventional dataset distillation methods suffer from long redeployment times and poor cross-architecture performance. Moreover, previous methods focused too much on the high-level semantic attributes shared by the synthetic dataset and the original dataset while ignoring local features such as texture and shape. Based on this understanding, we propose a new method for distilling the original image dataset into a generative model. Our method uses a conditional generative adversarial network to generate the distilled dataset. We then balance global structure and local details in the distillation process, continuously optimizing the generator to produce a more information-dense dataset.
Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach
Togo, Taro, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing a forgetting mechanism, aimed at dynamically managing class information for better adaptation to streaming data. GCIL, the continual learning of generative models, is an active topic in computer vision and is considered a crucial task for society. For humans, the ability to forget is a crucial brain function that facilitates continual learning by selectively discarding less relevant information. However, in the field of machine learning, the concept of intentional forgetting has not been extensively investigated. In this study, we aim to bridge this gap by incorporating forgetting mechanisms into GCIL and examining their impact on the models' ability to learn continually. Through our experiments, we have found that integrating forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge, underscoring the positive role that strategic forgetting plays in continual learning.
Importance-Aware Adaptive Dataset Distillation
Li, Guang, Togo, Ren, Ogawa, Takahiro, Haseyama, Miki
Herein, we propose a novel dataset distillation method for constructing small informative datasets that preserve the information of the large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite unprecedented success, large-scale datasets considerably increase the storage and transmission costs, resulting in a cumbersome model training process. Moreover, using raw data for training raises privacy and copyright concerns. To address these issues, a new task named dataset distillation has been introduced, aiming to synthesize a compact dataset that retains the essential information from the large original dataset. State-of-the-art (SOTA) dataset distillation methods have been proposed by matching gradients or network parameters obtained during training on real and synthetic datasets. The contribution of different network parameters to the distillation process varies, and uniformly treating them leads to degraded distillation performance. Based on this observation, we propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance by automatically assigning importance weights to different network parameters during distillation, thereby synthesizing more robust distilled datasets. IADD demonstrates superior performance over other SOTA dataset distillation methods based on parameter matching on multiple benchmark datasets and outperforms them in terms of cross-architecture generalization. In addition, the analysis of self-adaptive weights demonstrates the effectiveness of IADD. Furthermore, the effectiveness of IADD is validated in a real-world medical application such as COVID-19 detection.
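The idea of treating network parameters non-uniformly during parameter matching, as described in the abstract above, can be illustrated with a weighted squared-distance loss. This is only a sketch under stated assumptions: the function name, the flat-list parameter representation, and the fixed weights are hypothetical, whereas IADD assigns its importance weights automatically during distillation in a way the abstract does not detail.

```python
def weighted_matching_loss(student, expert, weights):
    """Sum of importance-weighted squared differences between two parameter vectors.

    student: parameters reached by training on the synthetic dataset
    expert:  parameters reached by training on the real dataset
    weights: per-parameter importance (hypothetical fixed values in this sketch)
    """
    if not (len(student) == len(expert) == len(weights)):
        raise ValueError("all inputs must have the same length")
    return sum(w * (s - e) ** 2 for s, e, w in zip(student, expert, weights))


if __name__ == "__main__":
    student = [1.0, 2.0]
    expert = [0.0, 0.0]
    # Uniform weights reduce to plain (unweighted) parameter matching.
    print(weighted_matching_loss(student, expert, [1.0, 1.0]))
    # Down-weighting the second parameter reduces its influence on the loss.
    print(weighted_matching_loss(student, expert, [1.0, 0.0]))
```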
Dataset Distillation Using Parameter Pruning
Li, Guang, Togo, Ren, Ogawa, Takahiro, Haseyama, Miki
In this study, we propose a novel dataset distillation method based on parameter pruning. The proposed method can synthesize more robust distilled datasets and improve distillation performance by pruning difficult-to-match parameters during the distillation process. Experimental results on two benchmark datasets show the superiority of the proposed method.
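As a rough illustration of pruning difficult-to-match parameters, the sketch below drops the fraction of parameters with the largest squared difference between the two networks and computes the matching loss on the rest. Both the difficulty criterion (largest squared difference) and the names `pruned_matching_loss` and `prune_ratio` are assumptions; the abstract above does not say how difficulty is measured.

```python
def pruned_matching_loss(student, expert, prune_ratio):
    """Squared-distance matching loss over the easiest-to-match parameters only.

    The prune_ratio fraction of parameters with the largest squared
    difference is excluded (assumed criterion for "difficult to match").
    """
    if len(student) != len(expert):
        raise ValueError("parameter vectors must have the same length")
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    diffs = sorted((s - e) ** 2 for s, e in zip(student, expert))
    keep = len(diffs) - int(len(diffs) * prune_ratio)
    return sum(diffs[:keep])


if __name__ == "__main__":
    student = [1.0, 2.0, 3.0, 10.0]  # hypothetical parameters; the last is an outlier
    expert = [0.0, 0.0, 0.0, 0.0]
    print(pruned_matching_loss(student, expert, 0.0))   # no pruning
    print(pruned_matching_loss(student, expert, 0.25))  # outlier excluded
```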
RGMIM: Region-Guided Masked Image Modeling for Learning Meaningful Representation from X-Ray Images
Li, Guang, Togo, Ren, Ogawa, Takahiro, Haseyama, Miki
Purpose: Self-supervised learning has been gaining attention in the medical field for its potential to improve computer-aided diagnosis. One popular method of self-supervised learning is masked image modeling (MIM), which involves masking a subset of input pixels and predicting the masked pixels. However, traditional MIM methods typically use a random masking strategy, which may not be ideal for medical images that often have a small region of interest for disease detection. To address this issue, this work aims to improve MIM for medical images and evaluate its effectiveness in an open X-ray image dataset. Methods: In this paper, we present a novel method called region-guided masked image modeling (RGMIM) for learning meaningful representation from X-ray images. Our method adopts a new masking strategy that utilizes organ mask information to identify valid regions for learning more meaningful representations. The proposed method was compared with five self-supervised learning techniques (MAE, SKD, Cross, BYOL, and SimSiam). We conducted quantitative evaluations on an open lung X-ray image dataset as well as masking ratio hyperparameter studies. Results: When using the entire training set, RGMIM outperformed other comparable methods, achieving a 0.962 lung disease detection accuracy. Specifically, RGMIM significantly improved performance in small data volumes, such as 5% and 10% of the training set (846 and 1,693 images) compared to other methods, and achieved a 0.957 detection accuracy even when only 50% of the training set was used. Conclusions: RGMIM can mask more valid regions, facilitating the learning of discriminative representations and the subsequent high-accuracy lung disease detection. RGMIM outperforms other state-of-the-art self-supervised learning methods in experiments, particularly when limited training data is used.
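The contrast between random masking and a region-guided strategy can be sketched as follows: instead of sampling masked patches from the whole image, the sampler draws them only from patches flagged as valid by an organ mask. The per-patch boolean flags, the function name, and the rule of capping the count at the number of valid patches are assumptions for illustration, not the authors' exact procedure.

```python
import random


def region_guided_mask(valid, mask_ratio, rng=None):
    """Pick patch indices to mask, sampling only from organ-mask-valid patches.

    valid:      list of booleans, one per image patch (True = overlaps the organ mask)
    mask_ratio: fraction of all patches to mask
    Returns a sorted list of masked patch indices.
    """
    if rng is None:
        rng = random.Random()
    candidates = [i for i, is_valid in enumerate(valid) if is_valid]
    # Assumed rule: never mask more patches than the organ region contains.
    n_mask = min(len(candidates), int(len(valid) * mask_ratio))
    return sorted(rng.sample(candidates, n_mask))


if __name__ == "__main__":
    # Hypothetical 8-patch image in which patches 1, 2, 4, and 5 cover the lungs.
    valid = [False, True, True, False, True, True, False, False]
    print(region_guided_mask(valid, 0.25, random.Random(0)))
```

Plain random masking would correspond to passing `valid = [True] * n`, so every patch is a candidate.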
Gromov-Wasserstein Autoencoders
Nakagawa, Nao, Togo, Ren, Ogawa, Takahiro, Haseyama, Miki
Variational Autoencoder (VAE)-based generative models offer flexible representation learning by incorporating meta-priors, general premises considered beneficial for downstream tasks. However, the incorporated meta-priors often involve ad-hoc model deviations from the original likelihood architecture, causing undesirable changes in their training. In this paper, we propose a novel representation learning method, Gromov-Wasserstein Autoencoders (GWAE), which directly matches the latent and data distributions using the variational autoencoding scheme. Instead of likelihood-based objectives, GWAE models minimize the Gromov-Wasserstein (GW) metric between the trainable prior and given data distributions. The GW metric measures the distance structure-oriented discrepancy between distributions even with different dimensionalities, which provides a direct measure between the latent and data spaces. By restricting the prior family, we can introduce meta-priors into the latent space without changing their objective. The empirical comparisons with VAE-based models show that GWAE models work in two prominent meta-priors, disentanglement and clustering, with their GW objective unchanged.
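For reference, a standard textbook form of the Gromov-Wasserstein metric mentioned in the abstract above is given below. The symbols are assumptions introduced here: $(\mathcal{X}, d_{\mathcal{X}}, \mu)$ denotes the data space with its distance and distribution, $(\mathcal{Z}, d_{\mathcal{Z}}, \nu)$ the latent space with its distance and prior, and $\Pi(\mu, \nu)$ the set of couplings with marginals $\mu$ and $\nu$. The paper's exact normalization and choice of $p$ may differ.

```latex
\[
  \mathrm{GW}_p(\mu, \nu)^p
  = \inf_{\pi \in \Pi(\mu, \nu)}
    \int_{\mathcal{X} \times \mathcal{Z}} \int_{\mathcal{X} \times \mathcal{Z}}
      \bigl| d_{\mathcal{X}}(x, x') - d_{\mathcal{Z}}(z, z') \bigr|^p
    \, \mathrm{d}\pi(x, z) \, \mathrm{d}\pi(x', z')
\]
```

Because the integrand compares only within-space distances, the two spaces never need a common coordinate system, which is why the metric applies even when their dimensionalities differ.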