Maeda, Keisuke
Triplet Synthesis For Enhancing Composed Image Retrieval via Counterfactual Image Generation
Uesugi, Kenta, Saito, Naoki, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
Composed image retrieval (CIR) provides an effective way to manage and access large-scale visual data. Construction of the CIR model utilizes triplets that consist of a reference image, modification text describing desired changes, and a target image that reflects these changes. For effectively training CIR models, extensive manual annotation to construct high-quality training datasets, which can be time-consuming and labor-intensive, is required. To deal with this problem, this paper proposes a novel triplet synthesis method by leveraging counterfactual image generation. By controlling visual feature modifications via counterfactual image generation, our approach automatically generates diverse training triplets without any manual intervention.

The collection of these triplets is typically costly and traditionally relies on manual annotation [12,13], which makes it difficult to gather the large-scale datasets necessary for practical CIR model training. To deal with this issue, Ventura et al. have proposed an automatic method to select image pairs for triplets from captions previously assigned to the large-scale image dataset [14]. However, this automatic triplet collection method has several critical issues: it focuses solely on collecting similar images based on their captions, which may yield low-quality triplets. That is, the pairs of images for triplets differ significantly in aspects not described by the modification text.
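As a rough illustration of the idea, the sketch below builds one such triplet with an off-the-shelf text-guided editor (InstructPix2Pix from the diffusers library) standing in for the paper's counterfactual image generation; the model choice, prompt, and guidance settings are assumptions for illustration, not the authors' pipeline.

# Minimal sketch of automatic triplet construction for CIR training.
# InstructPix2Pix is a stand-in text-guided editor, NOT the paper's
# counterfactual generator; all settings below are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

def synthesize_triplet(reference: Image.Image, modification_text: str):
    """Return a (reference image, modification text, target image) triplet.

    The target is generated by applying the textual modification to the
    reference, so the pair differs mainly in the described aspects.
    """
    target = pipe(
        modification_text,
        image=reference,
        num_inference_steps=20,
        image_guidance_scale=1.5,  # keeps the target close to the reference
    ).images[0]
    return reference, modification_text, target

reference = Image.open("reference.jpg").convert("RGB")
triplet = synthesize_triplet(reference, "make the car red")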
Generative Dataset Distillation Based on Self-knowledge Distillation
Li, Longzhen, Li, Guang, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
Dataset distillation is an effective technique for reducing the cost and complexity of model training while maintaining performance by compressing large datasets into smaller, more efficient versions. In this paper, we present a novel generative dataset distillation method that can improve the accuracy of aligning prediction logits. Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data.

Generative dataset distillation aims to condense the information from large-scale datasets into a generative model rather than a static dataset [16, 17]. Unlike traditional dataset distillation methods, which produce a smaller fixed dataset, generative dataset distillation trains a model capable of generating effective synthetic data on the fly [18]. This approach has been shown to offer better cross-architecture performance compared to traditional methods, while also providing greater flexibility in the data it generates. The generative dataset distillation process typically consists of two steps.
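As a loose illustration of aligning prediction logits, the PyTorch sketch below standardizes the logits before matching their softened distributions; the standardization, temperature, and KL objective are plausible assumptions for illustration, not the paper's exact loss.

# Hedged sketch of logit alignment with standardization; one plausible
# reading of the described distribution matching, with assumed settings.
import torch
import torch.nn.functional as F

def standardize(logits: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Zero-mean, unit-variance per sample so both logit sets share a range.
    return (logits - logits.mean(dim=-1, keepdim=True)) / (
        logits.std(dim=-1, keepdim=True) + eps
    )

def logit_matching_loss(syn_logits, orig_logits, tau: float = 2.0):
    # KL divergence between softened distributions of standardized logits.
    s = F.log_softmax(standardize(syn_logits) / tau, dim=-1)
    t = F.softmax(standardize(orig_logits) / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau * tau

# Usage: logits on synthetic data vs. logits on original data, in the
# spirit of self-knowledge distillation (same model acts as its teacher).
loss = logit_matching_loss(torch.randn(32, 10), torch.randn(32, 10))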
Hyperboloid GPLVM for Discovering Continuous Hierarchies via Nonparametric Estimation
Watanabe, Koshi, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
Dimensionality reduction (DR) offers a useful representation of complex high-dimensional data. Recent DR methods focus on hyperbolic geometry to derive a faithful low-dimensional representation of hierarchical data. However, existing methods are based on neighbor embedding, which frequently breaks the continuous relations of the hierarchies. This paper presents hyperboloid Gaussian process (GP) latent variable models (hGP-LVMs) to embed high-dimensional hierarchical data with implicit continuity via nonparametric estimation. We adopt generative modeling using the GP, which brings effective hierarchical embedding and addresses the otherwise ill-posed hyperparameter tuning. This paper presents three variants that employ original point, sparse point, and Bayesian estimations. We establish their learning algorithms by incorporating Riemannian optimization and the active approximation scheme of the GP-LVM. For Bayesian inference, we further introduce the reparameterization trick to realize Bayesian latent variable learning. Finally, we apply hGP-LVMs to several datasets and show their ability to represent high-dimensional hierarchies in low-dimensional spaces.
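For reference, these are the standard hyperboloid-model definitions such a latent space builds on (standard notation, not taken from the paper): the Lorentzian inner product, the hyperboloid, and the geodesic distance that replaces the Euclidean distance.

% Standard hyperboloid-model definitions (not hGP-LVM-specific):
\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}}
  = -x_0 y_0 + \sum_{i=1}^{d} x_i y_i, \qquad
\mathbb{H}^d = \{ \mathbf{x} \in \mathbb{R}^{d+1} :
  \langle \mathbf{x}, \mathbf{x} \rangle_{\mathcal{L}} = -1,\ x_0 > 0 \},
\qquad
d_{\mathbb{H}}(\mathbf{x}, \mathbf{y})
  = \operatorname{arcosh}\!\left( -\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}} \right).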
Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition
Gan, Yaozong, Li, Guang, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
Recent multimodal large language models (MLLMs) such as GPT-4o and GPT-4V have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on MLLMs for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on the Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce a cross-domain few-shot in-context learning method based on the MLLM. To enhance the MLLM's fine-grained recognition ability for traffic signs, the proposed method generates corresponding description texts from template traffic signs. These description texts contain key information about the shape, color, and composition of traffic signs, which stimulates the MLLM's ability to perceive fine-grained traffic sign categories. By using the description texts, our method reduces the cross-domain differences between template and real traffic signs. Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels. We perform comprehensive evaluations on the German traffic sign recognition benchmark dataset, the Belgium traffic sign dataset, and two real-world datasets taken in Japan. The experimental results show that our method significantly enhances TSR performance.
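As a minimal illustration of few-shot in-context TSR with an MLLM, the sketch below interleaves a template-sign image and its description text with the query image in a single prompt; the prompt wording, the encode_image helper, and the choice of GPT-4o are illustrative assumptions rather than the paper's exact pipeline.

# Hedged sketch of cross-domain few-shot in-context TSR with an MLLM.
# Prompt text, file names, and model choice are assumptions.
import base64
from openai import OpenAI

client = OpenAI()

def encode_image(path: str) -> str:
    # Encode a local image as a data URL for the vision-capable chat API.
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

# Few-shot examples: template signs paired with description texts covering
# shape, color, and composition.
shots = [
    (encode_image("template_stop.jpg"),
     "An octagonal red sign with the white word STOP: category 'stop'."),
]

content = [{"type": "text",
            "text": "Using the examples, classify the last traffic sign."}]
for url, desc in shots:
    content.append({"type": "image_url", "image_url": {"url": url}})
    content.append({"type": "text", "text": desc})
content.append({"type": "image_url",
                "image_url": {"url": encode_image("query_sign.jpg")}})

reply = client.chat.completions.create(
    model="gpt-4o", messages=[{"role": "user", "content": content}]
)
print(reply.choices[0].message.content)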
Generative Dataset Distillation: Balancing Global Structure and Local Details
Li, Longzhen, Li, Guang, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
In this paper, we propose a new dataset distillation method that balances global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the dataset required for training models. Conventional dataset distillation methods face the problems of long redeployment time and poor cross-architecture performance. Moreover, previous methods focused too heavily on the high-level semantic attributes shared between the synthetic and original datasets while ignoring local features such as texture and shape. Based on this understanding, we propose a new method for distilling the original image dataset into a generative model. Our method uses a conditional generative adversarial network to generate the distilled dataset and maintains the balance between global structure and local details throughout the distillation process, continuously optimizing the generator to produce a more information-dense dataset.
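One plausible way to realize such a balance is a weighted sum of a global feature-matching term and a local patch-level term, as in the hedged PyTorch sketch below; the choice of feature statistics and the weight lam are assumptions for illustration, not the paper's objective.

# Hedged sketch of a combined global/local distillation objective for
# optimizing a conditional generator; all weights are assumed values.
import torch
import torch.nn.functional as F

def balanced_distillation_loss(
    syn_feats: torch.Tensor,    # deep features of synthetic images (global)
    real_feats: torch.Tensor,   # deep features of original images (global)
    syn_patches: torch.Tensor,  # shallow patch features (local texture/shape)
    real_patches: torch.Tensor,
    lam: float = 0.5,           # assumed global/local trade-off weight
) -> torch.Tensor:
    # Match mean statistics at both levels of representation.
    global_loss = F.mse_loss(syn_feats.mean(0), real_feats.mean(0))
    local_loss = F.mse_loss(syn_patches.mean(0), real_patches.mean(0))
    return global_loss + lam * local_loss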
Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach
Togo, Taro, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
This study presents a novel approach to Generative Class Incremental Learning (GCIL) that introduces a forgetting mechanism, aimed at dynamically managing class information for better adaptation to streaming data. GCIL is one of the hot topics in computer vision, and the continual learning of generative models is considered a crucial task. The ability to forget is a crucial brain function that facilitates continual learning in humans by selectively discarding less relevant information. In machine learning models, however, the concept of intentionally forgetting has not been extensively investigated. In this study, we aim to bridge this gap by incorporating forgetting mechanisms into GCIL and examining their impact on the models' ability to learn in continual learning. Through our experiments, we have found that integrating forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge, underscoring the positive role that strategic forgetting plays in continual learning.
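Purely as a speculative sketch of what an intentional forgetting mechanism could look like in generative replay (the paper's actual mechanism is not reproduced here), one could down-weight replayed samples of classes marked for forgetting so their modes fade from the generator:

# Speculative sketch: selective forgetting via down-weighted replay.
# The weighting scheme is an assumption, not the paper's mechanism.
import torch

def replay_weights(labels: torch.Tensor, forget_classes: set,
                   keep: float = 1.0, forget: float = 0.1) -> torch.Tensor:
    # Per-sample loss weights: replayed samples of "forgotten" classes
    # contribute little, so the generative model gradually discards them.
    return torch.tensor(
        [forget if int(y) in forget_classes else keep for y in labels]
    )

labels = torch.tensor([0, 3, 3, 7])
w = replay_weights(labels, forget_classes={3})
per_sample_loss = torch.rand(4)       # placeholder for the generator's loss
loss = (w * per_sample_loss).mean()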
Few-shot Personalized Saliency Prediction Based on Inter-personnel Gaze Patterns
Moroto, Yuya, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki
This paper presents few-shot personalized saliency prediction based on inter-personnel gaze patterns. In contrast to a general saliency map, a personalized saliency map (PSM) has great potential since it indicates person-specific visual attention, which is useful for obtaining individual visual preferences from the heterogeneity of gazed areas. PSM prediction is needed to acquire the PSM for an unseen image, but it remains a challenging task due to the complexity of individual gaze patterns. Although eye-tracking data obtained from each person are necessary to construct PSMs that model individual gaze patterns across various images, it is difficult to acquire such data in massive amounts. One solution for efficient PSM prediction from a limited amount of data is the effective use of eye-tracking data obtained from other persons. In this paper, to effectively treat the PSMs of other persons, we focus on the effective selection of images for acquiring eye-tracking data and on preserving the structural information of other persons' PSMs. Experimental results confirm that these two focuses are effective for PSM prediction with a limited amount of eye-tracking data.
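As a hedged sketch of the few-shot setting, the following NumPy snippet predicts a PSM for an unseen image as a similarity-weighted mixture of other persons' PSMs, with similarity measured on the few images the target person has actually gazed at; the correlation-based weighting is an illustrative assumption, not the paper's method.

# Hedged sketch: few-shot PSM prediction from other persons' PSMs.
import numpy as np

def predict_psm(target_few: np.ndarray,    # (k, H, W) target's few PSMs
                others_few: np.ndarray,    # (p, k, H, W) others, same images
                others_unseen: np.ndarray  # (p, H, W) others, unseen image
                ) -> np.ndarray:
    # Gaze-pattern similarity between the target and each other person,
    # computed on the shared few-shot images.
    sims = np.array([
        np.corrcoef(target_few.ravel(), o.ravel())[0, 1] for o in others_few
    ])
    w = np.exp(sims) / np.exp(sims).sum()   # softmax mixture weights
    return np.tensordot(w, others_unseen, axes=1)

psm = predict_psm(np.random.rand(3, 8, 8),
                  np.random.rand(5, 3, 8, 8),
                  np.random.rand(5, 8, 8))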