Distributionally Robust Active Learning for Gaussian Process Regression
Takeno, Shion, Okura, Yoshito, Inatsu, Yu, Aoyama, Tatsuya, Tanaka, Tomonari, Akahane, Satoshi, Hanada, Hiroyuki, Hashimoto, Noriaki, Murayama, Taro, Lee, Hanju, Kojima, Shinya, Takeuchi, Ichiro
Gaussian process regression (GPR), or kernel ridge regression, is a widely used and powerful tool for nonlinear prediction. Active learning (AL) for GPR, which actively collects data labels to achieve accurate prediction with fewer labels, is therefore an important problem. However, existing AL methods do not theoretically guarantee prediction accuracy for the target distribution. Furthermore, as discussed in the distributionally robust learning literature, specifying the target distribution is often difficult. This paper therefore proposes two AL methods that effectively reduce the worst-case expected error for GPR, that is, the worst-case expectation over a set of candidate target distributions. We show an upper bound on the worst-case expected squared error, which suggests that the error can be made arbitrarily small with a finite number of data labels under mild conditions. Finally, we demonstrate the effectiveness of the proposed methods on synthetic and real-world datasets.
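As a rough illustration of the setting, the sketch below runs a greedy variance-based AL loop for GPR and scores each candidate query by the worst-case expected posterior variance over two hypothetical candidate target distributions (the uniform weights and the Gaussian-shaped weights are invented for this example; the paper's actual acquisition rules and guarantees are not reproduced here):

```python
import numpy as np

def rbf(A, B, ell=0.5):
    # RBF kernel matrix between two sets of points
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * ell ** 2))

def posterior_var(X_lab, X_pool, noise=1e-2):
    # GP posterior variance at the pool points, given labeled inputs only
    K = rbf(X_lab, X_lab) + noise * np.eye(len(X_lab))
    Ks = rbf(X_pool, X_lab)
    return 1.0 - np.sum(Ks @ np.linalg.inv(K) * Ks, axis=1)

rng = np.random.default_rng(0)
X_pool = rng.uniform(-1, 1, size=(50, 1))
# two hypothetical candidate target distributions, given as weights on the pool
w1 = np.full(50, 1 / 50)                              # uniform
w2 = np.exp(-4 * X_pool[:, 0] ** 2)                   # concentrated near 0
w2 /= w2.sum()

labeled = [0]
for _ in range(5):
    cands = [j for j in range(len(X_pool)) if j not in labeled]
    scores = []
    for j in cands:
        v = posterior_var(X_pool[labeled + [j]], X_pool)
        # worst-case expected posterior variance over the candidates
        scores.append(max(w1 @ v, w2 @ v))
    labeled.append(cands[int(np.argmin(scores))])
print(labeled)
```

The worst case over a finite set of candidate distributions is just a maximum of weighted averages; the paper's methods handle this quantity with theoretical guarantees rather than by brute-force greedy search.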
Distributionally Robust Coreset Selection under Covariate Shift
Tanaka, Tomonari, Hanada, Hiroyuki, Yang, Hanting, Aoyama, Tatsuya, Inatsu, Yu, Akahane, Satoshi, Okura, Yoshito, Hashimoto, Noriaki, Murayama, Taro, Lee, Hanju, Kojima, Shinya, Takeuchi, Ichiro
Coreset selection, which selects a small subset of an existing training dataset, is a widely studied approach to reducing training data. In practical situations where such methods are employed, the data distribution often differs between the development phase and the deployment phase, with the latter being unknown, making it challenging to select a subset of training data that performs well across all deployment scenarios. We therefore propose Distributionally Robust Coreset Selection (DRCS). DRCS theoretically derives an estimate of the upper bound on the worst-case test error, assuming that the future covariate distribution may deviate within a defined range from the training distribution. By selecting instances so as to suppress this estimated upper bound, DRCS achieves distributionally robust training-instance selection. The method primarily applies to convex training problems, but we demonstrate that it can also be applied to deep learning under appropriate approximations. In this paper, we focus on covariate shift, a type of data distribution shift, and demonstrate the effectiveness of DRCS through experiments.
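To give a feel for what "worst-case test error within a defined range" means, the toy sketch below computes the worst-case weighted average of fixed per-sample losses when each sample's deployment weight may lie in a box [lo, hi] and the weights sum to one (the box form, the greedy maximizer, and all numbers are illustrative assumptions, not the bound derived in the paper):

```python
import numpy as np

def worst_case_error(losses, lo, hi):
    """Worst-case weighted average of per-sample losses, where the unknown
    deployment distribution assigns each sample a weight in [lo, hi] and
    the weights sum to 1. The maximizer greedily pushes as much weight as
    possible toward the largest losses."""
    order = np.argsort(losses)[::-1]          # largest loss first
    w = np.full(len(losses), lo, dtype=float)
    budget = 1.0 - lo * len(losses)           # weight left to distribute
    for i in order:
        add = min(hi - lo, budget)
        w[i] += add
        budget -= add
        if budget <= 0:
            break
    return float(w @ losses)

losses = np.array([0.1, 0.5, 0.2, 0.9])
wc = worst_case_error(losses, lo=0.1, hi=0.5)
print(wc)   # weight concentrates on the 0.9 and 0.5 losses
```

DRCS goes further: rather than evaluating this quantity for one fixed model, it bounds it over the models obtainable after retraining on a candidate subset, and selects the subset that suppresses that bound.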
Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation
Aoyama, Tatsuya, Yang, Hanting, Hanada, Hiroyuki, Akahane, Satoshi, Tanaka, Tomonari, Okura, Yoshito, Inatsu, Yu, Hashimoto, Noriaki, Murayama, Taro, Lee, Hanju, Kojima, Shinya, Takeuchi, Ichiro
Reducing the amount of training data while preserving model performance remains a fundamental challenge in machine learning. Dataset distillation seeks to generate synthetic instances that encapsulate the essential information of the original data [31]. This synthetic approach is often more flexible and can potentially achieve greater data reduction than simply retaining subsets of actual instances. Distilled datasets can also serve broader applications, for example enabling efficient continual learning with reduced storage demands [14, 23, 3] and offering privacy safeguards through data corruption [2, 12]. Existing dataset distillation methods are essentially formulated as bi-level optimization problems, because generating synthetic instances requires retraining the model with those instances as training data: synthetic instances are created in the outer loop, and the model is trained in the inner loop, leading to high computational costs. A promising way to avoid bi-level optimization is the Kernel Inducing Point (KIP) method [18], which sidesteps the inner training loop by obtaining an analytical solution, effectively leveraging the fact that its loss function is a variant of the squared loss.
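The key mechanism, an inner problem with a closed-form solution, can be sketched as follows for kernel ridge regression (the kernel, regularization strength, and 1-D toy data are assumptions for illustration; KIP's actual training pipeline is not reproduced):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # RBF kernel matrix between two sets of points
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def kip_loss(Xs, ys, Xt, yt, lam=1e-3):
    """KIP-style outer loss: fit kernel ridge regression analytically on
    the synthetic support set (Xs, ys), then measure squared error on the
    original target data (Xt, yt). No inner training loop is needed
    because the ridge solution is closed form."""
    Kss = rbf(Xs, Xs) + lam * np.eye(len(Xs))
    alpha = np.linalg.solve(Kss, ys)      # inner problem, solved exactly
    pred = rbf(Xt, Xs) @ alpha
    return float(((pred - yt) ** 2).mean())

rng = np.random.default_rng(0)
Xt = rng.uniform(-1, 1, (100, 1)); yt = np.sin(3 * Xt[:, 0])   # original data
Xs = rng.uniform(-1, 1, (5, 1));   ys = np.sin(3 * Xs[:, 0])   # 5 synthetic points
loss = kip_loss(Xs, ys, Xt, yt)
print(loss)
# In KIP, (Xs, ys) would now be updated by gradient descent on this loss,
# collapsing the bi-level problem into a single optimization loop.
```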
Distributionally Robust Safe Sample Screening
Hanada, Hiroyuki, Aoyama, Tatsuya, Akahane, Satoshi, Tanaka, Tomonari, Okura, Yoshito, Inatsu, Yu, Hashimoto, Noriaki, Takeno, Shion, Murayama, Taro, Lee, Hanju, Kojima, Shinya, Takeuchi, Ichiro
In this study, we propose a machine learning method called Distributionally Robust Safe Sample Screening (DRSSS), which aims to identify unnecessary training samples even when the distribution of the training samples changes in the future. To achieve this, we combine the distributionally robust (DR) paradigm, which enhances model robustness against variations in the data distribution, with safe sample screening (SSS), which identifies unnecessary training samples prior to model training. Because an infinite number of distribution-change scenarios must be considered, SSS is well suited: it does not require retraining the model after a distribution change. We employ the covariate-shift framework to represent the distribution of training samples and reformulate the DR covariate-shift problem as a weighted empirical risk minimization problem in which the weights are uncertain within a predetermined range. By extending the existing SSS technique to accommodate this weight uncertainty, DRSSS can reliably identify unnecessary samples under any future distribution within the specified range. We provide a theoretical guarantee for the DRSSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.
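The weighted-ERM reformulation that both this work and the DRSS paper below rely on can be sketched minimally with weighted ridge regression, where per-sample weights vary inside an assumed ±20% box (the box, the squared loss, and the data are illustrative; the screening certificates themselves, which bound the optimum over the whole box analytically, are not reproduced here):

```python
import numpy as np

def weighted_ridge(X, y, w, lam=1.0):
    """Weighted ERM with squared loss and L2 regularization (closed form).
    In the DR covariate-shift reformulation, w collects the density-ratio
    weights attached to the training samples."""
    XtW = X.T * w   # equivalent to X.T @ diag(w)
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=30)

w0 = np.ones(30)                 # nominal (uniform) weights
lo, hi = 0.8 * w0, 1.2 * w0      # hypothetical uncertainty box on the weights
# solutions at random weight vectors inside the box
betas = np.array([weighted_ridge(X, y, rng.uniform(lo, hi)) for _ in range(100)])
print(betas.min(0), betas.max(0))   # coefficients stay within a narrow range
```

A screening method operates on exactly this structure: if a sample can be certified unnecessary for every weight vector in the box, it can be discarded once, before any future distribution shift materializes.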
Distributionally Robust Safe Screening
Hanada, Hiroyuki, Akahane, Satoshi, Aoyama, Tatsuya, Tanaka, Tomonari, Okura, Yoshito, Inatsu, Yu, Hashimoto, Noriaki, Murayama, Taro, Lee, Hanju, Kojima, Shinya, Takeuchi, Ichiro
In this study, we propose a method, Distributionally Robust Safe Screening (DRSS), for identifying unnecessary samples and features in a DR covariate-shift setting. The method combines DR learning, a paradigm aimed at enhancing model robustness against variations in the data distribution, with safe screening (SS), a sparse-optimization technique that identifies irrelevant samples and features prior to model training. The core idea of DRSS is to reformulate the DR covariate-shift problem as a weighted empirical risk minimization problem in which the weights are uncertain within a predetermined range. By extending the SS technique to accommodate this weight uncertainty, DRSS can reliably identify unnecessary samples and features under any future distribution within the specified range. We provide a theoretical guarantee of the DRSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.