target data
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- Europe (0.14)
- (5 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
Optimal Transport-Guided Conditional Score-Based Diffusion Model Xiang Gu1, Liwei Y ang
Conditional score-based diffusion model (SBDM) is for conditional generation of target data with paired data as condition, and has achieved great success in image translation. However, it requires the paired data as condition, and there would be insufficient paired data provided in real-world applications.
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- (2 more...)
Transfer learning for scalar-on-function regression via control variates
Transfer learning (TL) has emerged as a powerful tool for improving estimation and prediction performance by leveraging information from related datasets. In this paper, we repurpose the control-variates (CVS) method for TL in the context of scalar-on-function regression. Our proposed framework relies exclusively on dataset-specific summary statistics, avoiding the need to pool subject-level data and thus remaining applicable in privacy-restricted or decentralized settings. We establish theoretical connections among several existing TL strategies and derive convergence rates for our CVS-based proposals. These rates explicitly account for the typically overlooked smoothing error and reveal how the similarity among covariance functions across datasets influences convergence behavior. Numerical studies support the theoretical findings and demonstrate that the proposed methods achieve competitive estimation and prediction performance compared with existing alternatives.
- North America > United States > New York (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- Health & Medicine (1.00)
- Banking & Finance > Trading (0.67)
Mixed Samples as Probes for Unsupervised Model Selection in Domain Adaptation
Unsupervised domain adaptation (UDA) has been widely applied in improving model generalization on unlabeled target data. However, accurately selecting the best UDA model for the target domain is challenging due to the absence of labeled target data and domain distribution shifts. Traditional model selection approaches involve training extra models with source data to estimate the target validation risk. Recent studies propose practical methods that are based on measuring various properties of model predictions on target data. Although effective for some UDA models, these methods often lack stability and may lead to poor selections for other UDA models.In this paper, we present MixVal, an innovative model selection method that operates solely with unlabeled target data during inference. MixVal leverages mixed target samples with pseudo labels to directly probe the learned target structure by each UDA model.
On the Value of Target Data in Transfer Learning
We aim to understand the value of additional labeled or unlabeled target data in transfer learning, for any given amount of source data; this is motivated by practical questions around minimizing sampling costs, whereby, target data is usually harder or costlier to acquire than source data, but can yield better accuracy. To this aim, we establish the first minimax-rates in terms of both source and target sample sizes, and show that performance limits are captured by new notions of discrepancy between source and target, which we refer to as transfer exponents. Interestingly, we find that attaining minimax performance is akin to ignoring one of the source or target samples, provided distributional parameters were known a priori. Moreover, we show that practical decisions -- w.r.t.
Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data
We propose a learning problem involving adapting a pre-trained source model to the target domain for classifying all classes that appeared in the source data, using target data that covers only a partial label space. This problem is practical, as it is unrealistic for the target end-users to collect data for all classes prior to adaptation. However, it has received limited attention in the literature. To shed light on this issue, we construct benchmark datasets and conduct extensive experiments to uncover the inherent challenges. We found a dilemma --- on the one hand, adapting to the new target domain is important to claim better performance; on the other hand, we observe that preserving the classification accuracy of classes missing in the target adaptation data is highly challenging, let alone improving them. To tackle this, we identify two key directions: 1) disentangling domain gradients from classification gradients, and 2) preserving class relationships. We present several effective solutions that maintain the accuracy of the missing classes and enhance the overall performance, establishing solid baselines for holistic transfer of pre-trained models with partial target data.
The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted by online SGD) for this problem. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, could drastically reduce the amount of source data required by pretraining.
ORIENT: Submodular Mutual Information Measures for Data Subset Selection under Distribution Shift
Real-world machine-learning applications require robust models that generalize well to distribution shift settings, which is typical in real-world situations. Domain adaptation techniques aim to address this issue of distribution shift by minimizing the disparities between domains to ensure that the model trained on the source domain performs well on the target domain. Nevertheless, the existing domain adaptation methods are computationally very expensive. In this work, we aim to improve the efficiency of existing supervised domain adaptation (SDA) methods by using a subset of source data that is similar to target data for faster model training. Specifically, we propose ORIENT, a subset selection framework that uses the submodular mutual information (SMI) functions to select a source data subset similar to the target data for faster training. Additionally, we demonstrate how existing robust subset selection strategies, such as GLISTER, GRADMATCH, and CRAIG, when used with a held-out query set, fit within our proposed framework and demonstrate the connections with them. Finally, we empirically demonstrate that SDA approaches like d-SNE, CCSA, and standard Cross-entropy training, when employed together with ORIENT, achieve a) faster training and b) better performance on the target data.
Improved Regularization and Robustness for Fine-tuning in Neural Networks
A widely used algorithm for transfer learning is fine-tuning, where a pre-trained model is fine-tuned on a target task with a small amount of labeled data. When the capacity of the pre-trained model is much larger than the size of the target data set, fine-tuning is prone to overfitting and memorizing the training labels. Hence, an important question is to regularize fine-tuning and ensure its robustness to noise. To address this question, we begin by analyzing the generalization properties of fine-tuning. We present a PAC-Bayes generalization bound that depends on the distance traveled in each layer during fine-tuning and the noise stability of the fine-tuned model.