A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation
Minghao Chen, Zepeng Gao, Shuai Zhao, Qibo Qiu, Wenxiao Wang, Binbin Lin, Xiaofei He
Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels. However, these methods still require a labeled target validation set for hyper-parameter tuning and model selection. In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels. We begin with the metric based on the mutual information of the model predictions. Through empirical analysis, we identify three prevalent issues with this metric: 1) It does not account for the source structure. To address the first two issues, we incorporate source accuracy into the metric and employ a new MLP classifier that is held out during training, significantly improving the result. To tackle the final issue, we integrate this enhanced metric with data augmentation, resulting in a novel unsupervised UDA metric called the Augmentation Consistency Metric (ACM). Additionally, we empirically demonstrate the shortcomings of previous experiment settings and conduct large-scale experiments to validate the effectiveness of our proposed metric. Furthermore, we leverage our metric to automatically search for the optimal set of hyper-parameters, achieving performance comparable to manually tuned sets across four common benchmarks.

Deep neural networks, when trained on extensive datasets, have demonstrated exceptional performance across various computer vision tasks such as classification (Liu et al., 2022; Radford et al., 2021), object detection (Carion et al., 2020; Zhang et al., 2022), and semantic segmentation (Chen et al., 2018; Xie et al., 2021). However, performance on a specific domain can usually be further improved by fine-tuning with labels from that domain. The challenge arises in real-world applications, where manually labeling enough data for fine-tuning is both costly and impractical.
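As a rough illustration of the kind of unsupervised selection criterion the abstract describes, the sketch below combines a mutual-information term on target predictions, a prediction-consistency term under data augmentation, and source accuracy. This is not the authors' code: the function names, the equal weighting of the three terms, and the use of the training classifier (the paper instead uses a held-out MLP classifier) are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F


def mutual_information(logits: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mutual information of predictions: H(marginal prediction) - mean per-sample entropy."""
    probs = F.softmax(logits, dim=1)                               # (N, C)
    marginal = probs.mean(dim=0)                                   # (C,)
    h_marginal = -(marginal * (marginal + eps).log()).sum()
    h_conditional = -(probs * (probs + eps).log()).sum(dim=1).mean()
    return h_marginal - h_conditional


def augmentation_consistency(logits_weak: torch.Tensor,
                             logits_strong: torch.Tensor) -> torch.Tensor:
    """Fraction of target samples whose predicted class is unchanged under augmentation."""
    return (logits_weak.argmax(dim=1) == logits_strong.argmax(dim=1)).float().mean()


@torch.no_grad()
def score_model(model, target_weak, target_strong, source_x, source_y) -> float:
    """Unsupervised score for ranking checkpoints or hyper-parameter trials (higher = better)."""
    model.eval()
    logits_weak = model(target_weak)
    mi = mutual_information(logits_weak)
    consistency = augmentation_consistency(logits_weak, model(target_strong))
    source_acc = (model(source_x).argmax(dim=1) == source_y).float().mean()
    # Equal weighting is an arbitrary choice for illustration.
    return (mi + consistency + source_acc).item()
```

Given several models trained with different hyper-parameter settings, one would keep the model with the highest score, which is the model-selection use case the abstract describes; the paper's actual ACM formulation should be consulted for the precise metric.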
arXiv.org Artificial Intelligence
Sep-18-2023