How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis
Dong, Yuxin, Gong, Tieliang, Chen, Hong, Song, Shuangyong, Zhang, Weizhan, Li, Chen
–arXiv.org Artificial Intelligence
Domain generalization aims to learn invariance across multiple training domains, thereby enhancing generalization against out-of-distribution data. While gradient or representation matching algorithms have achieved remarkable success, these methods generally lack generalization guarantees or depend on strong assumptions, leaving a gap in understanding the underlying mechanism of distribution matching. In this work, we formulate domain generalization from a novel probabilistic perspective, ensuring robustness while avoiding overly conservative solutions. Through comprehensive information-theoretic analysis, we provide key insights into the roles of gradient and representation matching in promoting generalization. Our results reveal the complementary relationship between these two components, indicating that existing works focusing solely on either gradient or representation alignment are insufficient to solve the domain generalization problem. In light of these theoretical findings, we introduce IDM to simultaneously align the inter-domain gradients and representations. Integrated with the proposed PDM method for complex distribution matching, IDM achieves superior performance over various baseline methods.

Distribution shifts are prevalent in various real-world learning contexts, often leading machine learning systems to overfit environment-specific correlations that may negatively impact performance when facing out-of-distribution (OOD) data [1]-[4]. Domain generalization (DG) is introduced to address this challenge: by assuming the training data constitutes multiple domains that share some invariant underlying correlations, DG algorithms attempt to learn this invariance so that domain-specific variations do not affect the model's performance.
To this end, various DG approaches have been proposed, including invariant representation learning [5], [6], adversarial learning [7], [8], causal inference [9], [10], gradient manipulation [11]-[13], and robust optimization [14]-[16]. DG is typically formulated as an average-case [17], [18] or worst-case [9], [14] optimization problem; however, these formulations either lack robustness against OOD data [9], [19] or lead to overly conservative solutions [16]. In this paper, we introduce a novel probabilistic formulation that aims to minimize the gap between training and test-domain population risks with high probability.
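To give a concrete feel for the alignment idea behind IDM, the sketch below implements a highly simplified penalty for a linear model: it measures how much per-domain gradients and mean feature representations vary across training domains, and would be added to the usual empirical risk during training. The function names and the variance-based penalty are illustrative assumptions for exposition, not the paper's exact formulation (which uses the proposed PDM method for distribution matching).

```python
import numpy as np

def per_domain_stats(w, X, y):
    """Return (MSE gradient, mean representation) for one domain.

    For this toy linear model the 'representation' is just the raw
    feature mean; in practice it would be an intermediate network layer.
    """
    preds = X @ w
    grad = 2 * X.T @ (preds - y) / len(y)  # gradient of mean squared error
    rep = X.mean(axis=0)
    return grad, rep

def idm_penalty(w, domains):
    """Illustrative alignment penalty: variance of gradients and
    representations across domains (zero iff they all coincide)."""
    grads, reps = zip(*(per_domain_stats(w, X, y) for X, y in domains))
    grad_var = np.var(np.stack(grads), axis=0).sum()
    rep_var = np.var(np.stack(reps), axis=0).sum()
    return grad_var + rep_var

rng = np.random.default_rng(0)
w = rng.normal(size=3)
dom_a = (rng.normal(size=(8, 3)), rng.normal(size=8))
dom_b = (rng.normal(size=(8, 3)), rng.normal(size=8))

print(idm_penalty(w, [dom_a, dom_b]))  # positive: the two domains differ
print(idm_penalty(w, [dom_a, dom_a]))  # zero: identical domains are aligned
```

The penalty vanishes exactly when every domain induces the same gradient and representation statistics, capturing (in miniature) why minimizing it pushes the model toward domain-invariant solutions.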
Jun-14-2024