SWAD: Domain Generalization by Seeking Flat Minima -- Appendix

Kyungjae Lee

Neural Information Processing Systems 

In this study, we theoretically and empirically demonstrate that domain generalization (DG) is achievable by seeking flat minima, and we propose SWAD to find them. With SWAD, researchers and developers can make a model robust to domain shift in a real deployment environment without relying on a task-dependent prior, a modified objective function, or a specific model architecture. Accordingly, SWAD has potential positive impacts, such as helping to develop machines that are less biased with respect to ethical aspects, as well as potential negative impacts, e.g., improving weapon or surveillance systems under unexpected environmental changes.

B.1 Hyperparameters of SWAD

The evaluation protocol by Gulrajani and Lopez-Paz [1] is computationally too expensive; it requires about 4,142 models for every DG algorithm. Hence, we reduce the search space of SWAD for computational efficiency: the batch size is set to 32 for each domain and the learning rate to 5e-5.
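As an illustration only, the fixed values above can be written down as a small configuration object. This is a minimal sketch, not the authors' code: the class name `ReducedSearchConfig`, its field names, and the helper `total_batch_size` are hypothetical, and the helper assumes that a training step draws one mini-batch of 32 from each training domain, so that the total number of samples per step scales with the number of training domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReducedSearchConfig:
    """Hypothetical container for the two values fixed in the reduced search space."""

    batch_size_per_domain: int = 32   # stated in the text: 32 for each domain
    learning_rate: float = 5e-5       # stated in the text: 5e-5

    def total_batch_size(self, num_train_domains: int) -> int:
        # Assumption: one mini-batch per training domain is drawn per step,
        # so the total is the per-domain size times the number of domains.
        if num_train_domains < 1:
            raise ValueError("num_train_domains must be at least 1")
        return self.batch_size_per_domain * num_train_domains


config = ReducedSearchConfig()
```

Under that assumption, a leave-one-domain-out run on a benchmark with four domains would train on three of them, and `config.total_batch_size(3)` gives the number of samples per step in that setting.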