Distributionally Robust Optimization and Generalization in Kernel Methods
Matthew Staib, Stefanie Jegelka
Distributionally robust optimization (DRO) has attracted attention in machine learning due to its connections to regularization, generalization, and robustness. Existing work has considered uncertainty sets based on phi-divergences and Wasserstein distances, each of which has drawbacks. In this paper, we study DRO with uncertainty sets measured via maximum mean discrepancy (MMD). We show that MMD DRO is roughly equivalent to regularization by the Hilbert norm and, as a byproduct, reveal deep connections to classic results in statistical learning. In particular, we obtain an alternative proof of a generalization bound for Gaussian kernel ridge regression via a DRO lens. The proof also suggests a new regularizer. Our results apply beyond kernel methods: we derive a generically applicable approximation of MMD DRO, and show that it generalizes recent work on variance-based regularization.
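The headline claim, that MMD DRO is roughly equivalent to Hilbert-norm regularization, can be sketched in a single line. The display below is a reconstruction from standard RKHS identities rather than a verbatim statement of the paper's Theorem 3.1; the notation is assumed: \hat{P}_n is the empirical distribution, \ell_f is the loss of hypothesis f, \mathcal{H} is the RKHS with kernel k, and \mu_Q = \mathbb{E}_{Z \sim Q}[k(Z,\cdot)] is the mean embedding of Q. If \ell_f \in \mathcal{H}, then
\[
  \sup_{Q:\ \mathrm{MMD}(Q,\hat{P}_n) \le \epsilon} \mathbb{E}_{Q}[\ell_f(Z)]
  \;\le\; \sup_{\mu \in \mathcal{H}:\ \|\mu - \mu_{\hat{P}_n}\|_{\mathcal{H}} \le \epsilon} \langle \ell_f, \mu \rangle_{\mathcal{H}}
  \;=\; \mathbb{E}_{\hat{P}_n}[\ell_f(Z)] + \epsilon\, \|\ell_f\|_{\mathcal{H}},
\]
using \mathbb{E}_Q[\ell_f(Z)] = \langle \ell_f, \mu_Q \rangle_{\mathcal{H}} and \mathrm{MMD}(Q,\hat{P}_n) = \|\mu_Q - \mu_{\hat{P}_n}\|_{\mathcal{H}}. The inequality comes from relaxing the supremum over probability distributions to a supremum over arbitrary elements of \mathcal{H}; this relaxation is the source of the looseness of the upper bound discussed in the reviews below.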
Reviews: Distributionally Robust Optimization and Generalization in Kernel Methods
I raised my score from 4 to 6 after reading the authors' feedback, mainly due to the novelty of the framework. However, I would expect the authors to provide a thorough discussion of the limitations of the result in the camera-ready version. Weakness: Due to the intractability of the MMD DRO problem, the submission does not give an exact reformulation, as much of the DRO literature does for other probability metrics; instead, the authors provide several layers of approximation. The reason I emphasize the importance of a tight bound, if not an exact reformulation, is that one of the major criticisms of (distributionally) robust optimization is that it is sometimes too conservative, and a loose upper bound might not be sufficient to mitigate this over-conservativeness and demonstrate the power of distributionally robust optimization. When a new distance is introduced into the DRO framework, a natural question is why it should be used instead of existing approaches.
Meta-Review: Distributionally Robust Optimization and Generalization in Kernel Methods
After thorough discussions among the area chair and reviewers, we concur that, although several open questions remain, the paper provides a substantial contribution at the intersection of DRO and ML. Since DRO has been neglected by the ML community despite its relevance to many ML applications, this work could potentially stimulate future work in this direction. Hence, I recommend that the paper be accepted for publication at NeurIPS. Nevertheless, I would urge the authors, in the camera-ready version, to be candid about the limitations of their analysis and the need for future work. For example, the authors should explicitly mention the looseness of the upper bound in Theorem 3.1, as well as the fact that the constant M in Corollary 3.1 often depends on the dimension, which is suboptimal.