Stochastic Approximation Approaches to Group Distributionally Robust Optimization

Neural Information Processing Systems 

This paper investigates group distributionally robust optimization (GDRO), with the aim of learning a model that performs well over m different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem, and demonstrate that stochastic mirror descent (SMD), using m samples in each iteration, achieves an O(m (\log m)/\epsilon^2) sample complexity for finding an \epsilon-optimal solution, which matches the \Omega(m/\epsilon^2) lower bound up to a logarithmic factor. Then, we make use of techniques from online learning to reduce the number of samples required in each round from m to 1, while keeping the same sample complexity. Specifically, we cast GDRO as a two-player game in which one player simply performs SMD and the other executes an online algorithm for non-oblivious multi-armed bandits. Next, we consider a more practical scenario where the number of samples that can be drawn from each distribution is different, and propose a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates. In the first approach, we incorporate non-uniform sampling into SMD such that the sample budget is satisfied in expectation, and prove that the excess risk of the i-th distribution decreases at an O(\sqrt{n_1 \log m}/n_i) rate.
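To make the saddle-point view concrete, the following is a minimal sketch of SMD on the problem min_w max_q \sum_i q_i R_i(w), where q ranges over the probability simplex on the m groups and R_i is the risk on the i-th distribution. It draws one sample from each of the m distributions per iteration, takes a projected gradient step on w, and takes an entropic (multiplicative-weights) step on q. The toy data model (scalar samples with uniform noise), the squared loss, the function name `smd_gdro`, and the constant step sizes are all assumptions chosen for illustration; they are not the setup or the tuned step sizes analysed in the paper.

```python
import math
import random


def smd_gdro(means, steps=5000, eta_w=0.05, eta_q=0.05, radius=4.0, seed=0):
    """Illustrative SMD for min_w max_q sum_i q_i R_i(w).

    Toy assumption: group i emits scalars z = means[i] + U(-1, 1),
    and the loss is 0.5 * (w - z)^2.
    """
    rng = random.Random(seed)
    m = len(means)
    w = 0.0               # model parameter, kept inside [-radius, radius]
    log_q = [0.0] * m     # unnormalised log-weights over the m groups
    w_sum = 0.0
    q_sum = [0.0] * m

    for _ in range(steps):
        # Map log-weights to a point on the simplex (numerically stable softmax).
        top = max(log_q)
        exp_q = [math.exp(v - top) for v in log_q]
        total = sum(exp_q)
        q = [e / total for e in exp_q]

        # Accumulate the current iterate; the averages are returned at the end.
        w_sum += w
        for i in range(m):
            q_sum[i] += q[i]

        # One fresh sample per group, i.e. m samples per iteration.
        z = [mu + rng.uniform(-1.0, 1.0) for mu in means]
        losses = [0.5 * (w - zi) ** 2 for zi in z]
        grad_w = sum(qi * (w - zi) for qi, zi in zip(q, z))

        # Descent step on w (Euclidean mirror map) followed by projection.
        w = max(-radius, min(radius, w - eta_w * grad_w))
        # Ascent step on q (entropic mirror map): add scaled losses to log-weights.
        log_q = [lq + eta_q * li for lq, li in zip(log_q, losses)]

    return w_sum / steps, [s / steps for s in q_sum]


if __name__ == "__main__":
    w_bar, q_bar = smd_gdro([-1.0, 0.0, 3.0])
    print("averaged w:", w_bar)
    print("averaged q:", q_bar)
```

In this toy problem every group has the same noise level, so the worst-case risk is minimized at the midpoint of the two extreme means (w = 1 for the means above), and the averaged weights should concentrate on those two groups. Averaging the iterates, rather than returning the last one, is the standard way to obtain guarantees for convex-concave problems of this kind.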