Improved Complexities for Stochastic Conditional Gradient Methods under Interpolation-like Conditions
Tesi Xiao, Krishnakumar Balasubramanian, Saeed Ghadimi
In a machine learning setup, the function F could be interpreted as the loss function associated with a sample ξ, and the function f could represent the risk, which is defined as the expected loss. Such constrained stochastic optimization problems arise frequently in statistical machine learning applications. The conditional gradient algorithm, also known as the Frank-Wolfe algorithm, is an efficient method for solving constrained optimization problems of the form in (1) due to its projection-free nature [Jag13, HJN15, FGM17, LPZZ17, BZK18, RDLS18]. In each step of the conditional gradient method, it is only required to minimize a linear objective over the set Ω. This operation can be implemented efficiently for a variety of sets arising in statistical machine learning, in contrast to the operation of projecting onto the set Ω, which is required, for example, by the projected gradient method. Hence, the conditional gradient method has regained popularity in the last decade in the optimization and machine learning communities.
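To make the projection-free step concrete, the following is a minimal sketch of the deterministic Frank-Wolfe update described above, using the ℓ1 ball as an example of a set Ω whose linear minimization oracle has a closed form. The function names (`lmo_l1_ball`, `frank_wolfe`), the step-size rule, and the least-squares example are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle over the l1 ball:
    argmin_{||s||_1 <= radius} <grad, s>.
    The minimizer is a signed vertex of the ball, so no
    projection is ever needed."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, radius=1.0, num_iters=100):
    """Conditional gradient (Frank-Wolfe) method: each iteration
    minimizes a linear objective over the feasible set Omega
    instead of projecting onto it."""
    x = x0.copy()
    for t in range(num_iters):
        g = grad_f(x)
        s = lmo_l1_ball(g, radius)       # linear subproblem over Omega
        gamma = 2.0 / (t + 2)            # standard open-loop step size
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Illustrative usage: least squares restricted to an l1 ball.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
x_hat = frank_wolfe(lambda x: A.T @ (A @ x - b), np.zeros(20), radius=2.0)
```

Because each iterate is a convex combination of feasible points, feasibility is maintained without projection; in the stochastic setting studied in this paper, the exact gradient above would be replaced by a stochastic gradient estimate.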
Jun-15-2020