Review for NeurIPS paper: Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning


Strengths: To the best of my knowledge, the BSGD algorithm is the first stochastic-gradient-based algorithm that directly solves the CSO problem itself. The two most relevant works focusing on CSO are [12] and [24]: [12] solves a saddle-point reformulation of CSO, while [24] provides sample complexities for an SAA approach to the general CSO problem. Compared with the SAA approach of [24], BSGD improves the sample complexity when F is general convex (removing the dependence on the dimension d), matching the lower bounds the authors provide. Although BSGD is not optimal when F is strongly convex and smooth, it still matches the complexity of the SAA approach [24]. The authors also discuss the settings in which BSGD may not be optimal, providing a transparent assessment of their algorithm.
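
For concreteness, here is a minimal sketch of a BSGD-style update for the CSO objective min_x E_xi[ f_xi( E_{eta|xi}[ g_eta(x, xi) ] ) ], as I understand it from the paper: the inner conditional expectation is replaced by an inner mini-batch average, which makes the gradient estimator biased, with the bias controlled by the inner batch size m. The samplers and the callables g, grad_g, f_grad below are hypothetical placeholders for illustration, not the authors' code.

```python
import numpy as np

def bsgd_step(x, sample_xi, sample_eta_given_xi, g, grad_g, f_grad, m, lr):
    """One BSGD-style step for min_x E_xi[ f_xi( E_{eta|xi}[ g_eta(x, xi) ] ) ].

    The inner conditional expectation is estimated with an inner
    mini-batch of size m; plugging this estimate into the chain rule
    yields a biased gradient estimator whose bias shrinks as m grows.
    """
    xi = sample_xi()                                    # outer sample
    etas = [sample_eta_given_xi(xi) for _ in range(m)]  # inner mini-batch

    # Plug-in estimate of the inner expectation E_{eta|xi}[g_eta(x, xi)].
    g_bar = np.mean([g(x, xi, eta) for eta in etas], axis=0)

    # Averaged Jacobian of g w.r.t. x, then the chain rule through f_xi:
    # grad_est = (mean Jacobian)^T grad f_xi(g_bar).
    jac_bar = np.mean([grad_g(x, xi, eta) for eta in etas], axis=0)
    grad_est = jac_bar.T @ f_grad(g_bar, xi)

    return x - lr * grad_est                            # plain SGD update
```

This illustrates the reviewer's point above: BSGD attacks the nested expectation directly, rather than going through a saddle-point reformulation as in [12] or a full SAA sample as in [24], with m trading estimator bias against per-iteration cost.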