A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Xie, Tengyang, Liu, Bo, Xu, Yangyang, Ghavamzadeh, Mohammad, Chow, Yinlam, Lyu, Daoming, Yoon, Daesub

Neural Information Processing Systems 

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning-rate schedules are often hard to tune and which come with only asymptotic convergence proofs. In this paper, we develop a model-free policy search framework for mean-variance optimization with a finite-sample error bound analysis (for convergence to local optima). Our starting point is a reformulation of the original mean-variance function using its Fenchel dual, from which we derive a stochastic block coordinate ascent policy search algorithm.
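
The reformulation step can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes the objective has the form E[R] - lam * Var(R) for a return R and a risk weight lam >= 0; the symbols R, lam, and y and all function names below are introduced here for illustration only. The sketch uses the standard Fenchel identity x^2 = max_y (2xy - y^2) to replace the squared mean inside the variance with a maximization over an auxiliary scalar y. With the sampled returns held fixed, the y block has a closed-form maximizer; in a policy search setting the other block would be the policy parameters, updated by a stochastic gradient step, which is omitted here.

```python
from statistics import fmean, pvariance


def mean_variance(returns, lam):
    """Assumed objective: E[R] - lam * Var(R), estimated from sampled returns."""
    return fmean(returns) - lam * pvariance(returns)


def augmented_objective(returns, lam, y):
    """Fenchel-dual form of the same objective.

    Var(R) = E[R^2] - (E[R])^2 and (E[R])^2 = max_y (2*y*E[R] - y^2), so
    E[R] - lam*Var(R) = max_y { E[R] - lam*E[R^2] + lam*(2*y*E[R] - y^2) }.
    """
    m1 = fmean(returns)                  # sample estimate of E[R]
    m2 = fmean(r * r for r in returns)   # sample estimate of E[R^2]
    return m1 - lam * m2 + lam * (2.0 * y * m1 - y * y)


def best_dual(returns):
    """Closed-form maximizer of the y block: y* = E[R]."""
    return fmean(returns)


if __name__ == "__main__":
    sampled = [1.0, 2.0, 3.0, 4.0]       # hypothetical sampled returns
    lam = 0.5                            # hypothetical risk weight
    y_star = best_dual(sampled)
    print(mean_variance(sampled, lam))
    print(augmented_objective(sampled, lam, y_star))
```

At y = best_dual(returns) the augmented objective matches the mean-variance value; for any other y it is a lower bound. This is what makes alternating between the y block and the policy block an ascent scheme on the original objective.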