mean-variance optimization
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proof. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.
- Asia > Singapore (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proof. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.
- Asia > Singapore (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Reviews: A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
This work is deals with risk measure in RL/Planning, specifically, risk measures that are based on trajectory variance. Unlike the straight forward approach that is taken in previous works, such as Tamar et al. 2012, 2014 or 2; Prashanth and Ghavamzadeh, 2013, in this work multiple time scale stochastic approximation (MTSSA) is not used. The authors argue that MTSSA is hard to tune and has a slow convergence rate. Instead, the authors propose an approach which is use Block Coordinate Descent. This approach is based on coordinate descent where during the optimization process, not all the coordinates of the policy parameter are optimized, but only a subset of them.
Two-stage Modeling for Prediction with Confidence
The use of neural networks has been very successful in a wide variety of applications. However, it has recently been observed that it is difficult to generalize the performance of neural networks under the condition of distributional shift. Several efforts have been made to identify potential out-of-distribution inputs. Although existing literature has made significant progress with regard to images and textual data, finance has been overlooked. The aim of this paper is to investigate the distribution shift in the credit scoring problem, one of the most important applications of finance. For the potential distribution shift problem, we propose a novel two-stage model. Using the out-of-distribution detection method, data is first separated into confident and unconfident sets. As a second step, we utilize the domain knowledge with a mean-variance optimization in order to provide reliable bounds for unconfident samples. Using empirical results, we demonstrate that our model offers reliable predictions for the vast majority of datasets. It is only a small portion of the dataset that is inherently difficult to judge, and we leave them to the judgment of human beings. Based on the two-stage model, highly confident predictions have been made and potential risks associated with the model have been significantly reduced.
- Information Technology (0.95)
- Banking & Finance > Credit (0.91)
- Banking & Finance > Loans (0.68)
A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
Ma, Shuai, Ma, Xiaoteng, Xia, Li
This paper studies the risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The involved variance metric concerns reward variability during the whole process, and future deviations are discounted to their present values. This discounted mean-variance optimization yields a reward function dependent on a discounted mean, and this dependency renders traditional dynamic programming methods inapplicable since it suppresses a crucial property -- time consistency. To deal with this unorthodox problem, we introduce a pseudo mean to transform the untreatable MDP to a standard one with a redefined reward function in standard form and derive a discounted mean-variance performance difference formula. With the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for the discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems including, but not limited to, risk-averse variance and mean-variance optimizations in discounted and average MDPs. Furthermore, the convergence analyses missing from the literature can be complemented with the proposed framework as well. Taking the value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Risk-Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance
This paper investigates the optimization problem of an infinite stage discrete time Markov decision process (MDP) with a long-run average metric considering both mean and variance of rewards together. Such performance metric is important since the mean indicates average returns and the variance indicates risk or fairness. However, the variance metric couples the rewards at all stages, the traditional dynamic programming is inapplicable as the principle of time consistency fails. We study this problem from a new perspective called the sensitivity-based optimization theory. A performance difference formula is derived and it can quantify the difference of the mean-variance combined metrics of MDPs under any two different policies. The difference formula can be utilized to generate new policies with strictly improved mean-variance performance. A necessary condition of the optimal policy and the optimality of deterministic policies are derived. We further develop an iterative algorithm with a form of policy iteration, which is proved to converge to local optima both in the mixed and randomized policy space. Specially, when the mean reward is constant in policies, the algorithm is guaranteed to converge to the global optimum. Finally, we apply our approach to study the fluctuation reduction of wind power in an energy storage system, which demonstrates the potential applicability of our optimization method.
- North America > United States > New York (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (10 more...)
- Energy > Renewable > Wind (0.88)
- Energy > Energy Storage (0.88)
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Xie, Tengyang, Liu, Bo, Xu, Yangyang, Ghavamzadeh, Mohammad, Chow, Yinlam, Lyu, Daoming, Yoon, Daesub
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proof. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm.