Optimization
2 Preliminaries
We first introduce the problem setting and the local Bayesian optimization framework. We aim to numerically solve optimization problems of the form: given x0 ∈ D, find x* = argmin
We demonstrate that, surprisingly, the expected value of the gradient is not always the direction maximizing the probability of descent, and in fact, these directions may be nearly orthogonal. This observation then inspires an elegant optimization scheme seeking to maximize the probability of descent while moving in the direction of most-probable descent.
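To make the gap between the two directions concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the belief about the gradient g at the current point is Gaussian, g ~ N(mu, Sigma), with a diagonal covariance for simplicity. Under that assumption the probability of descent along a unit direction v is Pr(v·g < 0) = Φ(−v·mu / sqrt(vᵀ Sigma v)), which is maximized by v proportional to −Sigma⁻¹ mu. The function names and the numerical values of mu and var are hypothetical, chosen only for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v]

def descent_probability(v, mu, var):
    """Pr(v . g < 0) for g ~ N(mu, diag(var)); Gaussian belief is an assumption."""
    mean = sum(vi * mi for vi, mi in zip(v, mu))
    std = math.sqrt(sum(vi * vi * s for vi, s in zip(v, var)))
    z = -mean / std
    # Standard normal CDF evaluated at z, written via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def most_probable_descent_direction(mu, var):
    """Unit vector proportional to -Sigma^{-1} mu for diagonal Sigma."""
    return normalize([-mi / s for mi, s in zip(mu, var)])

# Hypothetical belief: a large but very uncertain first gradient component,
# and a small but nearly certain second component.
mu = [1.0, 0.1]
var = [100.0, 1e-4]

v_grad = normalize([-mi for mi in mu])            # negative expected gradient
v_mpd = most_probable_descent_direction(mu, var)  # most-probable descent

p_grad = descent_probability(v_grad, mu, var)
p_mpd = descent_probability(v_mpd, mu, var)
cosine = sum(a * b for a, b in zip(v_grad, v_mpd))

print(f"P(descent) along negative expected gradient: {p_grad:.3f}")
print(f"P(descent) along most-probable descent:      {p_mpd:.3f}")
print(f"cosine between the two directions:           {cosine:.3f}")
```

In this toy setting the negative expected gradient points mostly along the uncertain coordinate, so descent along it is only slightly better than a coin flip, while the most-probable descent direction follows the coordinate whose sign is nearly certain. The two unit vectors are close to orthogonal.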
Learning to Constrain Policy Optimization with Virtual Trust Region
Compared to Deep Q-learning, deep policy gradient (PG) methods are often more flexible and applicable to both discrete and continuous action problems. However, these methods tend to suffer from high sample complexity and training instability, since the gradient may not accurately reflect the policy gain when the policy changes substantially [6].