Goto

Collaborating Authors

 Optimization



2 Preliminaries Wefirstintroduce theproblem setting andthelocal Bayesianoptimization framework. Weaimto numericallysolveoptimizationproblemsoftheform: givenx0 D,findx =argmin

Neural Information Processing Systems

We demonstrate that, surprisingly, the expected value ofthegradient isnotalwaysthedirection maximizing theprobability ofdescent, and in fact, these directions may be nearly orthogonal. This observation then inspires an elegant optimization scheme seeking to maximize the probability of descent while moving in the direction of most-probable descent.








LearningtoConstrainPolicyOptimizationwith VirtualTrustRegion

Neural Information Processing Systems

ComparedtoDeepQ-learning,deeppolicygradient (PG) methods are often more flexible and applicable to discrete and continuous action problems. However, these methods tend to suffer from high sample complexity and training instability since the gradient may not accurately reflect the policy gain when the policy changes substantially [6].