Optimization
Learning to Constrain Policy Optimization with Virtual Trust Region
We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to the usual trust region defined by proximity to a single old policy, we propose forming a second trust region around a virtual policy that represents a wide range of past policies. We then constrain the new policy to stay closer to the virtual policy, which is beneficial when the old policy performs poorly. More importantly, we propose a mechanism that automatically builds the virtual policy from a memory of past policies, providing a new capability for dynamically learning appropriate virtual trust regions during optimization. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is evaluated in diverse environments, including robotic locomotion control, navigation with sparse rewards, and Atari games, and consistently demonstrates competitive performance against recent on-policy constrained policy gradient methods.
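The two-trust-region idea in the abstract above can be illustrated with a small sketch. This is not the MCPO implementation: the abstract does not specify how the virtual policy is built or how the constraints are enforced, so the sketch assumes discrete action distributions for a single state, a virtual policy formed as a weighted mixture of remembered policies, and KL penalties in place of hard constraints. The names `kl_divergence`, `virtual_policy`, `penalized_objective`, `beta_old`, and `beta_virtual` are hypothetical, and the fixed mixture weights stand in for whatever the learned mechanism would produce.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def virtual_policy(memory, weights):
    """Mix remembered action distributions into one virtual distribution.

    Assumption: `weights` are non-negative and sum to 1. In the paper the
    virtual policy is built automatically; here the weights are simply given.
    """
    n_actions = len(memory[0])
    return [
        sum(w * policy[a] for w, policy in zip(weights, memory))
        for a in range(n_actions)
    ]


def penalized_objective(surrogate, new, old, virtual,
                        beta_old=1.0, beta_virtual=1.0):
    """Surrogate return minus two KL penalties (assumed form).

    One penalty keeps `new` near the single old policy, the other keeps it
    near the virtual policy that summarizes the memory.
    """
    return (surrogate
            - beta_old * kl_divergence(old, new)
            - beta_virtual * kl_divergence(virtual, new))


# Hypothetical memory of two past policies over two actions.
memory = [[0.5, 0.5], [0.9, 0.1]]
virtual = virtual_policy(memory, weights=[0.5, 0.5])
old = memory[-1]
candidate = [0.8, 0.2]
print(penalized_objective(1.0, candidate, old, virtual))
```

Raising `beta_virtual` relative to `beta_old` pulls the update toward the memory-based region, which is the case the abstract describes as useful when the most recent policy is poor.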
Coarse-to-fine Animal Pose and Shape Estimation: Supplementary Material
We compare our coarse-to-fine approach with the test-time optimization approach. The refinement stage of our approach uses the output of the coarse estimation stage as its initial point. We test the sensitivity of our model to the first-stage results by adding Gaussian noise separately to the SMAL parameters and to the camera parameters estimated by the coarse estimation stage. We show more qualitative results in Figure 1.
Table 2: Adding Gaussian noise to the estimated SMAL parameters (a) and camera parameters (b).
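The noise-sensitivity test described above can be sketched as follows. This is only an illustration under stated assumptions: the parameter values, the noise levels, and the helper name `add_gaussian_noise` are hypothetical, and the refinement stage itself is not reproduced. The sketch shows only how the coarse-stage estimates might be perturbed before being passed on.

```python
import random


def add_gaussian_noise(params, sigma, rng):
    """Return a copy of `params` with independent N(0, sigma^2) noise per entry."""
    return [p + rng.gauss(0.0, sigma) for p in params]


rng = random.Random(0)  # fixed seed so the perturbations are repeatable

# Hypothetical coarse-stage outputs; real SMAL and camera vectors are larger.
coarse_smal = [0.10, -0.30, 0.70]
coarse_camera = [1.00, 0.00, 0.00]

for sigma in (0.0, 0.05, 0.1):  # assumed noise levels, not taken from the paper
    # Perturb one group at a time so each effect can be measured separately.
    noisy_smal = add_gaussian_noise(coarse_smal, sigma, rng)
    noisy_camera = add_gaussian_noise(coarse_camera, sigma, rng)
    print(sigma, noisy_smal, noisy_camera)
```

Perturbing the two parameter groups in separate runs mirrors the two-part layout of the table, with one part for the SMAL parameters and one for the camera parameters.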