Residual Relaxation for Multi-view Representation Learning
Yifei Wang
Multi-view methods learn representations by aligning multiple views of the same image, and their performance largely depends on the choice of data augmentation. In this paper, we notice that some otherwise useful augmentations, such as image rotation, are harmful for multi-view methods because they cause a semantic shift that is too large to be aligned well. This observation motivates us to relax the exact alignment objective so that it can better accommodate stronger augmentations. Taking image rotation as a case study, we develop a generic approach, Pretext-aware Residual Relaxation (Prelax), that relaxes the exact alignment by allowing an adaptive residual vector between different views and encoding the semantic shift through pretext-aware learning. Extensive experiments on different backbones show that our method not only improves multi-view methods with existing augmentations, but also benefits from stronger image augmentations like rotation.
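The difference between exact alignment and alignment relaxed by a residual vector can be illustrated with a minimal sketch. It is not the paper's loss: the function names, the squared-distance form, and the idea of passing the residual in as a fixed array are all assumptions made for illustration (in the abstract, the residual is adaptive and tied to pretext-aware learning).

```python
import numpy as np

def exact_alignment_loss(z1, z2):
    """Mean squared distance between the embeddings of two views."""
    z1, z2 = np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)
    return float(np.mean(np.sum((z1 - z2) ** 2, axis=1)))

def relaxed_alignment_loss(z1, z2, residual):
    """Alignment up to a per-sample residual: mean of ||z2 - z1 - r||^2.

    `residual` is a hypothetical stand-in for the adaptive residual vector;
    here it is simply supplied by the caller.
    """
    z1, z2, residual = (np.asarray(a, dtype=float) for a in (z1, z2, residual))
    return float(np.mean(np.sum((z2 - z1 - residual) ** 2, axis=1)))

# Toy batch: the second view is shifted by a constant vector,
# mimicking a large semantic shift such as rotation.
z_plain = np.zeros((2, 3))
z_rotated = np.ones((2, 3))
shift = np.ones((2, 3))

exact = exact_alignment_loss(z_plain, z_rotated)
relaxed = relaxed_alignment_loss(z_plain, z_rotated, shift)
```

In this toy case the exact loss penalizes the whole shift, while the relaxed loss vanishes once the residual accounts for it.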
Learning to Constrain Policy Optimization with Virtual Trust Region
We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of a single old policy as the normal trust region, we propose forming a second trust region through another virtual policy that represents a wide range of past policies. We then constrain the new policy to stay close to the virtual policy, which is beneficial if the old policy performs poorly. More importantly, we propose a mechanism to automatically build the virtual policy from a memory of past policies, providing a new capability for dynamically learning appropriate virtual trust regions during the optimization process. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is examined in diverse environments, including robotic locomotion control, navigation with sparse rewards, and Atari games, consistently demonstrating competitive performance against recent on-policy constrained policy gradient methods.
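The idea of two trust regions, one around the old policy and one around a virtual policy built from past policies, can be sketched for a discrete action distribution as follows. This is an illustrative sketch only: the KL direction, the penalty form, the coefficients `beta_old` and `beta_virtual`, and the fixed mixing weights are assumptions, whereas the abstract describes a learned mechanism for building the virtual policy from memory.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def virtual_policy(past_policies, weights):
    """Weighted mixture of past policies (weights are normalized to sum to 1).

    The weights are hypothetical inputs here; MCPO is described as learning
    how to combine the policies held in memory.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(past_policies, dtype=float)

def penalized_objective(surrogate, new, old, virtual,
                        beta_old=1.0, beta_virtual=1.0):
    """Surrogate return minus KL penalties toward both trust-region centers."""
    return (surrogate
            - beta_old * kl_divergence(old, new)
            - beta_virtual * kl_divergence(virtual, new))

# Toy example with two actions and a memory of two past policies.
memory = [[1.0, 0.0], [0.0, 1.0]]
virtual = virtual_policy(memory, weights=[1.0, 1.0])
old = [0.9, 0.1]
score_same = penalized_objective(1.0, new=virtual, old=virtual, virtual=virtual)
score_far = penalized_objective(1.0, new=[0.99, 0.01], old=old, virtual=virtual)
```

A new policy that drifts away from the virtual policy is penalized even when it stays near the old policy, which is the behavior the second trust region is meant to provide.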
A Proof of Theorem 2
Each batch contains around 32K tokens. All experiments are run on either 4 NVIDIA A100 or 4 NVIDIA V100 GPUs. We analyze the effect of the size of the parallel data in Figure 4. Our approach consistently outperforms ...
We demonstrate several cases generated by different models.
Table 3: Examples of generated dialogue responses.
Context: We can make shipment within one month from receipt of order.
Coarse-to-fine Animal Pose and Shape Estimation: Supplementary Material
We compare our coarse-to-fine approach with the test-time optimization approach. The refinement stage of our approach relies on the output of the coarse estimation stage as an initial point. We test the sensitivity of our model to the first-stage results by adding Gaussian noise to the SMAL parameters and to the camera parameters estimated by the coarse estimation stage, respectively. We show more qualitative results in Figure 1.
Table 2: Adding Gaussian noise to the estimated SMAL parameters (a) and camera parameters (b).
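The noise-sensitivity test described above amounts to perturbing the coarse-stage estimates before refinement. A minimal sketch follows, assuming zero-mean, independent Gaussian noise on every parameter; the function name, the noise level `sigma`, and the example parameter vector are hypothetical and not taken from the supplementary material.

```python
import numpy as np

def perturb(params, sigma, rng):
    """Return a copy of `params` with zero-mean Gaussian noise of std `sigma` added."""
    params = np.asarray(params, dtype=float)
    return params + rng.normal(loc=0.0, scale=sigma, size=params.shape)

rng = np.random.default_rng(0)          # fixed seed for reproducibility
coarse_params = np.arange(5.0)          # placeholder for coarse-stage estimates
unchanged = perturb(coarse_params, 0.0, rng)
noisy = perturb(coarse_params, 0.1, rng)
```

The perturbed parameters would then be fed to the refinement stage in place of the clean coarse estimates, and the resulting metrics compared across noise levels.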