Contrastive Variational Reinforcement Learning for Complex Observations
Ma, Xiao, Chen, Siwei, Hsu, David, Lee, Wee Sun
Model-free reinforcement learning (MFRL) has achieved great success in game playing [1, 2], robot navigation [3, 4] and etc. However, extending existing RL methods to real-world environments remains challenging, because they require long-horizon reasoning with the low-dimensional useful features, e.g., the position of a robot, embedded in high-dimensional complex observations, e.g., visually rich images. Consider a four-legged mini-cheetah robot [5] navigating on the campus. To determine the traversable path, the robot must extract the relevant geometric features that coexist with irrelevant variable backgrounds, such as the moving pedestrians, paintings on the wall, etc. Model-based RL (MBRL), in contrast to the model-free methods, reasons a world model trained by generative learning and greatly improves the sample efficiency of the model-free methods [6, 7, 8]. Recent MBRL methods learn compact latent world models from high-dimensional visual inputs with Variational Autoencoders (VAEs) [9] by optimizing the evidence lower bound (ELBO) of an observation sequence [10, 11]. However, learning a generative model under complex observations is challenging.
Nov-9-2020
- Country:
- Asia > Singapore (0.04)
- North America > United States
- Massachusetts > Middlesex County > Cambridge (0.04)
- Genre:
- Research Report (1.00)
- Technology: