Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference

Neural Information Processing Systems 

In tasks aiming for long-term returns, planning becomes essential. We study generative modeling for planning with datasets repurposed from offline reinforcement learning. Specifically, we identify temporal consistency in the absence of step-wise rewards as one key technical challenge. We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent variable to connect a Transformer- based trajectory generator and the final return. LPT can be learned with maximum likelihood estimation on trajectory-return pairs.