Deqian Kong
–Neural Information Processing Systems
In tasks aiming for long-term returns, planning becomes essential. We study generative modeling for planning with datasets repurposed from offline reinforcement learning. Specifically, we identify temporal consistency in the absence of step-wise rewards as one key technical challenge. We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent variable to connect a Transformerbased trajectory generator and the final return. LPT can be learned with maximum likelihood estimation on trajectory-return pairs.
Neural Information Processing Systems
May-25-2025, 19:27:12 GMT