
 inverse dynamic model


36526ff8f18e4654cf95acd81921e00b-Paper-Conference.pdf

Neural Information Processing Systems

Effective trajectory stitching for long-horizon planning is a significant challenge in robotic decision-making. While diffusion models have shown promise in planning, they are limited to solving tasks similar to those seen in their training data. We propose CompDiffuser, a novel generative approach that can solve new tasks by learning to compositionally stitch together shorter trajectory chunks from previously seen tasks. Our key insight is to model the trajectory distribution by subdividing it into overlapping chunks and learning their conditional relationships through a single bidirectional diffusion model. This allows information to propagate between segments during generation, ensuring physically consistent connections. We conduct experiments on benchmark tasks of varying difficulty, covering different environment sizes, agent state dimensions, trajectory types, and levels of training data quality, and show that CompDiffuser significantly outperforms existing methods.
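The subdivision into overlapping chunks described above can be illustrated with a minimal sketch. This is not CompDiffuser's implementation: the function name `split_overlapping` and the parameters `chunk_len` and `overlap` are hypothetical, and the sketch covers only the chunking step, not the diffusion model that learns the conditional relationships between chunks.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_overlapping(traj: Sequence[T], chunk_len: int, overlap: int) -> List[List[T]]:
    """Split a trajectory into fixed-length chunks that share `overlap` states.

    Hypothetical helper for illustration only. Trailing states that do not
    fill a whole chunk are dropped.
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    stride = chunk_len - overlap  # how far each chunk starts after the previous one
    chunks: List[List[T]] = []
    start = 0
    while start + chunk_len <= len(traj):
        chunks.append(list(traj[start:start + chunk_len]))
        start += stride
    return chunks


# Toy trajectory of 10 scalar "states"; real states would be vectors.
trajectory = list(range(10))
chunks = split_overlapping(trajectory, chunk_len=4, overlap=2)
for chunk in chunks:
    print(chunk)
```

Because neighbouring chunks share their boundary states, a model that conditions each chunk on its neighbours has a natural place to enforce agreement, which is the role the abstract assigns to the bidirectional diffusion model.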


A More Discussion

Neural Information Processing Systems

Why are One-step and IQL imitation-based methods? The core difference between RL-based and imitation-based methods is that RL-based methods learn a value function of the policy π, while imitation-based methods don't. Learning the value function of π requires off-policy evaluation of π (i.e., learning Q^π or V^π), which is prone to distribution shift. Policy evaluation and policy improvement are also coupled, so they affect each other. Imitation-based methods don't learn Q^π or V^π, but some of them do learn a value function.
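As one concrete illustration of learning a value function without evaluating π, the published IQL formulation fits its state-value function by expectile regression on actions taken from the dataset. The sketch below shows only that asymmetric loss; the function name `expectile_loss` and the default `tau=0.7` are assumptions made for illustration, not details taken from this paper.

```python
from typing import Sequence


def expectile_loss(diffs: Sequence[float], tau: float = 0.7) -> float:
    """Mean expectile loss |tau - 1(u < 0)| * u^2 over the differences u.

    In IQL, u = Q(s, a) - V(s) with (s, a) drawn from the dataset, so the
    learned policy is never queried. Illustrative sketch only.
    """
    if not diffs:
        raise ValueError("diffs must be non-empty")
    total = 0.0
    for u in diffs:
        # Negative differences get weight (1 - tau), the rest get weight tau.
        weight = (1.0 - tau) if u < 0 else tau
        total += weight * u * u
    return total / len(diffs)


# With tau > 0.5, under-estimating V (u > 0) is penalised more than
# over-estimating it (u < 0), pushing V towards an upper expectile of Q.
print(expectile_loss([2.0], tau=0.9))
print(expectile_loss([-2.0], tau=0.9))
```

With `tau = 0.5` the loss reduces to an ordinary squared error scaled by one half, which corresponds to fitting the mean over dataset actions.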