Goto

Collaborating Authors

 inverse dynamic model


AMore Discussion

Neural Information Processing Systems

Why One-step and IQL are imitation-based methods? The core difference between RL-based and imitation-based methods is that RL-based methods learn a value function of policy ฯ€ while imitation-based methods don't. Learning the value function of ฯ€ requires off-policy evaluation of ฯ€ (i.e., learning Qฯ€ or Vฯ€), which is prone to distribution shift. The policy evaluation and policy improvement will also affect each other as they are coupled. Imitation-based methods don't learn Qฯ€ or Vฯ€, but some of them do learn a value function.










Latent Action World Models for Control with Unlabeled Trajectories

arXiv.org Artificial Intelligence

Inspired by how humans combine direct interaction with action-free experience (e.g., videos), we study world models that learn from heterogeneous data. Standard world models typically rely on action-conditioned trajectories, which limits effectiveness when action labels are scarce. We introduce a family of latent-action world models that jointly use action-conditioned and action-free data by learning a shared latent action representation. This latent space aligns observed control signals with actions inferred from passive observations, enabling a single dynamics model to train on large-scale unlabeled trajectories while requiring only a small set of action-labeled ones. We use the latent-action world model to learn a latent-action policy through offline reinforcement learning (RL), thereby bridging two traditionally separate domains: offline RL, which typically relies on action-conditioned data, and action-free training, which is rarely used with subsequent RL. On the DeepMind Control Suite, our approach achieves strong performance while using about an order of magnitude fewer action-labeled samples than purely action-conditioned baselines. These results show that latent actions enable training on both passive and interactive data, which makes world models learn more efficiently.