Distributional Successor Features Enable Zero-Shot Policy Optimization

Neural Information Processing Systems

Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new rewards to linear regression. Yet, policy optimization with successor features can be challenging. This work proposes a novel class of models, i.e., Distributional Successor Features for Zero-Shot Policy Optimization (DiSPOs), that learn a distribution of successor features of a stationary dataset's behavior policy, along with a policy that acts to realize different successor features within the dataset. By directly modeling long-term outcomes in the dataset, DiSPOs avoid compounding error while enabling a simple scheme for zero-shot policy optimization across reward functions. We present a practical instantiation of DiSPOs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems.
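
The transfer recipe in this abstract reduces to two concrete steps: infer a new task's reward weights by linear regression on cumulant features, then score candidate successor features by the return they imply. The sketch below illustrates this under stated assumptions; the names (fit_reward_weights, select_best_outcome) and the random stand-in data are hypothetical illustrations, not the authors' code.

    import numpy as np

    # Hedged sketch of the zero-shot scheme described in the abstract.
    # All names below are hypothetical placeholders, not the authors' API.

    def fit_reward_weights(features, rewards, reg=1e-6):
        # Linear regression r(s) ~ w . phi(s): ridge solution for w.
        d = features.shape[1]
        A = features.T @ features + reg * np.eye(d)
        return np.linalg.solve(A, features.T @ rewards)

    def select_best_outcome(psi_samples, w):
        # Score each candidate successor feature psi by its implied
        # return w . psi and keep the best; in DiSPOs the candidates
        # would be drawn from the learned distribution over successor
        # features at the current state.
        return psi_samples[np.argmax(psi_samples @ w)]

    # Toy usage with random stand-ins for an offline dataset.
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(1000, 16))        # cumulant features phi(s_i)
    r = phi @ rng.normal(size=16)            # reward labels for a new task
    w = fit_reward_weights(phi, r)           # task inference by regression
    candidates = rng.normal(size=(64, 16))   # stand-in sampled psi's
    best_psi = select_best_outcome(candidates, w)
    # An outcome-conditioned policy pi(a | s, best_psi) would then act to
    # realize this successor feature (not implemented here).

Because task inference is plain regression and outcome selection is a dot product, no autoregressive model rollouts are needed, which is the compounding-error advantage the abstract emphasizes.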



DiSPo: Diffusion-SSM based Policy Learning for Coarse-to-Fine Action Discretization

Oh, Nayoung, Jung, Moonkyeong, Park, Daehyung

arXiv.org Artificial Intelligence

We aim to solve the problem of generating coarse-to-fine skills by learning from demonstrations (LfD). To scale precision, traditional LfD approaches often rely on extensive fine-grained demonstrations with external interpolations, or on dynamics models with limited generalization capabilities. For memory-efficient learning and convenient granularity change, we propose a novel diffusion-SSM based policy (DiSPo) that learns from diverse coarse skills and produces actions at varying control scales by leveraging a state-space model, Mamba. Our evaluations show that the adoption of Mamba and the proposed step-scaling method enables DiSPo to outperform on five coarse-to-fine benchmark tests, while DiSPo also shows decent performance in typical fine-grained motion learning and reproduction. We finally demonstrate the scalability of actions with simulated and real-world manipulation tasks. In typical object manipulation, small imprecision around local regions often leads to the failure of entire tasks, such as robot welding, screwing, and drawing, as shown in Figure 1.

[Figure 1: A capture of a square-drawing task that requires ...]
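
The abstract does not spell out the step-scaling mechanism, but the basic idea of a single policy producing actions at different control granularities can be illustrated with a minimal sketch. The step_scale argument and the toy goal-seeking policy below are hypothetical stand-ins, not DiSPo's diffusion-SSM architecture.

    import numpy as np

    def toy_policy(state, goal, step_scale):
        # Hypothetical stand-in: emit a displacement toward `goal` whose
        # magnitude shrinks with `step_scale` (smaller scale = finer control).
        return step_scale * (goal - state)

    def rollout(state, goal, step_scale, n_steps):
        # Query the same policy at a chosen granularity for n_steps.
        trajectory = [state.copy()]
        for _ in range(n_steps):
            state = state + toy_policy(state, goal, step_scale)
            trajectory.append(state.copy())
        return np.stack(trajectory)

    start, goal = np.zeros(2), np.ones(2)
    coarse = rollout(start, goal, step_scale=0.5, n_steps=4)   # few large steps
    fine = rollout(start, goal, step_scale=0.05, n_steps=40)   # many small steps
    print(coarse[-1], fine[-1])  # both approach the goal at different resolutions

The point of the sketch is only the interface: exposing granularity as a conditioning variable lets one model serve both coarse skill reproduction and the precise local control that tasks like welding or drawing demand.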