Distributional Successor Features Enable Zero-Shot Policy Optimization
Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new rewards to linear regression. Yet, policy optimization with successor features can be challenging. This work proposes a novel class of models, i.e., Distributional Successor Features for Zero-Shot Policy Optimization (DiSPOs), that learn a distribution of successor features of a stationary dataset's behavior policy, along with a policy that acts to realize different successor features within the dataset. By directly modeling long-term outcomes in the dataset, DiSPOs avoid compounding error while enabling a simple scheme for zero-shot policy optimization across reward functions. We present a practical instantiation of DiSPOs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems.
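The reduction mentioned above, from policy evaluation under new rewards to linear regression, can be sketched in a few lines. This is an illustrative toy example, not the paper's method: it assumes rewards are linear in known state features phi(s), so that given a policy's successor features psi(s, a) = E[sum_t gamma^t phi(s_t)], the value under any new reward r = phi . w is just Q(s, a) = psi(s, a) . w once w is regressed from reward samples. All array names and sizes are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4    # feature dimension (illustrative)
n = 100  # number of transitions in the dataset (illustrative)

phi = rng.normal(size=(n, d))   # state features phi(s) for dataset states
w_true = rng.normal(size=d)     # unknown task weights defining the reward
rewards = phi @ w_true          # rewards assumed linear in the features

# Step 1: recover the task weights w from (phi, r) pairs by linear regression.
w, *_ = np.linalg.lstsq(phi, rewards, rcond=None)

# Step 2: evaluate the policy under the new reward with no further learning.
# psi(s, a) would come from a learned successor-feature model; here it is
# a random stand-in just to show the shape of the computation.
psi = rng.normal(size=(n, d))
q_values = psi @ w              # Q(s, a) = psi(s, a) . w

# In the noiseless, overdetermined case the regression recovers w exactly.
assert np.allclose(w, w_true)
```

The point of the sketch is the two-step structure: one regression per new reward function, then a dot product per state-action pair, with no rollouts and hence no compounding model error.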
DiSPo: Diffusion-SSM based Policy Learning for Coarse-to-Fine Action Discretization
Oh, Nayoung, Jung, Moonkyeong, Park, Daehyung
Abstract-- We aim to solve the problem of generating coarse-to-fine skills by learning from demonstrations (LfD). To scale precision, traditional LfD approaches often rely on extensive fine-grained demonstrations with external interpolations, or on dynamics models with limited generalization capabilities. For memory-efficient learning and convenient granularity change, we propose a novel diffusion-SSM based policy (DiSPo) that learns from diverse coarse skills and produces actions at varying control scales by leveraging a state-space model, Mamba. Our evaluations show that the adoption of Mamba and the proposed step-scaling method enables DiSPo to outperform in five coarse-to-fine benchmark tests, while DiSPo shows decent performance in typical fine-grained motion learning and reproduction. We finally demonstrate the scalability of actions with simulation and real-world manipulation tasks.

In typical object manipulation, small imprecision around local regions often leads to the failure of entire tasks, such as robot welding, screwing, and drawing, as shown in Figure 1.

Figure 1: A capture of a square-drawing task that requires