Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution

Crimson Stambaugh, Rajesh P. N. Rao

arXiv.org Artificial Intelligence 

Training a policy with online rollouts can be costly, dangerous, and sample-inefficient [1]. Offline reinforcement learning (RL) instead trains a policy exclusively on pre-collected data. Extracting effective policies without exploration or feedback from the environment is challenging for conventional off-policy algorithms and even for specialized offline RL algorithms [2, 3]. Approaches to offline RL also frequently face incomplete or undirected demonstrations [4, 5, 6], so offline algorithms must compose sub-trajectories from the training data to generate advantageous behaviors. High dimensionality and long horizons pose a further challenge, making accurate planning and behavior cloning difficult [1]. Finally, sparse rewards hinder accurate credit assignment to individual actions, a problem for many training algorithms [7].

Diffusion models have emerged as a powerful framework for expressing complex, multi-modal distributions [8, 9]. Leveraging this model class, diffusion policies generate high-fidelity actions and use a value function to select among them [10, 11, 12].
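To make the value-guided selection concrete, the following is a minimal sketch of the general pattern described in [10, 11, 12]: a conditional diffusion model denoises Gaussian noise into candidate actions, and a learned Q-function picks the best candidate. All names (DenoiseNet, q_net, act), network shapes, the simplified denoising update, and the hyperparameters are illustrative assumptions, not the implementation from this paper or the cited works.

```python
# Illustrative sketch: sampling actions from a diffusion policy and
# selecting one with a value function. Assumed dimensions and schedule.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, N_STEPS, N_CANDIDATES = 17, 6, 50, 16

class DenoiseNet(nn.Module):
    """Predicts the noise in a noisy action, conditioned on state and step."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, state, noisy_action, t):
        t_feat = t.float().unsqueeze(-1) / N_STEPS  # normalized timestep
        return self.net(torch.cat([state, noisy_action, t_feat], dim=-1))

def sample_actions(model, state, n):
    """Reverse diffusion: start from Gaussian noise, iteratively denoise."""
    s = state.expand(n, -1)
    a = torch.randn(n, ACTION_DIM)
    for t in reversed(range(N_STEPS)):
        ts = torch.full((n,), t)
        eps = model(s, a, ts)
        a = a - eps / N_STEPS          # simplified Euler-style update; real
        if t > 0:                      # samplers follow a DDPM/DDIM schedule
            a = a + 0.01 * torch.randn_like(a)
    return a

# Learned value function Q(s, a); here an untrained stand-in.
q_net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
                      nn.Linear(256, 1))

def act(model, state):
    """Sample candidate actions, then return the highest-value one."""
    candidates = sample_actions(model, state, N_CANDIDATES)
    q = q_net(torch.cat([state.expand(N_CANDIDATES, -1), candidates], dim=-1))
    return candidates[q.squeeze(-1).argmax()]

if __name__ == "__main__":
    policy = DenoiseNet()
    with torch.no_grad():
        print(act(policy, torch.zeros(1, STATE_DIM)))
```

The key design point this sketch isolates is that the diffusion model only proposes actions; ranking the proposals is delegated to the value function, which is what lets these methods exploit reward information from the offline dataset rather than merely cloning it.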