 auxiliary action


CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

arXiv.org Artificial Intelligence

Distribution shift is a major obstacle in offline reinforcement learning: the discrepancy between the learned policy and the behavior policy must be minimized to avoid overestimating the value of rare or unseen actions. Previous conservative offline RL algorithms struggle to generalize to unseen actions, despite their success in learning good in-distribution policies. In contrast, we propose to use the gradient fields of the dataset density to adjust the original actions generated by a pre-trained offline RL algorithm. We decouple the conservatism constraints from the policy, so the approach can benefit a wide range of offline RL algorithms. Accordingly, we propose the Conservative Denoising Score-based Algorithm (CDSA), which uses a denoising score-based model to model the gradient of the dataset density rather than the density itself, providing a more accurate and efficient way to adjust the actions generated by the pre-trained policy in deterministic, continuous MDP environments. In experiments, we show that our approach significantly improves the performance of baseline algorithms on D4RL datasets, and we demonstrate the generalizability and plug-and-play capability of our model across different pre-trained offline RL policies and tasks. We also validate that the agent exhibits greater risk aversion after employing our method while still generalizing effectively across diverse tasks.
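The core idea of adjusting an action along the gradient of the dataset log-density can be illustrated with a minimal sketch. This is not the authors' implementation: the analytic Gaussian score below stands in for the learned denoising score model, the state input is omitted, and the names `gaussian_score`, `adjust_action`, `step_size`, and `n_steps` are hypothetical.

```python
import numpy as np

def gaussian_score(action, mean, std):
    """Gradient of the log-density of an isotropic Gaussian w.r.t. the action.

    Assumption: a toy stand-in for a learned score model of the dataset.
    """
    return (mean - action) / std**2

def adjust_action(action, score_fn, step_size=0.1, n_steps=20):
    """Move an action toward higher dataset density by following the score."""
    a = np.asarray(action, dtype=float).copy()
    for _ in range(n_steps):
        a = a + step_size * score_fn(a)  # gradient ascent on log-density
    return a

# Hypothetical dataset statistics and an out-of-distribution policy action.
dataset_mean = np.array([0.2, -0.1])
dataset_std = 0.5
policy_action = np.array([1.5, 1.0])

adjusted = adjust_action(
    policy_action,
    lambda a: gaussian_score(a, dataset_mean, dataset_std),
)
print("original:", policy_action, "adjusted:", adjusted)
```

Because the adjustment only needs a score function, the same routine could in principle wrap any pre-trained policy, which is the plug-and-play property the abstract describes.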


Symbolic Leaf Representation in Decoupled Search

AAAI Conferences

Star-topology decoupled search has recently been introduced in classical planning. It splits the planning task into a set of components whose dependencies form a star structure, in which one center component interacts with possibly many leaf components. Here we address a weakness of decoupled search: large leaf components, whose state spaces are enumerated explicitly. We propose a symbolic representation of the leaf state spaces via decision diagrams, which can be dramatically smaller and more runtime-efficient. We further introduce a symbolic version of the LM-cut heuristic that connects naturally to our new leaf representation. We show empirically that the symbolic representation indeed pays off when the leaf components are large.
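Why a decision diagram can be much smaller than an explicit state list can be seen in a small, generic sketch. It is not the paper's leaf representation: states are assumed to be plain bit tuples, and `build_bdd` is a hypothetical helper that builds a reduced ordered binary decision diagram with shared nodes.

```python
from itertools import product

def build_bdd(states, n_vars):
    """Build a reduced ordered decision diagram for a set of bit tuples.

    Returns (root_id, number_of_internal_nodes). Ids 0 and 1 are the
    False and True terminals.
    """
    unique = {}  # (variable index, low child, high child) -> node id

    def make(level, subset):
        if not subset:
            return 0  # no state matches: False terminal
        if level == n_vars:
            return 1  # all variables decided: True terminal
        low = make(level + 1, frozenset(s for s in subset if s[level] == 0))
        high = make(level + 1, frozenset(s for s in subset if s[level] == 1))
        if low == high:
            return low  # redundant test: skip this variable
        key = (level, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2  # share structurally equal nodes
        return unique[key]

    root = make(0, frozenset(states))
    return root, len(unique)

# All 10-bit states whose first bit is 1: 512 explicit states.
states = [(1,) + rest for rest in product((0, 1), repeat=9)]
root, n_nodes = build_bdd(states, n_vars=10)
print(len(states), "explicit states,", n_nodes, "decision-diagram node(s)")
```

The compression depends on how regular the state set is; irregular sets can need many more nodes, so the savings shown here are illustrative rather than guaranteed.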