Offline RL algorithms
Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL
Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of the dataset. Real-world data collection is often expensive and hard to control, leading to small and narrowly covered datasets and posing significant challenges for practical deployment of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance on small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on their compliance with the T-symmetry. These can readily be used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent-space data augmentation procedure. Through extensive experiments, we find that TSRL achieves strong performance on small benchmark datasets with as few as 1% of the original samples, significantly outperforming recent offline RL algorithms in data efficiency and generalizability.
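The abstract describes TDM only at a high level. As a rough, hypothetical PyTorch sketch of the stated idea (not the authors' implementation): an encoder maps states into a latent space, paired forward and reverse latent dynamics models are trained jointly, and a T-symmetry consistency term ties the reverse model to the time-reversed forward model; the same consistency residual can serve as a per-sample reliability score for OOD data. Module names, layer sizes, and loss weights below are illustrative assumptions.

```python
# Minimal, hypothetical sketch of a T-symmetry enforced Dynamics Model (TDM),
# following only what the abstract states; architecture details are assumptions.
import torch
import torch.nn as nn


class TDM(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, state_dim))
        # Forward model predicts the latent change z_t -> z_{t+1};
        # reverse model predicts the change z_{t+1} -> z_t under the same action.
        self.forward_dyn = nn.Sequential(nn.Linear(latent_dim + action_dim, 256),
                                         nn.ReLU(), nn.Linear(256, latent_dim))
        self.reverse_dyn = nn.Sequential(nn.Linear(latent_dim + action_dim, 256),
                                         nn.ReLU(), nn.Linear(256, latent_dim))

    def losses(self, s, a, s_next):
        z, z_next = self.encoder(s), self.encoder(s_next)
        dz_fwd = self.forward_dyn(torch.cat([z, a], dim=-1))
        dz_rev = self.reverse_dyn(torch.cat([z_next, a], dim=-1))
        recon = ((self.decoder(z) - s) ** 2).mean()          # reconstruct states
        fwd = ((z + dz_fwd - z_next) ** 2).mean()            # forward prediction
        rev = ((z_next + dz_rev - z) ** 2).mean()            # reverse prediction
        # T-symmetry consistency: the reverse residual should mirror the
        # forward residual (dz_rev ≈ -dz_fwd).
        t_sym = ((dz_fwd + dz_rev) ** 2).mean()
        return recon + fwd + rev + 0.1 * t_sym

    @torch.no_grad()
    def t_symmetry_score(self, s, a, s_next):
        """Per-sample T-symmetry violation, usable as an OOD reliability measure."""
        z, z_next = self.encoder(s), self.encoder(s_next)
        dz_fwd = self.forward_dyn(torch.cat([z, a], dim=-1))
        dz_rev = self.reverse_dyn(torch.cat([z_next, a], dim=-1))
        return ((dz_fwd + dz_rev) ** 2).mean(dim=-1)
```

In a TSRL-style algorithm, a score like `t_symmetry_score` could be used to decide which latent augmentations and OOD actions to trust, which is what allows policy constraints that are less conservative than plain behavior regularization.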
Offline Behavior Distillation
Inspired by dataset distillation (DD) [Wang et al., 2018; Zhao et al.], … (Corollary 1). Extensive experiments on nine datasets of the D4RL benchmark [Fu et al., 2020], spanning multiple environments and data qualities, show that Av-PBC remarkably improves OBD performance. Moreover, Av-PBC converges significantly faster, requiring only a quarter of the distillation steps needed by DBC and PBC.
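The surviving fragment above does not state the Av-PBC objective itself. As a loose, hypothetical illustration only, the sketch below shows one plausible reading of action-value-weighted policy-based behavior cloning: the policy trained on the distilled data imitates target actions, with each sample reweighted by an estimated action value so that high-value transitions dominate. The weighting scheme, function names, and hyperparameters are assumptions, not the paper's definition.

```python
# Hypothetical sketch of action-value-weighted behavior cloning (one possible
# reading of "Av-PBC"); the exact objective in the paper may differ.
import torch
import torch.nn.functional as F


def av_weighted_bc_loss(policy, q_net, states, target_actions, temperature=1.0):
    """Behavior-cloning loss where each sample is reweighted by its Q-value.

    `policy(states)` returns continuous actions; `q_net(states, actions)`
    estimates the action value of the target action in each state.
    """
    pred_actions = policy(states)
    per_sample = ((pred_actions - target_actions) ** 2).mean(dim=-1)
    with torch.no_grad():
        q = q_net(states, target_actions).squeeze(-1)
        # Softmax weighting emphasizes high-value transitions within the batch.
        weights = F.softmax(q / temperature, dim=0) * q.shape[0]
    return (weights * per_sample).mean()
```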