BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
Neural Information Processing Systems
Offline model-based reinforcement learning (MBRL) improves data efficiency by using pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, it often suffers from an objective mismatch between model learning and policy learning, which yields inferior policy performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL.
Mar-22-2026, 13:58:38 GMT