Provable Partially Observable Reinforcement Learning with Privileged Information

May-27-2025, 05:33:21 GMT–Neural Information Processing Systems

Partial observability of the underlying states generally presents significant challenges for reinforcement learning (RL). In practice, certain privileged information, e.g., access to the underlying states in simulators, has been exploited during training and has led to prominent empirical successes. To better understand the benefits of privileged information, we revisit and examine several simple and widely used paradigms in this setting, analyzing both their computational and their sample efficiency. Specifically, we first formalize the empirical paradigm of expert distillation (also known as teacher-student learning) and demonstrate its pitfalls in finding near-optimal policies. We then identify a condition on the partially observable environment, the deterministic filter condition, under which expert distillation achieves polynomial sample complexity and polynomial computational complexity.
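To make the distillation pitfall concrete, the following is a minimal toy sketch of our own, not the paper's construction. It assumes a hypothetical one-step POMDP with two equally likely hidden states and three actions: guessing the state correctly pays 1, a wrong guess pays 0, and a hypothetical "safe" action pays 4/5 regardless of the state. The privileged expert sees the state and never plays "safe"; a distilled student that copies the expert's actions conditioned only on the observation therefore inherits guesses it cannot justify. The "deterministic filter" check below is a simplified reading (the posterior over the hidden state is a point mass) and may differ from the paper's formal definition.

```python
from fractions import Fraction

STATES = (0, 1)
ACTIONS = ("guess0", "guess1", "safe")
PRIOR = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # hypothetical uniform initial state


def reward(state, action):
    # Hypothetical payoffs: correct guess pays 1, wrong guess pays 0,
    # and "safe" pays 4/5 no matter what the hidden state is.
    if action == "safe":
        return Fraction(4, 5)
    return Fraction(1) if action == f"guess{state}" else Fraction(0)


def expert(state):
    # Privileged expert: acts on the true state, so it never chooses "safe".
    return max(ACTIONS, key=lambda a: reward(state, a))


def posterior(emission, obs):
    # One Bayes-filter step: b(s | o) is proportional to PRIOR(s) * O(o | s).
    joint = {s: PRIOR[s] * emission[s].get(obs, Fraction(0)) for s in STATES}
    total = sum(joint.values())  # marginal probability P(o)
    return {s: p / total for s, p in joint.items()}, total


def analyze(emission):
    """Return (distilled-student value, best observation-based value, point-mass filter?)."""
    # Only observations with positive probability (PRIOR is strictly positive).
    observations = sorted({o for s in STATES for o, p in emission[s].items() if p > 0})
    distilled, best, deterministic = Fraction(0), Fraction(0), True
    for o in observations:
        belief, p_obs = posterior(emission, o)
        # Simplified deterministic-filter check: the belief is a point mass.
        deterministic &= max(belief.values()) == 1
        # Distilled student: the expert's action distribution averaged over b(s | o).
        student = {a: sum(belief[s] for s in STATES if expert(s) == a) for a in ACTIONS}
        # Expected reward of each action given only the observation.
        q = {a: sum(belief[s] * reward(s, a) for s in STATES) for a in ACTIONS}
        distilled += p_obs * sum(student[a] * q[a] for a in ACTIONS)
        best += p_obs * max(q.values())
    return distilled, best, deterministic


blind = {0: {"o": 1}, 1: {"o": 1}}        # observation reveals nothing
revealing = {0: {"o0": 1}, 1: {"o1": 1}}  # observation pins down the state
print(analyze(blind))      # (Fraction(1, 2), Fraction(4, 5), False)
print(analyze(revealing))  # (Fraction(1, 1), Fraction(1, 1), True)
```

With the uninformative emission, the distilled student earns 1/2 while the best observation-based policy (always "safe") earns 4/5; when each observation identifies the state, the belief collapses to a point mass and distillation recovers the expert's value of 1. Exact `Fraction` arithmetic is used so the values can be compared without floating-point error.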

computational complexity, observable reinforcement learning, paradigm

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (0.64)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.38)