To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable RL

Jun-13-2026, 01:08:27 GMT–Neural Information Processing Systems

Partial observability is a notorious challenge in reinforcement learning (RL), due to the need to learn complex, history-dependent policies. Recent empirical successes have used privileged expert distillation, which leverages the availability of latent state information during training (e.g., from a simulator) to learn and imitate the optimal latent, Markovian policy, to disentangle the task of "learning to see" from "learning to act". While expert distillation is more computationally efficient than RL without latent state information, it also has well-documented failure modes. In this paper, through a simple but instructive theoretical model and controlled experiments on challenging simulated locomotion tasks, we investigate the algorithmic trade-off between privileged expert distillation and standard RL without privileged information.

artificial intelligence, machine learning, proceedings

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.57)