Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Neural Information Processing Systems
This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) with model-based methods in episodic MDPs, and provides a unified framework for optimal learning in several well-motivated offline tasks.
model-based offline reinforcement learning, optimal uniform OPE, reward-free and task-agnostic
Dec-24-2025, 06:26:10 GMT