Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

Neural Information Processing Systems 

This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) with model-based methods in episodic MDPs, and provides a unified framework toward optimal learning for several well-motivated offline tasks.
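To make "model-based OPE" concrete, below is a minimal illustrative sketch of the standard plug-in approach in a time-homogeneous, episodic tabular MDP: fit an empirical transition kernel and mean reward from logged episodes, then evaluate a target policy by backward induction on that empirical model. All function and variable names here are hypothetical; this is not the paper's exact estimator or analysis, only the generic plug-in baseline the setting is built around.

```python
import numpy as np

def fit_empirical_mdp(episodes, S, A):
    """Estimate a time-homogeneous transition kernel and mean reward from logged data.

    episodes: list of trajectories, each a list of (s, a, r, s_next) tuples.
    Counts are pooled across all steps, matching the time-homogeneous setting.
    """
    counts = np.zeros((S, A, S))
    reward_sum = np.zeros((S, A))
    for traj in episodes:
        for (s, a, r, s_next) in traj:
            counts[s, a, s_next] += 1
            reward_sum[s, a] += r
    n_sa = counts.sum(axis=2)                 # visit counts N(s, a)
    safe = np.maximum(n_sa, 1)                # avoid division by zero
    P_hat = counts / safe[:, :, None]         # empirical transition kernel
    P_hat[n_sa == 0] = 1.0 / S                # uniform fallback for unseen (s, a)
    r_hat = reward_sum / safe                 # empirical mean reward
    return P_hat, r_hat

def evaluate_policy(P_hat, r_hat, pi, H):
    """Backward induction on the plug-in model for a deterministic policy.

    pi: array of shape (H, S), pi[h][s] is the action taken in state s at step h.
    Returns the estimated value V_0(s) under the empirical MDP.
    """
    S = r_hat.shape[0]
    V = np.zeros(S)
    idx = np.arange(S)
    for h in reversed(range(H)):
        a = pi[h]                             # actions per state at step h
        V = r_hat[idx, a] + P_hat[idx, a] @ V # Bellman backup under the empirical model
    return V
```

Uniform OPE asks that the evaluation error of such a plug-in estimate be small simultaneously over a whole class of target policies, rather than for a single fixed policy; this uniformity is what connects OPE to downstream offline learning tasks.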