Reinforcement Learning


Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off

Neural Information Processing Systems

We address this gap by analyzing Monte-Carlo policy evaluation for LQR systems and uncover a fundamental trade-off between approximation and statistical error in value estimation.




Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Neural Information Processing Systems

However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we study the fine-tuning problem in the context of conservative offline RL methods and devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning.