Online Statistical Inference for Time-varying Sample-averaged Q-learning

Panda, Saunak Kumar, Liu, Ruiqi, Xiang, Yisha

Oct-14-2024–arXiv.org Machine Learning

Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a time-varying batch-averaged Q-learning algorithm, termed sample-averaged Q-learning, which improves upon traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters. Numerical experiments conducted on classic OpenAI Gym environments show that the time-varying sample-averaged Q-learning method consistently outperforms both single-sample and constant-batch Q-learning methods, achieving superior accuracy while maintaining comparable learning speeds.
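The abstract does not state the update rule, so the following is only a minimal sketch of what averaging a batch of sampled rewards and next states in a Q-learning step could look like. It assumes a tabular, synchronous setting with access to a simulator, and the function name, the Gaussian reward noise, and the batch-size and step-size schedules are all hypothetical choices made for illustration, not the authors' specification.

```python
import numpy as np

def sample_averaged_q_learning(P, R, gamma, n_iters, batch_size, step_size,
                               rng, reward_noise=0.1):
    """Illustrative tabular sketch (assumed setup, not the paper's algorithm).

    P: (S, A, S) transition probabilities; R: (S, A) mean rewards.
    batch_size(t) and step_size(t) are user-supplied schedules (assumptions).
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for t in range(1, n_iters + 1):
        B = batch_size(t)    # number of samples drawn per (s, a) at iteration t
        eta = step_size(t)   # learning rate at iteration t
        target = np.empty_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                # Draw B next states and B noisy rewards for this (s, a) pair.
                next_states = rng.choice(n_states, size=B, p=P[s, a])
                rewards = R[s, a] + reward_noise * rng.standard_normal(B)
                # Average the empirical Bellman target over the batch.
                target[s, a] = np.mean(rewards + gamma * Q[next_states].max(axis=1))
        # Standard stochastic-approximation update toward the averaged target.
        Q = (1 - eta) * Q + eta * target
    return Q
```

Setting `batch_size` to a constant 1 recovers a single-sample update in this sketch, and a constant larger value gives a constant-batch variant, which are the two baselines the abstract compares against.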

artificial intelligence, machine learning, reinforcement learning

Country:
- Asia > Middle East (0.14)
- North America > United States
  - Texas (0.28)

Genre:
- Research Report (0.63)

Industry:
- Banking & Finance > Trading (0.46)
- Energy > Oil & Gas
  - Upstream (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)