Reviews: Zap Q-Learning
–Neural Information Processing Systems
The paper proposes a variant of Q-learning, called Zap Q-learning, that is more stable than its precursor. Specifically, the authors show that, in the tabular case, their method minimises the asymptotic covariance of the parameter vector by applying approximate second-order updates based on the stochastic Newton-Raphson method. The behaviour of the algorithm is analysed for the particular case of a tabular representation, and experiments are presented showing the empirical performance of the method in its most general form. This is an interesting paper that addresses a core issue in RL. I have some comments regarding both its content and its presentation.
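For readers who want the update summarised above in concrete form, here is a minimal sketch of a matrix-gain, two time-scale Q-learning update in the tabular case. It is an illustration under stated assumptions, not the authors' implementation. The function name `zap_q_learning`, the reward-maximising sign convention, the uniform exploration policy, the step sizes `1/(n+1)` and `(n+1)**(-rho)`, and the initialisation of the matrix estimate to `-I` (chosen here only to keep it invertible) are all choices of this sketch.

```python
import numpy as np

def zap_q_learning(P, R, beta=0.9, n_steps=5000, rho=0.85, seed=0):
    """Sketch of a Zap-style tabular update (hypothetical helper, not from the paper).

    P[x, u, x2] holds transition probabilities, R[x, u] holds rewards.
    Returns the Q estimate as an array of shape (n_states, n_actions).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    dim = n_states * n_actions
    theta = np.zeros(dim)          # one parameter per (state, action) pair
    A_hat = -np.eye(dim)           # assumed initialisation; keeps A_hat invertible
    x = 0
    for n in range(1, n_steps + 1):
        alpha = 1.0 / (n + 1)          # slow step size for the parameters
        gamma = (n + 1) ** (-rho)      # faster step size for the matrix estimate
        u = rng.integers(n_actions)    # uniform exploration (assumption)
        x_next = rng.choice(n_states, p=P[x, u])
        Q = theta.reshape(n_states, n_actions)
        u_next = int(np.argmax(Q[x_next]))   # greedy action under current estimate
        i = x * n_actions + u
        j = x_next * n_actions + u_next
        td = R[x, u] + beta * theta[j] - theta[i]   # temporal-difference term
        # Rank-one sample psi(x,u) (beta psi(x',u') - psi(x,u))^T for indicator features
        A_n = np.zeros((dim, dim))
        A_n[i, j] += beta
        A_n[i, i] -= 1.0
        A_hat += gamma * (A_n - A_hat)       # running estimate of the mean matrix
        b = np.zeros(dim)
        b[i] = td
        # Newton-Raphson-like step: the matrix gain is -A_hat^{-1}
        theta = theta - alpha * np.linalg.solve(A_hat, b)
        x = x_next
    return theta.reshape(n_states, n_actions)
```

Calling `zap_q_learning(P, R)` with `P` of shape `(S, A, S)` and `R` of shape `(S, A)` returns an `(S, A)` array. Compared with standard Q-learning, the only structural change in this sketch is that the scalar step size is replaced by `-alpha * inv(A_hat)`, with `A_hat` tracked on a faster time scale.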
Jan-20-2025, 04:49:48 GMT