Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach
Donâncio, Henrique, Barrier, Antoine, South, Leah F., Forbes, Florence
Reinforcement Learning (RL), when combined with function approximators such as Artificial Neural Networks (ANNs), has shown success in learning policies that outperform humans in complex games by leveraging extensive datasets (see, e.g., [33, 19, 39, 40]). While ANNs had previously been used as value function approximators [29], the introduction of Deep Q-Networks (DQN) [24, 25] marked a significant breakthrough by improving learning stability through two mechanisms: the target network and experience replay. Experience replay (see [22]) stores the agent's interactions with the environment, allowing past interactions to be sampled at random, which breaks their temporal correlation. The target network further stabilizes learning by periodically copying the parameters of the learning network. This is crucial because the Bellman update (using estimates to update other estimates) would otherwise rely on the very network being updated, potentially causing divergence. With a target network, gradient steps are directed towards a periodically fixed target, yielding a more stable learning process. Additionally, the learning rate hyperparameter controls the magnitude of these gradient steps in optimizers such as stochastic gradient descent, affecting training convergence. The learning rate is one of the most important hyperparameters, with previous work demonstrating that decreasing its value during policy fine-tuning can improve performance by up to 25% in vanilla DQN [3].
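As a concrete illustration of these mechanisms, the following minimal sketch shows a replay buffer, a periodically synchronized target network, and the learning rate passed to a stochastic gradient descent optimizer. It assumes a PyTorch-style setup; the names QNetwork, ReplayBuffer, dqn_update, and sync_target are hypothetical and do not correspond to the paper's code.

# Minimal illustrative sketch (not the paper's code) of the two DQN
# stabilization mechanisms described above, plus the learning-rate
# hyperparameter, assuming a PyTorch-style setup.
import copy
import random
from collections import deque

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP approximating Q(s, a)."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, x):
        return self.net(x)


class ReplayBuffer:
    """Stores past transitions; sampling them at random breaks their
    temporal correlation."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)


obs_dim, n_actions, gamma = 4, 2, 0.99
online = QNetwork(obs_dim, n_actions)
target = copy.deepcopy(online)  # periodically refreshed copy of the online network

# The learning rate (lr) sets the magnitude of each gradient step; the
# abstract's point is that adapting it (e.g. decaying it during policy
# fine-tuning) can substantially affect performance.
optimizer = torch.optim.SGD(online.parameters(), lr=1e-3)


def dqn_update(obs, act, rew, next_obs, done):
    """One gradient step on a sampled mini-batch
    (obs/next_obs: [B, obs_dim], act: [B] long, rew/done: [B] float)."""
    q = online(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bellman target computed with the periodically fixed network,
        # not the network being updated, to avoid chasing a moving target.
        max_next_q = target(next_obs).max(dim=1).values
        bellman_target = rew + gamma * (1.0 - done) * max_next_q
    loss = nn.functional.mse_loss(q, bellman_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def sync_target():
    """Copy the learning network's parameters into the target network
    (called every K environment steps)."""
    target.load_state_dict(online.state_dict())

If a decaying learning rate is desired, a standard scheduler such as torch.optim.lr_scheduler.StepLR can be attached to the optimizer to reduce the rate over the course of training or fine-tuning.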
arXiv.org Artificial Intelligence
Oct-16-2024