Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning
Théo Vincent, Yogesh Tripathi, Tim Faust, Yaniv Oren, Jan Peters, Carlo D'Eramo
– arXiv.org Artificial Intelligence
The use of target networks in deep reinforcement learning is a popular solution to mitigate the brittleness of semi-gradient approaches and stabilize learning. However, target networks notoriously require additional memory and delay the propagation of Bellman updates compared to an ideal target-free approach. In this work, we step out of the binary choice between target-free and target-based algorithms. We introduce a new method that uses a copy of the last linear layer of the online network as a target network, while sharing the remaining parameters with the up-to-date online network. This simple modification retains the low memory footprint of target-free approaches while leveraging the target-based literature. We find that combining our approach with the concept of iterated Q-learning, which consists of learning consecutive Bellman updates in parallel, improves the sample efficiency of target-free approaches. Our proposed method, iterated Shared Q-Learning (iS-QL), bridges the performance gap between target-free and target-based approaches across various problems while using a single Q-network, a step towards resource-efficient reinforcement learning algorithms.
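To make the architectural idea concrete, below is a minimal PyTorch sketch of the shared-parameter target design described in the abstract. It is an illustrative reading, not the authors' implementation: the names (`SharedQNetwork`, `q_online`, `q_target`, `sync_target_head`, `td_loss`), the torso sizes, and the sync schedule are all assumptions. Only the final linear layer is duplicated and frozen; the torso is shared and always up to date.

```python
# Sketch only: a Q-network whose "target network" is a frozen copy of just
# the last linear layer, with the torso shared with the online network.
# All names and hyperparameters here are illustrative assumptions.
import copy
import torch
import torch.nn as nn


class SharedQNetwork(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        # Shared torso: a single copy, used by both online and target paths.
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Online head (trained) and a frozen copy acting as the target head.
        self.online_head = nn.Linear(hidden, num_actions)
        self.target_head = copy.deepcopy(self.online_head)
        for p in self.target_head.parameters():
            p.requires_grad_(False)

    def q_online(self, obs: torch.Tensor) -> torch.Tensor:
        return self.online_head(self.torso(obs))

    def q_target(self, obs: torch.Tensor) -> torch.Tensor:
        # Target values reuse the *current* torso features; only the last
        # linear layer is a lagged copy, keeping the memory overhead small.
        with torch.no_grad():
            return self.target_head(self.torso(obs))

    def sync_target_head(self) -> None:
        # Refresh the frozen head periodically, as one would refresh a
        # conventional target network.
        self.target_head.load_state_dict(self.online_head.state_dict())


def td_loss(net: SharedQNetwork, batch, gamma: float = 0.99) -> torch.Tensor:
    # Standard one-step TD loss using the frozen last-layer target (sketch).
    obs, act, rew, next_obs, done = batch
    q = net.q_online(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = net.q_target(next_obs).max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q
    return nn.functional.mse_loss(q, target)
```

In the iterated variant described in the abstract, several such heads would be trained in parallel, each regressing toward the Bellman target induced by the previous head; the sketch above shows only a single shared head.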
Sep-30-2025
- Country:
  - Europe
    - Denmark > Southern Denmark (0.04)
    - Germany
      - Bavaria > Lower Franconia > Würzburg (0.04)
      - Hesse > Darmstadt Region > Darmstadt (0.04)
    - Netherlands > South Holland > Delft (0.04)
- Genre:
  - Research Report (0.50)
- Industry:
  - Education (0.68)
  - Leisure & Entertainment > Games > Computer Games (0.31)
- Technology: