RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning
arXiv.org Artificial Intelligence
[...] learning (DRL) method utilizing the average reward criterion. While most existing DRL methods employ the discounted reward criterion, this can potentially lead to a discrepancy between the training objective and performance metrics in continuing tasks, making the average reward criterion a recommended alternative. We introduce RVI-SAC, an extension of the state-of-the-art off-policy DRL method, Soft Actor-Critic (SAC) (Haarnoja et al., 2018a;b), to the average reward criterion. Our proposal consists of (1) Critic updates based on RVI Q-learning (Abounadi et al., 2001), (2) Actor updates introduced by the average reward soft policy improvement theorem, and (3) automatic adjustment of Reset Cost enabling [...]

These methods utilize the discounted reward criterion, which is applicable to a variety of MDP-formulated tasks (Puterman, 1994). In particular, for continuing tasks where there is no natural breakpoint in episodes, such as robot locomotion (Todorov et al., 2012) or Access Control Queuing Tasks (Sutton & Barto, 2018), where the interaction between an agent and an environment can continue indefinitely, the discount rate plays a role in keeping the infinite horizon return bounded. However, discounting introduces an undesirable effect in continuing tasks by prioritizing rewards closer to the current time over those in the future. An approach to mitigate this effect is to bring the discount rate closer to 1, but it is commonly known that a large discount rate can lead to instability and slower convergence (Fujimoto et al., 2018; Dewanto & Gallagher, 2021).
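The effect of the discount rate on a continuing task can be illustrated with a small sketch. The toy task below is hypothetical and not taken from the paper: one behavior earns a reward of 1 at every step, the other repeats a 10-step cycle that pays 12 only on its last step. The function names `discounted_return` and `average_reward` are assumptions made for this illustration.

```python
def discounted_return(cycle_rewards, gamma):
    """Discounted return of repeating `cycle_rewards` forever, measured from the cycle start."""
    n = len(cycle_rewards)
    # Discounted sum over a single pass through the cycle.
    one_cycle = sum(r * gamma**t for t, r in enumerate(cycle_rewards))
    # Each later pass is scaled by gamma**n, giving a geometric series.
    return one_cycle / (1.0 - gamma**n)


def average_reward(cycle_rewards):
    """Long-run average reward per step of repeating `cycle_rewards` forever."""
    return sum(cycle_rewards) / len(cycle_rewards)


# Hypothetical behaviors in a continuing task (illustrative values only).
steady = [1.0]                 # reward 1 at every step
delayed = [0.0] * 9 + [12.0]   # reward 12 once every 10 steps

for gamma in (0.9, 0.99):
    print(
        f"gamma={gamma}: steady={discounted_return(steady, gamma):.2f}, "
        f"delayed={discounted_return(delayed, gamma):.2f}"
    )
print(f"average reward: steady={average_reward(steady)}, delayed={average_reward(delayed)}")
```

With a discount rate of 0.9 the steady behavior scores higher (10 versus roughly 7.1), even though the delayed behavior has the higher average reward (1.2 versus 1.0). Raising the discount rate to 0.99 reverses the ranking (100 versus roughly 114.6), which matches the mitigation described in the text; the instability that large discount rates can cause in training is not modeled here.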
Aug-4-2024