SIBRE: Self Improvement Based REwards for Adaptive Feedback in Reinforcement Learning
Somjit Nath, Richa Verma, Abhik Ray, Harshad Khadilkar
We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE. The approach is designed for use in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance. We prove that SIBRE converges in expectation under the same conditions as the original RL algorithm. The reshaped rewards help discriminate between policies when the original rewards are weakly discriminated or sparse. Experiments on several well-known benchmark environments with different RL algorithms show that

Similar approaches appear to have worked in literature on container loading [27] and railway scheduling [11] problems, without being formally proposed or analysed. One study on bin packing does propose reward shaping explicitly, and is described below. Literature on formal reward shaping: The proposed approach (SIBRE) falls under the category of reward shaping approaches for RL, but with some key novelty points as described below. Prior literature has shown that the optimal policy learnt by RL remains invariant under reward shaping if the modification can be expressed as a potential function [15].
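The two reward-shaping ideas mentioned above can be sketched in a few lines of Python. The first function is the standard potential-based form, where the shaping term is gamma * phi(s') - phi(s) for some potential function phi over states. The second class is only one possible reading of "rewarding improvement over the agent's own past performance": the running threshold, its update rule, the step size `beta`, and every name in the code are illustrative assumptions, not details taken from the text above.

```python
from typing import Callable, Hashable

State = Hashable


def potential_shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Return r + gamma * phi(s') - phi(s), the potential-based shaped reward."""
    return reward + gamma * potential(next_state) - potential(state)


class SelfImprovementReward:
    """Hypothetical sketch of a self-improvement reward.

    Keeps a running threshold of past episode returns and reports how far
    the latest return exceeds it. The update rule and `beta` are assumptions
    made for illustration only.
    """

    def __init__(self, beta: float = 0.1, initial_threshold: float = 0.0) -> None:
        self.beta = beta
        self.threshold = initial_threshold

    def __call__(self, episode_return: float) -> float:
        # Reshaped reward: improvement over the current threshold.
        shaped = episode_return - self.threshold
        # Move the threshold a fraction beta toward the latest return.
        self.threshold += self.beta * shaped
        return shaped
```

With the hypothetical potential `phi = lambda s: -abs(10 - s)` and `gamma=1.0`, a step from state 3 to state 4 with zero environment reward yields a shaped reward of 1.0. With `beta=0.5`, feeding the episode returns 4.0, 4.0, 6.0 to `SelfImprovementReward` yields 4.0, 2.0, 3.0: repeating the same return earns less the second time, because the threshold has moved toward it.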
Dec-21-2020