Reviews: EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
–Neural Information Processing Systems
Review of submission 1489: EX2: Exploration with Exemplar Models for Deep Reinforcement Learning Summary: A discriminative novelty detection algorithm is proposed to improve exploration for policy gradient based reinforcement learning algorithms. The implicitly-estimated density by the discriminative novelty detection of a state is then used to produce a reward bonus added to the original reward for down-stream policy optimization algorithms (TRPO). Two techniques are discussed to improve the computation efficiency. Comments - One motivation of the paper is to utilize implicit density estimation to approximate classic count based exploration. The discriminative novelty detection only maintains a density estimation over the states, but not state-action pairs.
Neural Information Processing Systems
Oct-7-2024, 14:56:08 GMT
- Technology: