Reviews: EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

Neural Information Processing Systems 

Review of submission 1489: EX2: Exploration with Exemplar Models for Deep Reinforcement Learning Summary: A discriminative novelty detection algorithm is proposed to improve exploration for policy gradient based reinforcement learning algorithms. The implicitly-estimated density by the discriminative novelty detection of a state is then used to produce a reward bonus added to the original reward for down-stream policy optimization algorithms (TRPO). Two techniques are discussed to improve the computation efficiency. Comments - One motivation of the paper is to utilize implicit density estimation to approximate classic count based exploration. The discriminative novelty detection only maintains a density estimation over the states, but not state-action pairs.