Review for NeurIPS paper: Novelty Search in Representational Space for Sample Efficient Exploration
–Neural Information Processing Systems
Additional Feedback: The method seems to be restricted to deterministic environments. Could the authors add a short discussion of why this is the case and how the approach might be extended to stochastic environments (perhaps in the supplementary material)? In most approaches, the discount is an exponential function of the distance in time; why did the authors choose to make the discount factor a function of state and action, and why should it be learned? Having the environment return the discount factor is uncommon. The choice of the learned representation size seems to rely on domain knowledge.
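To make the discount-factor question concrete, the following minimal sketch contrasts a constant discount with a per-step discount that could depend on state and action. It is an illustration only and not the authors' implementation; the function name `discounted_return` and the numeric values are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], discounts: Sequence[float]) -> float:
    """Sum rewards, weighting reward t by the product of the discounts before step t.

    With a constant discount g this reduces to the usual sum of g**t * r_t.
    With per-step discounts (e.g. values returned by the environment or
    predicted from state and action) the weight is no longer a pure
    function of elapsed time.
    """
    if len(rewards) != len(discounts):
        raise ValueError("rewards and discounts must have the same length")
    total = 0.0
    weight = 1.0  # running product of the discounts seen so far
    for reward, discount in zip(rewards, discounts):
        total += weight * reward
        weight *= discount
    return total


rewards = [1.0, 1.0, 1.0]

# Standard setting: the same discount at every step (exponential in time).
constant = discounted_return(rewards, [0.5, 0.5, 0.5])

# Hypothetical state-action-dependent setting: a discount of 0 at the
# second step cuts off all later rewards, as a terminal transition would.
variable = discounted_return(rewards, [0.5, 0.0, 0.5])
```

In this sketch a zero discount acts like an episode boundary, which is one plausible reason for an environment to return the discount itself; whether that is the authors' motivation is exactly what the question above asks.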
Jan-24-2025, 18:42:52 GMT