Review for NeurIPS paper: Policy Improvement via Imitation of Multiple Oracles
–Neural Information Processing Systems
Weaknesses: The highest-priority comments are the P0 comments listed below. P0: - I think you should clarify what you mean by "experts". Your definition of experts includes sub-optimal policies, but is there a limit on how sub-optimal they are allowed to be? I feel this needs to be stated explicitly. If they can be any policy, does this not fall more in the domain of off-policy/batch RL than imitation learning?
Jun-2-2025, 12:03:25 GMT