Review for NeurIPS paper: Policy Improvement via Imitation of Multiple Oracles

Jun-2-2025, 12:03:25 GMT – Neural Information Processing Systems

Weaknesses: The highest-priority comments are the P0 comments listed below.
P0:
- I think you should clarify what you mean by "experts". Your definition allows experts to be suboptimal policies, but is there a limit on how suboptimal they may be? I feel this needs to be stated explicitly. If the experts can be any policy, does this work not fall more in the domain of off-policy/batch RL than imitation learning?



