Review for NeurIPS paper: Policy Improvement via Imitation of Multiple Oracles

Neural Information Processing Systems 

Weaknesses: Highest priority comments are the P0 comments listed below. P0: - I think you should clarify what you mean by "experts". You are allowing the definition of experts to include sub-optimal policies, but is there an extent to which you are allowing them to be suboptimal? I feel like this needs to be clarified. If they can be any policy, then does this not fall more in the domain of off-policy/batch RL, rather than imitation learning.