aims to match the state-action distributions between the learner and the

Neural Information Processing Systems 

We thank the reviewers for their comments. Please find our responses below; reference indices are consistent with those in the paper. Q3-5: Meaning of the learned divergence? We agree that BC minimizes the policy KL divergence, as we noted in Sec. 4 (line 200). This is consistent with the literature, e.g., Sec. 2 of [Yu et al. arXiv:1909.09314].
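
For concreteness, the standard argument behind this statement can be sketched as follows. The symbols $\rho_E$ (expert state distribution), $\pi_E$ (expert policy), and $\pi_\theta$ (learner policy) are assumed notation for this sketch and may differ from the notation in the paper.

```latex
% Sketch under assumed notation: rho_E, pi_E, pi_theta are illustrative symbols.
\begin{align}
\mathbb{E}_{s \sim \rho_E}\!\left[ D_{\mathrm{KL}}\!\big(\pi_E(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\big) \right]
  &= \mathbb{E}_{s \sim \rho_E,\, a \sim \pi_E(\cdot \mid s)}\!\left[ \log \pi_E(a \mid s) \right]
   - \mathbb{E}_{s \sim \rho_E,\, a \sim \pi_E(\cdot \mid s)}\!\left[ \log \pi_\theta(a \mid s) \right].
\end{align}
% The first term does not depend on theta, so minimizing the left-hand side
% over theta is equivalent to maximizing the expected log-likelihood of
% expert actions, i.e. the usual maximum-likelihood BC objective.
```

Under these assumptions, the divergence is taken over action distributions conditioned on expert-visited states, rather than over state-action distributions.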