aims to match the state-action distributions between the learner and the
–Neural Information Processing Systems
We thank the reviewers for their comments. Please find our responses below; reference indices are consistent with those in the paper. Q3-5: Meaning of the learned divergence? We agree that BC minimizes the policy KL divergence, as we noted in Sec. 4 (line 200). This is consistent with the literature, e.g., Sec. 2 in [Yu et al. arXiv:1909.09314].
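For concreteness, the standard identity behind the statement that BC minimizes a policy KL divergence can be sketched as follows. The symbols are illustrative assumptions and may differ from the paper's notation: $\pi_E$ denotes the expert policy, $\pi_\theta$ the learner policy, and $\rho_E$ the expert's state distribution.

```latex
\begin{align*}
\mathbb{E}_{s \sim \rho_E}\!\left[ D_{\mathrm{KL}}\!\left( \pi_E(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \right) \right]
&= \mathbb{E}_{s \sim \rho_E,\, a \sim \pi_E(\cdot \mid s)}\!\left[ \log \pi_E(a \mid s) \right]
 - \mathbb{E}_{s \sim \rho_E,\, a \sim \pi_E(\cdot \mid s)}\!\left[ \log \pi_\theta(a \mid s) \right]
\end{align*}
```

The first term on the right does not depend on $\theta$, so minimizing this expected KL divergence over $\theta$ is equivalent to maximizing the log-likelihood of expert actions, which is the usual BC objective.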
May-30-2025, 08:21:45 GMT