8fdd149fcaa7058caccc9c4ad5b0d89a-AuthorFeedback.pdf
–Neural Information Processing Systems
Point#6clarifiesquestionsin"Correctness".3 1. Aregraphsnecessary? (Q1-2,Q4)The departing point of our work is the realization that an imitating policyis4 generally underdetermined by the observational data alone. For concreteness, consider modelsM1,M2, unknown5 to researchers, where inM1, X U, Y X; inM2, X U, Y X U; inMi,i = 1,2, P(U = 0) =6 P(U = 1) = 0.5. We assume thatY,U are unobserved;Y is the reward. Havingsaidthat,28 our methods could certainly be combined with GAIL to ensure both the causal robustness and the scalability with29 high-dimensional data, which we'llacknowledge inthepaper. R2: (1) A causal diagram containing latent rewardY generalize the traditional settings of imitation learning.
Neural Information Processing Systems
Feb-9-2026, 08:24:47 GMT
- Technology: