8fdd149fcaa7058caccc9c4ad5b0d89a-AuthorFeedback.pdf

Neural Information Processing Systems 

Point#6clarifiesquestionsin"Correctness".3 1. Aregraphsnecessary? (Q1-2,Q4)The departing point of our work is the realization that an imitating policyis4 generally underdetermined by the observational data alone. For concreteness, consider modelsM1,M2, unknown5 to researchers, where inM1, X U, Y X; inM2, X U, Y X U; inMi,i = 1,2, P(U = 0) =6 P(U = 1) = 0.5. We assume thatY,U are unobserved;Y is the reward. Havingsaidthat,28 our methods could certainly be combined with GAIL to ensure both the causal robustness and the scalability with29 high-dimensional data, which we'llacknowledge inthepaper. R2: (1) A causal diagram containing latent rewardY generalize the traditional settings of imitation learning.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found