Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Paul Barde

Neural Information Processing Systems 

Specifically, our discriminator is explicitly conditioned on two policies: the policy from the generator's previous iteration and a learnable policy.
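
As a rough illustration of what conditioning a discriminator on two policies can look like, the sketch below assumes a ratio form, D(s, a) = pi_learnable(a|s) / (pi_learnable(a|s) + pi_previous(a|s)), trained with binary cross-entropy. The ratio form, the loss, and all function and argument names here are assumptions made for illustration; the sentence above does not spell them out.

```python
import math


def structured_discriminator(log_prob_learnable, log_prob_previous):
    """Assumed form: D = pi_learnable / (pi_learnable + pi_previous).

    Written as a sigmoid of the log-probability difference so that only
    the two policies' log-probabilities of the same action are needed.
    """
    return 1.0 / (1.0 + math.exp(log_prob_previous - log_prob_learnable))


def discriminator_loss(expert_pairs, generator_pairs):
    """Binary cross-entropy over (log_prob_learnable, log_prob_previous) pairs.

    Expert samples are labeled 1 and samples from the previous generator
    are labeled 0. Only the learnable policy would carry trainable
    parameters; the previous policy is treated as fixed.
    """
    expert_term = -sum(
        math.log(structured_discriminator(lp, lg)) for lp, lg in expert_pairs
    ) / len(expert_pairs)
    generator_term = -sum(
        math.log(1.0 - structured_discriminator(lp, lg)) for lp, lg in generator_pairs
    ) / len(generator_pairs)
    return expert_term + generator_term
```

Under this assumed form, the discriminator outputs exactly one half whenever the two policies assign the same probability to an action, so it can only separate expert data from generated data by moving the learnable policy away from the previous one.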
