Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
Paul Barde
Neural Information Processing Systems
Specifically, our discriminator is explicitly conditioned on two policies: the policy from the generator's previous iteration and a learnable policy.
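The excerpt does not spell out how the discriminator combines the two policies. As a minimal sketch, assume it takes the ratio form D = pi_learnable / (pi_learnable + pi_previous), which equals a sigmoid of the difference in log-probabilities, and that it is trained with the usual binary cross-entropy objective. The function names and the `(logp_learnable, logp_previous)` pair layout below are hypothetical, chosen only for illustration.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def structured_discriminator(logp_learnable, logp_previous):
    # Assumed form: pi_learnable / (pi_learnable + pi_previous),
    # written as sigmoid(log pi_learnable - log pi_previous).
    return math.exp(log_sigmoid(logp_learnable - logp_previous))


def discriminator_loss(expert_pairs, generator_pairs):
    # Each pair is (logp_learnable, logp_previous) for one sample.
    # Binary cross-entropy: expert samples are labelled 1, generator samples 0.
    expert_term = sum(log_sigmoid(l - p) for l, p in expert_pairs) / len(expert_pairs)
    # log(1 - D) = log sigmoid(log pi_previous - log pi_learnable)
    generator_term = sum(log_sigmoid(p - l) for l, p in generator_pairs) / len(generator_pairs)
    return -(expert_term + generator_term)
```

Under this assumed form the discriminator has no parameters of its own: minimizing the loss adjusts only the learnable policy, while the previous generator's log-probabilities are held fixed as constants.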