Extrinsically Rewarded Soft Q Imitation Learning with Discriminator
