Correcting discount-factor mismatch in on-policy policy gradient methods
