Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
