Optimization Issues in KL-Constrained Approximate Policy Iteration
