Guaranteed Trust Region Optimization via Two-Phase KL Penalization
