KL-Regularized Reinforcement Learning is Designed to Mode Collapse
