Information asymmetry in KL-regularized RL