$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences
