Gradients Should Stay on Path: Better Estimators of the Reverse- and Forward-KL Divergence for Normalizing Flows