When Do Off-Policy and On-Policy Policy Gradient Methods Align?