Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling