Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space