Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning