Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt Optimization for Few-shot Learning