Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning