Rethinking On-policy Optimization for Query Augmentation