Token-level Proximal Policy Optimization for Query Generation