Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization