Lessons from Training Grounded LLMs with Verifiable Rewards