RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models