PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach