Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models
