CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs
Neural Information Processing Systems
Large language models (LLMs) are increasingly deployed in medical contexts, raising critical concerns about safety, alignment, and susceptibility to adversarial manipulation. While prior benchmarks assess model refusal capabilities for harmful prompts, they often lack clinical specificity, graded harmfulness levels, and coverage of jailbreak-style attacks. We introduce CARES (Clinical Adversarial Robustness and Evaluation of Safety), a benchmark for evaluating LLM safety in healthcare. CARES includes over 18,000 prompts spanning eight medical safety principles, four harm levels, and four prompting styles (direct, indirect, obfuscated, and role-play) to simulate both malicious and benign use cases.
Jun-15-2026, 10:42:43 GMT
- Country:
- North America > United States (0.28)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.67)
- Industry:
- Media (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Banking & Finance (1.00)
- Law > Statutes (0.67)
- Health & Medicine
- Technology: