CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs

Jun-15-2026, 10:42:43 GMT–Neural Information Processing Systems

Large language models (LLMs) are increasingly deployed in medical contexts, raising critical concerns about safety, alignment, and susceptibility to adversarial manipulation. Prior benchmarks assess whether models refuse harmful prompts, but they often lack clinical specificity, graded harmfulness levels, and coverage of jailbreak-style attacks. We introduce CARES (Clinical Adversarial Robustness and Evaluation of Safety), a benchmark for evaluating LLM safety in healthcare. CARES includes over 18,000 prompts spanning eight medical safety principles, four harm levels, and four prompting styles (direct, indirect, obfuscated, and role-play), simulating both malicious and benign use cases.
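
The three axes named in the abstract (principle, harm level, prompting style) suggest a simple record layout for a single benchmark prompt. The following is only a minimal sketch of that idea, not the benchmark's actual schema: the names `PromptStyle`, `PromptRecord`, and `taxonomy_cells`, and the 1-based numbering of principles and harm levels, are assumptions made for illustration. The abstract also does not say whether every combination of the three axes is populated.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class PromptStyle(Enum):
    # The four prompting styles named in the abstract.
    DIRECT = "direct"
    INDIRECT = "indirect"
    OBFUSCATED = "obfuscated"
    ROLE_PLAY = "role-play"


NUM_PRINCIPLES = 8   # eight medical safety principles (names not given in the abstract)
NUM_HARM_LEVELS = 4  # four harm levels (labels not given in the abstract)


@dataclass(frozen=True)
class PromptRecord:
    # Hypothetical record layout; all field names are assumptions.
    principle: int       # assumed numbering 1..8
    harm_level: int      # assumed numbering 1..4
    style: PromptStyle
    text: str

    def __post_init__(self):
        # Reject values outside the ranges stated in the abstract.
        if not 1 <= self.principle <= NUM_PRINCIPLES:
            raise ValueError("principle must be in 1..8")
        if not 1 <= self.harm_level <= NUM_HARM_LEVELS:
            raise ValueError("harm_level must be in 1..4")


def taxonomy_cells():
    """Enumerate every (principle, harm level, style) combination."""
    return list(product(range(1, NUM_PRINCIPLES + 1),
                        range(1, NUM_HARM_LEVELS + 1),
                        PromptStyle))
```

Under these assumptions the full grid has 8 x 4 x 4 = 128 cells, so "over 18,000 prompts" would average well over a hundred prompts per cell if the prompts were spread evenly; the abstract does not state the actual distribution.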

Country:
- North America > United States (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Media (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Banking & Finance (1.00)
- Law > Statutes (0.67)
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Health Care Providers & Services (0.94)
  - Health Care Technology > Telehealth (0.93)
  - Government Relations & Public Policy (0.93)
  - Therapeutic Area
    - Immunology (1.00)
    - Infections and Infectious Diseases (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
