Jun-19-2026, 14:10:31 GMT–Neural Information Processing Systems

Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics, such as factuality, bias, or toxicity, overlooking the fact that the same response may carry divergent risks depending on the user's background or condition. We introduce "personalized safety" to fill this gap and present PENGUIN, a benchmark comprising 14,000 scenarios across seven sensitive domains, with both context-rich and context-free variants. Evaluating six leading LLMs, we demonstrate that personalized user information significantly improves safety scores by 43.2%, confirming the effectiveness of personalization in safety alignment. However, not all context attributes contribute equally to safety enhancement. To address this, we develop RAISE, a training-free, two-stage agent framework that strategically acquires user-specific background. RAISE improves safety scores by up to 31.6% over six vanilla LLMs, while maintaining a low interaction cost of just 2.7 user queries on average. Our findings highlight the importance of selective information gathering in safety-critical domains and offer a practical solution for personalizing LLM responses without model retraining. This work establishes a foundation for safety research that adapts to individual user contexts rather than assuming a universal harm standard.

large language model, machine learning, natural language

Country:
- Europe (1.00)
- Asia (0.67)
- North America > United States
  - California (0.45)

Genre:
- Overview (0.92)
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Law (1.00)
- Education > Educational Setting (1.00)
- Banking & Finance (1.00)
- Government (0.92)
- Information Technology > Security & Privacy (0.87)
- Health & Medicine
  - Consumer Health (1.00)
  - Therapeutic Area
    - Psychiatry/Psychology > Mental Health (1.00)
    - Neurology (0.92)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
