The Better Angels of Machine Personality: How Personality Relates to LLM Safety

Zhang, Jie, Liu, Dongrui, Qian, Chen, Gan, Ziyue, Liu, Yong, Qiao, Yu, Shao, Jing

Jul-17-2024–arXiv.org Artificial Intelligence

Personality psychologists have analyzed the relationship between personality and safety behaviors in human society. Although Large Language Models (LLMs) demonstrate personality traits, the relationship between personality traits and safety abilities in LLMs remains unclear. In this paper, we find that LLMs' personality traits, measured with the reliable MBTI-M scale, are closely related to their safety abilities, i.e., toxicity, privacy, and fairness. Meanwhile, safety alignment generally increases the Extraversion, Sensing, and Judging traits of various LLMs. Building on these findings, we can edit LLMs' personality traits to improve their safety performance; e.g., inducing a personality change from ISTJ to ISTP resulted in relative improvements of approximately 43% and 10% in privacy and fairness performance, respectively. Additionally, we find that LLMs with different personality traits are differentially susceptible to jailbreak.

large language model, machine learning, personality trait


Country:
- Asia
  - China (0.28)
  - Middle East > UAE (0.14)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education (0.67)
- Health & Medicine (0.68)
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.95)
  - Natural Language > Large Language Model (1.00)
