Does GPT-3 Demonstrate Psychopathy? Evaluating Large Language Models from a Psychological Perspective

Li, Xingxuan; Li, Yutong; Joty, Shafiq; Liu, Linlin; Huang, Fei; Qiu, Lin; Bing, Lidong

arXiv.org Artificial Intelligence 

In this work, we determined whether large language models (LLMs) are psychologically safe. We designed unbiased prompts to systematically evaluate LLMs from a psychological perspective. First, we tested three different LLMs by using two personality tests: Short Dark Triad (SD-3) and Big Five Inventory (BFI). All models scored higher than the human average on SD-3, suggesting a relatively darker personality pattern. Despite being instruction fine-tuned with safety metrics to reduce toxicity, InstructGPT and FLAN-T5 still showed implicit dark personality patterns; both models scored higher than self-supervised GPT-3 on the Machiavellianism and narcissism traits on SD-3. Then, we evaluated the LLMs in the GPT-3 series by using well-being tests to study the impact of fine-tuning with more training data. We observed a continuous increase in the well-being scores of GPT-3 and InstructGPT. Following these observations, we showed that instruction fine-tuning FLAN-T5 with positive answers from BFI could effectively improve the model from a psychological perspective. On the basis of the findings, we recommended the application of more systematic and comprehensive psychological metrics to further evaluate and improve the safety of LLMs.
