Identifying and Manipulating Personality Traits in LLMs Through Activation Engineering

Rumi A. Allbert, James K. Wiles, Vlad Grankovsky

Jan-10-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) are the subject of ongoing efforts to improve their functionality, understand their internal workings, and ensure that they are applied ethically and safely. Recent developments in the field have led to the concept of 'activation engineering' [13], which posits that activation vectors can mediate particular behaviors within LLMs. This has made it possible to adjust and regulate the output of these models in new ways. This paper is motivated by the potential to extend this line of inquiry to personality traits in LLMs. The ability to dynamically adjust the personality of a language model without extensive retraining could be a significant advance in the field, offering greater flexibility in AI applications. Such an approach could change how AI systems are deployed and interacted with, allowing for more personalized and context-appropriate responses.
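To make the idea of a behavior-mediating activation vector more concrete, here is a minimal sketch of one common recipe from the activation-engineering literature: a difference-of-means steering vector, computed from activations recorded on prompts that do and do not exhibit a trait, then added (scaled) to a hidden state at inference time. This is an illustration of the general technique, not necessarily the method used in this paper. Assumptions: activations are plain Python lists standing in for a model's hidden states at a single layer; the function names `mean_vector`, `steering_vector`, `apply_steering` and the coefficient `alpha` are hypothetical; the numbers are toy values, not data from any model.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not vectors:
        raise ValueError("need at least one activation vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all activation vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def steering_vector(with_trait: Sequence[Sequence[float]],
                    without_trait: Sequence[Sequence[float]]) -> Vector:
    """Difference of means: points from 'trait absent' toward 'trait present'."""
    pos = mean_vector(with_trait)
    neg = mean_vector(without_trait)
    if len(pos) != len(neg):
        raise ValueError("both activation sets must share one dimension")
    return [p - n for p, n in zip(pos, neg)]


def apply_steering(hidden: Sequence[float], direction: Sequence[float],
                   alpha: float) -> Vector:
    """Add the steering direction, scaled by alpha, to one hidden state."""
    if len(hidden) != len(direction):
        raise ValueError("dimension mismatch")
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional "activations" (hypothetical numbers, not from any model).
extraverted = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
introverted = [[0.0, 1.0, 0.0], [0.0, 1.0, 2.0]]

v = steering_vector(extraverted, introverted)
print(v)  # [2.0, 1.0, -1.0]
print(apply_steering([0.5, 0.5, 0.5], v, alpha=0.5))  # [1.5, 1.0, 0.0]
```

In a real model the same addition would be applied to the residual stream at a chosen layer during the forward pass; a positive `alpha` pushes generations toward the trait and a negative `alpha` pushes away from it, which is what allows adjustment without retraining any weights.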

large language model, machine learning, personality trait