Identifying and Manipulating Personality Traits in LLMs Through Activation Engineering
Allbert, Rumi A., Wiles, James K., Grankovsky, Vlad
arXiv.org Artificial Intelligence
Large language models (LLMs) are the subject of ongoing efforts to improve their functionality, understand their internal workings, and ensure their ethical and safe application. Recent work has introduced the concept of 'activation engineering' [13], which posits that activation vectors can mediate particular behaviors within LLMs. This makes it possible to adjust and regulate the output of these models in new ways. This paper is motivated by the potential to extend this line of inquiry to personality traits in LLMs. The ability to dynamically adjust the personality of a language model without extensive retraining could mark a significant advance, offering greater flexibility in AI applications. Such an approach could change how we interact with and deploy AI systems, allowing for more personalized and context-appropriate responses.
Jan-10-2025