Can Large Language Models Change User Preference Adversarially?

Subhash, Varshini

arXiv.org Artificial Intelligence 

As pretrained large language models grow in size and capability, it becomes increasingly important to ensure that their role in society and their deployment in high-stakes situations are safe. ChatGPT, for instance, is a preview of the future of personal dialogue assistants, and interpreting and explaining such models has become critical to minimizing undesirable downstream consequences. Language models acting as personal dialogue assistants, by virtue of engaging in conversation with the user, have the ability to influence, persuade, or potentially manipulate the user in adversarial settings. Franklin et al. [2022] argue for a framework to address the lack of formalism in the study of user preference and behavioral change due to these models. While adversarial change in user preferences has been studied for recommender systems (Adomavicius et al. [2013]), it remains largely unexplored through the lens of dialogue assistants and large language models.
