Can Large Language Models Change User Preference Adversarially?
–arXiv.org Artificial Intelligence
As pretrained large language models grow in size and capability, it becomes increasingly important to ensure that their role in society and their deployment in high-stakes situations are safe. ChatGPT, for instance, offers a preview of the future of personal dialogue assistants, and interpreting and explaining such models has become critical to minimizing undesirable downstream consequences. Because they engage in conversation with the user, language models acting as personal dialogue assistants can influence, persuade, or, in adversarial settings, potentially manipulate that user. Franklin et al. [2022] argue for a framework to address the lack of formalism in the study of how these models change user preferences and behavior. While adversarial change in user preferences has been studied for recommender systems (Adomavicius et al. [2013]), it remains largely unexplored through the lens of dialogue assistants and large language models.
Jan-5-2023
- Country:
  - North America > United States > New York > New York County > New York City (0.04)
  - North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
  - Europe > Italy > Tuscany > Florence (0.04)
  - Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Genre:
  - Overview (0.68)
  - Research Report (0.47)