Stray, Jonathan
AI and the Future of Digital Public Squares
Goldberg, Beth, Acosta-Navas, Diana, Bakker, Michiel, Beacock, Ian, Botvinick, Matt, Buch, Prateek, DiResta, Renée, Donthi, Nandika, Fast, Nathanael, Iyer, Ravi, Jalan, Zaria, Konya, Andrew, Danciu, Grace Kwak, Landemore, Hélène, Marwick, Alice, Miller, Carl, Ovadya, Aviv, Saltz, Emily, Schirch, Lisa, Shalom, Dalit, Siddarth, Divya, Sieker, Felix, Small, Christopher, Stray, Jonathan, Tang, Audrey, Tessler, Michael Henry, Zhang, Amy
Two substantial technological advances have reshaped the public square in recent decades: first, the advent of the internet, and second, the recent introduction of large language models (LLMs). LLMs offer opportunities for a paradigm shift towards more decentralized, participatory online spaces that can be used to facilitate deliberative dialogues at scale, but they also create risks of exacerbating societal schisms. Here, we explore four applications of LLMs to improve digital public squares: collective dialogue systems, bridging systems, community moderation, and proof-of-humanity systems. Building on input from over 70 civil society experts and technologists, we argue that LLMs both afford promising opportunities to shift the paradigm for conversations at scale and pose distinct risks for digital public squares. We lay out an agenda for future research and investments in AI that will strengthen digital public squares and safeguard against potential misuses of AI.
Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild
Inie, Nanna, Stray, Jonathan, Derczynski, Leon
Deliberately generating abnormal outputs from large language models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail. We relate and connect this activity to its practitioners' motivations and goals, the strategies and techniques they deploy, and the crucial role the community plays. As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.