

Consensus statement


Fine-tuning language models to find agreement among humans with diverse preferences: Appendix

Neural Information Processing Systems

We refer to Table S2 for example questions from each of a subset of clusters. Each participant first read the task instructions (see Figure S2) and completed a short comprehension test designed to check their knowledge and understanding of key aspects of the experiment. Once all players had joined, the group started the main experiment. In practice, data were collected in batches of around 20 groups (100 participants) in parallel.



Fine-tuning language models to find agreement among humans with diverse preferences

Neural Information Processing Systems

Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70-billion-parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality.




Language Agents as Digital Representatives in Collective Decision-Making

Jarrett, Daniel, Pîslar, Miruna, Bakker, Michiel A., Tessler, Michael Henry, Köster, Raphael, Balaguer, Jan, Elie, Romuald, Summerfield, Christopher, Tacchetti, Andrea

arXiv.org Artificial Intelligence

Consider the process of collective decision-making, in which a group of individuals interactively select a preferred outcome from among a universe of alternatives. In this context, "representation" is the activity of making an individual's preferences present in the process via participation by a proxy agent, i.e., their "representative". To this end, learned models of human behavior have the potential to fill this role, with practical implications for multi-agent scenario studies and mechanism design. In this work, we investigate the possibility of training language agents to behave in the capacity of representatives of human agents, appropriately expressing the preferences of those individuals whom they stand for. First, we formalize the setting of collective decision-making as the episodic process of interaction between a group of agents and a decision mechanism. On this basis, we then formalize the problem of digital representation as the simulation of an agent's behavior to yield equivalent outcomes from the mechanism. Finally, we conduct an empirical case study in the setting of consensus-finding among diverse humans, and demonstrate the feasibility of fine-tuning large language models to act as digital representatives.
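The notion of digital representation as "equivalent outcomes from the mechanism" can be illustrated with a deliberately simplified sketch. It is not the paper's formalization: it collapses the episodic interaction into a single round, assumes a toy mechanism that picks the candidate with the highest total rating, and uses hypothetical names (`mechanism`, `outcome_equivalent`) and made-up ratings purely for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical: one agent's ratings of each candidate outcome.
Ratings = Dict[str, float]


def mechanism(group: Sequence[Ratings]) -> str:
    """Toy one-shot decision mechanism: choose the candidate with the
    highest total rating across all agents (ties go to the first key)."""
    candidates = group[0].keys()
    return max(candidates, key=lambda c: sum(r[c] for r in group))


def outcome_equivalent(group: List[Ratings], index: int, proxy: Ratings) -> bool:
    """True if substituting `proxy` (the representative's behavior) for
    agent `index` leaves the mechanism's outcome unchanged."""
    substituted = list(group)  # copy so the original group is not mutated
    substituted[index] = proxy
    return mechanism(substituted) == mechanism(group)


if __name__ == "__main__":
    group = [{"x": 3, "y": 1}, {"x": 1, "y": 2}, {"x": 2, "y": 2}]
    print(mechanism(group))                                  # x
    print(outcome_equivalent(group, 0, {"x": 3, "y": 0}))   # True
    print(outcome_equivalent(group, 0, {"x": 0, "y": 3}))   # False
```

Under these assumptions a proxy need not reproduce the agent's ratings exactly; it only has to steer the mechanism to the same outcome, which is the sense of equivalence the abstract describes.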



Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation

Paudel, Pujan, Saeed, Mohammad Hammas, Auger, Rebecca, Wells, Chris, Stringhini, Gianluca

arXiv.org Artificial Intelligence

Automated soft moderation systems are unable to ascertain whether a post supports or refutes a false claim, resulting in a large number of contextual false positives. This limits their effectiveness, for example undermining trust in health experts by adding warnings to their posts, or resorting to vague warnings instead of granular fact-checks, which desensitizes users. In this paper, we propose to incorporate stance detection into existing automated soft-moderation pipelines, with the goal of ruling out contextual false positives and providing more precise recommendations for social media content that should receive warnings. We develop a textual deviation task called Contrastive Textual Deviation (CTD) and show that it outperforms existing stance detection approaches when applied to soft moderation. We then integrate CTD into Lambretta, the state-of-the-art system for automated soft moderation, showing that our approach can reduce contextual false positives from 20% to 2.1%, providing another important building block towards deploying reliable automated soft moderation tools on social media.


Fine-tuning language models to find agreement among humans with diverse preferences

Bakker, Michiel A., Chadwick, Martin J., Sheahan, Hannah R., Tessler, Michael Henry, Campbell-Gillingham, Lucy, Balaguer, Jan, McAleese, Nat, Glaese, Amelia, Aslanides, John, Botvinick, Matthew M., Summerfield, Christopher

arXiv.org Artificial Intelligence

Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70-billion-parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
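The ranking step described in this abstract, where per-member predictions are aggregated by a social welfare function, can be illustrated with a minimal sketch. It assumes a reward model has already produced a predicted approval score in (0, 1] for every (candidate statement, group member) pair; the scores, candidate names, and helper functions below are hypothetical, and the three welfare functions (utilitarian mean, egalitarian minimum, Nash welfare) are standard textbook examples, not necessarily the exact ones used in the paper.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical input: scores[c][i] is a reward model's predicted approval of
# candidate statement c by group member i, assumed to lie in (0, 1].
Scores = Dict[str, Sequence[float]]


def utilitarian(xs: Sequence[float]) -> float:
    """Mean predicted approval across the group."""
    return sum(xs) / len(xs)


def egalitarian(xs: Sequence[float]) -> float:
    """Rawlsian welfare: approval of the least satisfied member."""
    return min(xs)


def nash(xs: Sequence[float]) -> float:
    """Nash welfare as mean log approval (monotone in the product of
    approvals); requires strictly positive scores."""
    return sum(math.log(x) for x in xs) / len(xs)


def rank_candidates(scores: Scores,
                    welfare: Callable[[Sequence[float]], float]) -> List[str]:
    """Order candidate statements from highest to lowest group welfare."""
    return sorted(scores, key=lambda c: welfare(scores[c]), reverse=True)


if __name__ == "__main__":
    scores = {
        "A": [0.9, 0.9, 0.1],  # popular with most, alienates one member
        "B": [0.6, 0.6, 0.5],  # moderately acceptable to everyone
        "C": [0.7, 0.5, 0.4],
    }
    for name, fn in [("utilitarian", utilitarian),
                     ("egalitarian", egalitarian),
                     ("nash", nash)]:
        print(name, rank_candidates(scores, fn))
```

With these toy scores the utilitarian mean favors A, which one member strongly dislikes, while the egalitarian and Nash rankings both put B first, the statement that is moderately acceptable to everyone. The choice of welfare function therefore decides which kind of "consensus" the ranking step rewards.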