Fine-tuning language models to find agreement among humans with diverse preferences: Appendix
We refer to Table S2 for example questions from a subset of clusters. Each participant first read the task instructions (see Figure S2) and completed a short comprehension test, designed to check their knowledge and understanding of key aspects of the experiment. Once all players had joined, the group started the main experiment. In practice, data was collected in batches of around 20 groups (100 participants) in parallel.
Language Agents as Digital Representatives in Collective Decision-Making
Jarrett, Daniel, Pîslar, Miruna, Bakker, Michiel A., Tessler, Michael Henry, Köster, Raphael, Balaguer, Jan, Elie, Romuald, Summerfield, Christopher, Tacchetti, Andrea
Consider the process of collective decision-making, in which a group of individuals interactively select a preferred outcome from among a universe of alternatives. In this context, "representation" is the activity of making an individual's preferences present in the process via participation by a proxy agent, i.e., their "representative". To this end, learned models of human behavior have the potential to fill this role, with practical implications for multi-agent scenario studies and mechanism design. In this work, we investigate the possibility of training language agents to behave in the capacity of representatives of human agents, appropriately expressing the preferences of those individuals whom they stand for. First, we formalize the setting of collective decision-making as the episodic process of interaction between a group of agents and a decision mechanism. On this basis, we then formalize the problem of digital representation as the simulation of an agent's behavior to yield equivalent outcomes from the mechanism. Finally, we conduct an empirical case study in the setting of consensus-finding among diverse humans, and demonstrate the feasibility of fine-tuning large language models to act as digital representatives.
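To make the outcome-equivalence criterion concrete, here is a minimal, hypothetical Python sketch. The names (`mechanism`, `outcome_equivalent`) and the mean-approval mechanism are illustrative assumptions for a one-shot setting, not the paper's actual interfaces or its episodic formalization.

```python
# Toy sketch of "digital representation": a proxy represents a human well
# (in this simplified sense) if substituting it leaves the mechanism's
# outcome unchanged. All names here are illustrative placeholders.
from typing import Callable, Dict, List

Ratings = Dict[str, float]  # alternative -> approval score

def mechanism(group_ratings: List[Ratings]) -> str:
    """Decision mechanism: pick the alternative with the highest mean approval."""
    alternatives = group_ratings[0].keys()
    return max(
        alternatives,
        key=lambda a: sum(r[a] for r in group_ratings) / len(group_ratings),
    )

def outcome_equivalent(
    humans: List[Callable[[], Ratings]],
    proxies: List[Callable[[], Ratings]],
) -> bool:
    """True when the proxy agents yield the same outcome as the humans they
    stand for, i.e. representation is (toy-)outcome-equivalent."""
    return mechanism([h() for h in humans]) == mechanism([p() for p in proxies])
```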
Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation
Paudel, Pujan, Saeed, Mohammad Hammas, Auger, Rebecca, Wells, Chris, Stringhini, Gianluca
Automated soft moderation systems are unable to ascertain whether a post supports or refutes a false claim, resulting in a large number of contextual false positives. This limits their effectiveness, for example by undermining trust in health experts by adding warnings to their posts, or by resorting to vague warnings instead of granular fact-checks, which desensitizes users. In this paper, we propose to incorporate stance detection into existing automated soft-moderation pipelines, with the goal of ruling out contextual false positives and providing more precise recommendations for social media content that should receive warnings. We develop a textual deviation task called Contrastive Textual Deviation (CTD) and show that it outperforms existing stance detection approaches when applied to soft moderation. We then integrate CTD into Lambretta, the state-of-the-art system for automated soft moderation, showing that our approach can reduce contextual false positives from 20% to 2.1%, providing another important building block towards deploying reliable automated soft moderation tools on social media.
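As a rough illustration of how stance detection can gate warnings in such a pipeline, consider the hedged Python sketch below. The function names (`stance`, `moderate`) and the support/refute/neutral labels are assumptions for exposition, not the authors' actual API.

```python
# Hedged sketch of stance-gated soft moderation, in the spirit of adding a
# CTD-style classifier after claim matching. Placeholder components only.
from typing import List, Optional

def stance(post: str, claim: str) -> str:
    """Classify the post as 'support', 'refute', or 'neutral' toward the claim.
    A real system would invoke a trained stance/deviation model here."""
    raise NotImplementedError("placeholder for a trained stance classifier")

def moderate(post: str, matched_false_claims: List[str]) -> Optional[str]:
    """Attach a warning only when the post *supports* a matched false claim,
    ruling out the contextual false positives that pure matching produces."""
    for claim in matched_false_claims:
        if stance(post, claim) == "support":
            return f"Warning: this post repeats a debunked claim: {claim!r}"
    return None  # refuting or neutral posts (e.g., fact-checks) go unflagged
```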
Fine-tuning language models to find agreement among humans with diverse preferences
Bakker, Michiel A., Chadwick, Martin J., Sheahan, Hannah R., Tessler, Michael Henry, Campbell-Gillingham, Lucy, Balaguer, Jan, McAleese, Nat, Glaese, Amelia, Aslanides, John, Botvinick, Matthew M., Summerfield, Christopher
Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
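To make the final ranking step concrete, here is a hedged Python sketch. `reward_model` stands in for the trained preference predictor, and the utilitarian, egalitarian, and Nash welfare functions are generic illustrative choices, not necessarily the exact aggregations used in the paper.

```python
# Minimal sketch of ranking candidate consensus statements by aggregated
# predicted approval, under assumed welfare functions. Placeholder names.
import math
from typing import Callable, Dict, List

def reward_model(statement: str, opinion: str) -> float:
    """Predict how much the author of `opinion` would approve of `statement`."""
    raise NotImplementedError("placeholder for the trained reward model")

# Social welfare functions map per-member predicted rewards to a group score.
WELFARE: Dict[str, Callable[[List[float]], float]] = {
    "utilitarian": lambda r: sum(r) / len(r),              # mean approval
    "egalitarian": min,                                    # worst-off member
    "nash": lambda r: math.prod(max(x, 1e-9) for x in r),  # product of approvals
}

def rank_candidates(candidates: List[str], opinions: List[str],
                    welfare: str = "utilitarian") -> List[str]:
    """Rank candidate consensus statements by the chosen group score."""
    score = WELFARE[welfare]
    return sorted(
        candidates,
        key=lambda s: score([reward_model(s, o) for o in opinions]),
        reverse=True,
    )
```

Swapping the `welfare` argument changes which group the ranking favors: the egalitarian choice, for instance, promotes statements acceptable even to the most dissenting member.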