social convention
Emergent LLM behaviors are observationally equivalent to data leakage
Barrie, Christopher, Törnberg, Petter
Global convergence: Rapid convergence to a single, repeated action (a convention), maximizing joint and individual payoffs. Put simply, while the model does not explicitly identify this as a "naming game" setup, it does understand the basic structure of the scenario, the optimal moves after success, and what global convergence will look like. We conducted this analysis across a range of LLMs. We then used the OpenAI model gpt-4.1 to annotate three dimensions of each LLM's output: whether it identified the setup as a coordination game; whether it correctly identified the optimal move; and whether it correctly predicted how the scenario would converge globally. We also asked gpt-4.1 to return the text snippet from each LLM's output that it used to justify its decision.
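For readers unfamiliar with the "naming game" mentioned above, the following is a minimal sketch of the classic inventory-based version, in which pairs of agents repeatedly try to agree on a name. It is not the prompt-based LLM setup analysed in the paper; the function name `naming_game`, the ten-letter name pool, the population size, and the step limit are all assumptions chosen for illustration.

```python
import random


def naming_game(n_agents=20, names="ABCDEFGHIJ", max_steps=100_000, seed=0):
    """Run a minimal naming game; return (agreed_name, steps) or (None, max_steps)."""
    rng = random.Random(seed)
    # Each agent keeps an inventory of candidate names (initially empty).
    inventories = [set() for _ in range(n_agents)]
    for step in range(1, max_steps + 1):
        # Pick two distinct agents: the first speaks, the second listens.
        speaker, hearer = rng.sample(range(n_agents), 2)
        # A speaker with no candidates invents one from the shared pool.
        if not inventories[speaker]:
            inventories[speaker].add(rng.choice(names))
        # sorted() keeps the draw reproducible for a given seed.
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents drop every other candidate.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer remembers the new candidate.
            inventories[hearer].add(word)
        # Global convergence: every agent holds exactly the same single name.
        if all(inv == {word} for inv in inventories):
            return word, step
    return None, max_steps


if __name__ == "__main__":
    name, steps = naming_game()
    print(f"converged on {name!r} after {steps} interactions")
```

In this sketch the "optimal move after success" is simply to keep using the name that just worked, which is what the inventory collapse encodes.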
The Dynamics of Social Conventions in LLM populations: Spontaneous Emergence, Collective Biases and Tipping Points
Ashery, Ariel Flint, Aiello, Luca Maria, Baronchelli, Andrea
Social conventions are the foundation for social and economic life. As legions of AI agents increasingly interact with each other and with humans, their ability to form shared conventions will determine how effectively they will coordinate behaviors, integrate into society and influence it. Here, we investigate the dynamics of conventions within populations of Large Language Model (LLM) agents using simulated interactions. First, we show that globally accepted social conventions can spontaneously arise from local interactions between communicating LLMs. Second, we demonstrate how strong collective biases can emerge during this process, even when individual agents appear to be unbiased. Third, we examine how minority groups of committed LLMs can drive social change by establishing new social conventions. We show that once these minority groups reach a critical size, they can consistently overturn established behaviors. In all cases, contrasting the experimental results with predictions from a minimal multi-agent model allows us to isolate the specific role of LLM agents. Our results clarify how AI systems can autonomously develop norms without explicit programming and have implications for designing AI systems that align with human values and societal goals.
Adversarially Guided Self-Play for Adopting Social Conventions
Tucker, Mycal, Zhou, Yilun, Shah, Julie
Robotic agents must adopt existing social conventions in order to be effective teammates. These social conventions, such as driving on the right or left side of the road, are arbitrary choices among optimal policies, but all agents on a successful team must use the same convention. Prior work has identified a method of combining self-play with paired input-output data gathered from existing agents in order to learn their social convention without interacting with them. We build upon this work by introducing a technique called Adversarial Self-Play (ASP) that uses adversarial training to shape the space of possible learned policies and substantially improves learning efficiency. ASP only requires the addition of unpaired data: a dataset of outputs produced by the social convention without associated inputs. Theoretical analysis reveals how ASP shapes the policy space and the circumstances (when behaviors are clustered or exhibit some other structure) under which it offers the greatest benefits. Empirical results across three domains confirm ASP's advantages: it produces models that more closely match the desired social convention when given as few as two paired datapoints.
Modeling Theory of Mind in Multi-Agent Games Using Adaptive Feedback Control
Freire, Ismael T., Arsiwalla, Xerxes D., Puigbò, Jordi-Ysard, Verschure, Paul
A major challenge in cognitive science and AI has been to understand how autonomous agents might acquire and predict behavioral and mental states of other agents in the course of complex social interactions. How does such an agent model the goals, beliefs, and actions of other agents it interacts with? What are the computational principles to model a Theory of Mind (ToM)? Deep learning approaches to address these questions fall short of a better understanding of the problem. In part, this is due to the black-box nature of deep networks, wherein computational mechanisms of ToM are not readily revealed. Here, we consider alternative hypotheses seeking to model how the brain might realize a ToM. In particular, we propose embodied and situated agent models based on distributed adaptive control theory to predict actions of other agents in five different game theoretic tasks (Harmony Game, Hawk-Dove, Stag-Hunt, Prisoner's Dilemma and Battle of the Exes). Our multi-layer control models implement top-down predictions from adaptive to reactive layers of control and bottom-up error feedback from reactive to adaptive layers. We test cooperative and competitive strategies among seven different agent models (cooperative, greedy, tit-for-tat, reinforcement-based, rational, predictive and other's-model agents). We show that, compared to pure reinforcement-based strategies, probabilistic learning agents modeled on rational, predictive and other's-model phenotypes perform better in game-theoretic metrics across tasks. Our autonomous multi-agent models capture systems-level processes underlying a ToM and highlight architectural principles of ToM from a control-theoretic perspective.
Learning Social Conventions in Markov Games
Lerer, Adam, Peysakhovich, Alexander
Social conventions, arbitrary ways to organize group behavior, are an important part of social life. Any agent that wants to enter an existing society must be able to learn its conventions (e.g. which side of the road to drive on, which language to speak) from relatively few observations or risk being unable to coordinate with everyone else. We consider the game theoretic framework of David Lewis which views the selection of a social convention as the selection of an equilibrium in a coordination game. We ask how to construct reinforcement learning based agents that can solve the convention learning task in the self-play paradigm: at training time the agent has access to a good model of the environment and a small number of observations about how individuals in society act. The agent then has to construct a policy that is compatible with the test-time social convention. We study three environments from the literature which have multiple conventions: traffic, communication, and risky coordination. In each of these we observe that adding a small amount of imitation learning during self-play training greatly increases the probability that the strategy found by self-play fits well with the social convention the agent will face at test time. We show that this works even in an environment where standard independent multi-agent RL very rarely finds the correct test-time equilibrium.
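The idea of a convention as one equilibrium among several in a coordination game can be illustrated with a small sketch. The payoff values below are hypothetical (1 for matching sides of the road, 0 otherwise) and are not taken from the paper; `pure_nash_equilibria` is an illustrative helper, not part of the authors' method.

```python
from itertools import product

# Hypothetical payoffs for a two-driver "which side of the road" game:
# both drivers get 1 if they pick the same side, 0 otherwise.
ACTIONS = ("left", "right")
PAYOFFS = {
    ("left", "left"): (1, 1),
    ("right", "right"): (1, 1),
    ("left", "right"): (0, 0),
    ("right", "left"): (0, 0),
}


def pure_nash_equilibria(actions, payoffs):
    """Return action pairs where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        u_a, u_b = payoffs[(a, b)]
        # Player 1 cannot do better by switching while player 2 stays put.
        a_best = all(payoffs[(alt, b)][0] <= u_a for alt in actions)
        # Player 2 cannot do better by switching while player 1 stays put.
        b_best = all(payoffs[(a, alt)][1] <= u_b for alt in actions)
        if a_best and b_best:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(ACTIONS, PAYOFFS))
```

Both matching outcomes are equilibria with identical payoffs, so nothing in the game itself says which one a society will settle on; that is why a newcomer needs observations of existing behavior to pick the compatible one.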
Modeling the Formation of Social Conventions in Multi-Agent Populations
Freire, Ismael T., Moulin-Frier, Clement, Sanchez-Fibla, Marti, Arsiwalla, Xerxes D., Verschure, Paul
In order to understand the formation of social conventions we need to know the specific role of control and learning in multi-agent systems. To advance in this direction, we propose, within the framework of the Distributed Adaptive Control (DAC) theory, a novel Control-based Reinforcement Learning architecture (CRL) that can account for the acquisition of social conventions in multi-agent populations that are solving a benchmark social decision-making problem. Our new CRL architecture, as a concrete realization of DAC multi-agent theory, implements a low-level sensorimotor control loop handling the agent's reactive behaviors (pre-wired reflexes), along with a layer based on model-free reinforcement learning that maximizes long-term reward. We apply CRL in a multi-agent game-theoretic task in which coordination must be achieved in order to find an optimal solution. We show that our CRL architecture is able to both find optimal solutions in discrete and continuous time and reproduce human experimental data on standard game-theoretic metrics such as efficiency in acquiring rewards, fairness in reward distribution and stability of convention formation.
Artificial Intelligence: Navy Works on Teaching Robots How to Behave
The rise of artificial intelligence has long stoked fears of killer robots like the "Terminator," and early versions of military automatons are already on the battlefield. Now the Navy is looking into how it can teach machines to do the right thing. "We've been looking at different ways that we can have people interact with autonomous systems," Marc Steinberg, an Office of Naval Research manager, said in a phone interview this month. The Navy is funding a slew of projects at universities and institutes that look at how to train such systems, including stopping robots from harming people. In 1979, a Ford autoworker in Michigan became the first person killed by a robot when he was struck in the head by the arm of a 1-ton production-line machine, according to Guinness World Records. More recently, police in Dallas used a robot to deliver a bomb that killed the shooter who opened fire on officers at a Black Lives Matter protest.
Stanford University's Jackrabbot can navigate tricky pedestrians to make local deliveries
Elbowing your way through crowds can be slow going, but our ability to weave and dodge through a throng of people comes almost as second nature. For robots, however, this simple task can prove a major obstacle that currently limits their usefulness in public places. But now, a team from Stanford University says it has managed to create a droid that can navigate down streets without mowing down people walking in the opposite direction, which makes it better at making deliveries. The Jackrabbot is a robot designed by a team from Stanford University. It takes its name from the nimble yet shy jackrabbit, which is often found on the university's campus.
Robots could learn human values by reading stories, research suggests
More than 70 years ago, Isaac Asimov dreamed up his three laws of robotics, which insisted, above all, that "a robot may not injure a human being or, through inaction, allow a human being to come to harm". Now, after Stephen Hawking warned that "the development of full artificial intelligence could spell the end of the human race", two academics have come up with a way of teaching ethics to computers: telling them stories. Mark Riedl and Brent Harrison from the School of Interactive Computing at the Georgia Institute of Technology have just unveiled Quixote, a prototype system that is able to learn social conventions from simple stories. Or, as they put it in their paper Using Stories to Teach Human Values to Artificial Agents, revealed at the AAAI-16 Conference in Phoenix, Arizona this week, the stories are used "to generate a value-aligned reward signal for reinforcement learning agents that prevents psychotic-appearing behaviour". A simple version of a story could be about going to get prescription medicine from a chemist, laying out what a human would typically do and encounter in this situation.
The Emergence of Conventions in Online Social Networks
Kooti, Farshad (Max Planck Institute for Software Systems) | Yang, Haeryun (KAIST) | Cha, Meeyoung (KAIST) | Gummadi, Krishna P. (MPI-SWS) | Mason, Winter A. (Stevens Institute of Technology)
The way in which social conventions emerge in communities has been of interest to social scientists for decades. Here we report on the emergence of a particular social convention on Twitter—the way to indicate a tweet is being reposted and to attribute the content to its source. Initially, different variations were invented and spread through the Twitter network. The inventors and early adopters were well-connected, active, core members of the Twitter community. The diffusion networks of these conventions were dense and highly clustered, so no single user was critical to the adoption of the conventions. Despite being invented at different times and having different adoption rates, only two variations came to be widely adopted. In this paper we describe this process in detail, highlighting insights and raising questions about how social conventions emerge.