Simulation of Human Behavior
How Do Artificial Intelligences Think? The Three Mathematico-Cognitive Factors of Categorical Segmentation Operated by Synthetic Neurons
Pichat, Michael, Pogrund, William, Gasparian, Armanush, Pichat, Paloma, Demarchi, Samuel, Veillet-Guillem, Michael
How do the synthetic neurons in language models create "thought categories" to segment and analyze their informational environment? What are the cognitive characteristics, at the very level of formal neurons, of this artificial categorical thought? Based on the mathematical nature of algebraic operations inherent to neuronal aggregation functions, we attempt to identify mathematico-cognitive factors that genetically shape the categorical reconstruction of the informational world faced by artificial cognition. This study explores these concepts through the notions of priming, attention, and categorical phasing.
The Impact of Token Granularity on the Predictive Power of Language Model Surprisal
Oh, Byung-Doh, Schuler, William
Word-by-word language model surprisal is often used to model the incremental processing of human readers, which raises questions about how various choices in language modeling influence its predictive power. One factor that has been overlooked in cognitive modeling is the granularity of subword tokens, which explicitly encodes information about word length and frequency, and ultimately influences the quality of vector representations that are learned. This paper presents experiments that manipulate the token granularity and evaluate its impact on the ability of surprisal to account for processing difficulty of naturalistic text and garden-path constructions. Experiments with naturalistic reading times reveal a substantial influence of token granularity on surprisal, with tokens defined by a vocabulary size of 8,000 resulting in surprisal that is most predictive. In contrast, on garden-path constructions, language models trained on coarser-grained tokens generally assigned higher surprisal to critical regions, suggesting their increased sensitivity to syntax. Taken together, these results suggest a large role of token granularity on the quality of language model surprisal for cognitive modeling.
NeuroAI for AI Safety
Mineault, Patrick, Zanichelli, Niccolò, Peng, Joanne Zichen, Arkhipov, Anton, Bingham, Eli, Jara-Ettinger, Julian, Mackevicius, Emily, Marblestone, Adam, Mattar, Marcelo, Payne, Andrew, Sanborn, Sophia, Schroeder, Karen, Tavares, Zenna, Tolias, Andreas
As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety.
Finding Structure in Language Models
When we speak, write or listen, we continuously make predictions based on our knowledge of a language's grammar. Remarkably, children acquire this grammatical knowledge within just a few years, enabling them to understand and generalise to novel constructions that have never been uttered before. Language models are powerful tools that create representations of language by incrementally predicting the next word in a sentence, and they have had a tremendous societal impact in recent years. The central research question of this thesis is whether these models possess a deep understanding of grammatical structure similar to that of humans. This question lies at the intersection of natural language processing, linguistics, and interpretability. To address it, we will develop novel interpretability techniques that enhance our understanding of the complex nature of large-scale language models. We approach our research question from three directions. First, we explore the presence of abstract linguistic information through structural priming, a key paradigm in psycholinguistics for uncovering grammatical structure in human language processing. Next, we examine various linguistic phenomena, such as adjective order and negative polarity items, and connect a model's comprehension of these phenomena to the data distribution on which it was trained. Finally, we introduce a controlled testbed for studying hierarchical structure in language models using various synthetic languages of increasing complexity and examine the role of feature interactions in modelling this structure. Our findings offer a detailed account of the grammatical knowledge embedded in language model representations and provide several directions for investigating fundamental linguistic questions using computational methods.
Learning to Cooperate with Humans using Generative Agents
Liang, Yancheng, Chen, Daphne, Gupta, Abhishek, Du, Simon S., Jaques, Natasha
Training agents that can coordinate zero-shot with humans is a key mission in multi-agent reinforcement learning (MARL). Current algorithms focus on training simulated human partner policies which are then used to train a Cooperator agent. The simulated human is produced either through behavior cloning over a dataset of human cooperation behavior, or by using MARL to create a population of simulated agents. However, these approaches often struggle to produce a Cooperator that can coordinate well with real humans, since the simulated humans fail to cover the diverse strategies and styles employed by people in the real world. We show \emph{learning a generative model of human partners} can effectively address this issue. Our model learns a latent variable representation of the human that can be regarded as encoding the human's unique strategy, intention, experience, or style. This generative model can be flexibly trained from any (human or neural policy) agent interaction data. By sampling from the latent space, we can use the generative model to produce different partners to train Cooperator agents. We evaluate our method -- \textbf{G}enerative \textbf{A}gent \textbf{M}odeling for \textbf{M}ulti-agent \textbf{A}daptation (GAMMA) -- on Overcooked, a challenging cooperative cooking game that has become a standard benchmark for zero-shot coordination. We conduct an evaluation with real human teammates, and the results show that GAMMA consistently improves performance, whether the generative model is trained on simulated populations or human datasets. Further, we propose a method for posterior sampling from the generative model that is biased towards the human data, enabling us to efficiently improve performance with only a small amount of expensive human interaction data.
Centaur: a foundation model of human cognition
Binz, Marcel, Akata, Elif, Bethge, Matthias, Brändle, Franziska, Callaway, Fred, Coda-Forno, Julian, Dayan, Peter, Demircan, Can, Eckstein, Maria K., Éltető, Noémi, Griffiths, Thomas L., Haridi, Susanne, Jagadish, Akshay K., Ji-An, Li, Kipnis, Alexander, Kumar, Sreejan, Ludwig, Tobias, Mathony, Marvin, Mattar, Marcelo, Modirshanechi, Alireza, Nath, Surabhi S., Peterson, Joshua C., Rmus, Milena, Russek, Evan M., Saanum, Tankred, Scharfenberg, Natalia, Schubert, Johannes A., Buschoff, Luca M. Schulze, Singhi, Nishad, Sui, Xin, Thalmann, Mirko, Theis, Fabian, Truong, Vuong, Udandarao, Vishaal, Voudouris, Konstantinos, Wilson, Robert, Witte, Kristin, Wu, Shuchen, Wulff, Dirk, Xiong, Huadong, Schulz, Eric
Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model's internal representations become more aligned with human neural activity after finetuning. Taken together, Centaur is the first real candidate for a unified model of human cognition. We anticipate that it will have a disruptive impact on the cognitive sciences, challenging the existing paradigm for developing computational models.
Explainers' Mental Representations of Explainees' Needs in Everyday Explanations
Schaffer, Michael Erol, Terfloth, Lutz, Schulte, Carsten, Buhl, Heike M.
In explanations, explainers have mental representations of explainees' developing knowledge and shifting interests regarding the explanandum. These mental representations are dynamic in nature and develop over time, thereby enabling explainers to react to explainees' needs by adapting and customizing the explanation. XAI should be able to react to explainees' needs in a similar manner. Therefore, a component that incorporates aspects of explainers' mental representations of explainees is required. In this study, we took first steps by investigating explainers' mental representations in everyday explanations of technological artifacts. According to the dual nature theory, technological artifacts require explanations with two distinct perspectives, namely observable and measurable features addressing "Architecture" or interpretable aspects addressing "Relevance". We conducted extended semi structured pre-, post- and video recall-interviews with explainers (N=9) in the context of an explanation. The transcribed interviews were analyzed utilizing qualitative content analysis. The explainers' answers regarding the explainees' knowledge and interests with regard to the technological artifact emphasized the vagueness of early assumptions of explainers toward strong beliefs in the course of explanations. The assumed knowledge of explainees in the beginning is centered around Architecture and develops toward knowledge with regard to both Architecture and Relevance. In contrast, explainers assumed higher interests in Relevance in the beginning to interests regarding both Architecture and Relevance in the further course of explanations. Further, explainers often finished the explanation despite their perception that explainees still had gaps in knowledge. These findings are transferred into practical implications relevant for user models for adaptive explainable systems.
Incorporating Human Explanations for Robust Hate Speech Detection
Chen, Jennifer L., Ladhak, Faisal, Li, Daniel, Elhadad, Noémie
Given the black-box nature and complexity of large transformer language models (LM), concerns about generalizability and robustness present ethical implications for domains such as hate speech (HS) detection. Using the content rich Social Bias Frames dataset, containing human-annotated stereotypes, intent, and targeted groups, we develop a three stage analysis to evaluate if LMs faithfully assess hate speech. First, we observe the need for modeling contextually grounded stereotype intents to capture implicit semantic meaning. Next, we design a new task, Stereotype Intent Entailment (SIE), which encourages a model to contextually understand stereotype presence. Finally, through ablation tests and user studies, we find a SIE objective improves content understanding, but challenges remain in modeling implicit intent.
Can Robotic Cues Manipulate Human Decisions? Exploring Consensus Building via Bias-Controlled Non-linear Opinion Dynamics and Robotic Eye Gaze Mediated Interaction in Human-Robot Teaming
Kumar, Rajul, Bhatti, Adam, Yao, Ningshi
Although robots are becoming more advanced with human-like anthropomorphic features and decision-making abilities to improve collaboration, the active integration of humans into this process remains under-explored. This article presents the first experimental study exploring decision-making interactions between humans and robots with visual cues from robotic eyes, which can dynamically influence human opinion formation. The cues generated by robotic eyes gradually guide human decisions towards alignment with the robot's choices. Both human and robot decision-making processes are modeled as non-linear opinion dynamics with evolving biases. To examine these opinion dynamics under varying biases, we conduct numerical parametric and equilibrium continuation analyses using tuned parameters designed explicitly for the presented human-robot interaction experiment. Furthermore, to facilitate the transition from disagreement to agreement, we introduced a human opinion observation algorithm integrated with the formation of the robot's opinion, where the robot's behavior is controlled based on its formed opinion. The algorithms developed aim to enhance human involvement in consensus building, fostering effective collaboration between humans and robots. Experiments with 51 participants (N = 51) show that human-robot teamwork can be improved by guiding human decisions using robotic cues. Finally, we provide detailed insights on the effects of trust, cognitive load, and participant demographics on decision-making based on user feedback and post-experiment interviews.
The Double-Edged Sword of Behavioral Responses in Strategic Classification: Theory and User Studies
Ebrahimi, Raman, Vaccaro, Kristen, Naghizadeh, Parinaz
As machine learning systems become more widely deployed, including in settings such as resume screening, hiring, lending, and recommendation systems, people have begun to respond to them strategically. Often, this takes the form of "gaming the system" or using an algorithmic system's rules and procedures to manipulate it and achieve desired outcomes. Examples include Uber drivers coordinating the times they log on and off the app to impact its surge pricing algorithm (Möhlmann and Zalmanson, 2017), and Twitter (Burrell et al., 2019) and Facebook (Eslami et al., 2016) users' decisions regarding how to interact with content given the platforms' curation algorithms. Game theoretical modeling and analysis have been used in recent years to formally analyze such strategic responses of humans to algorithms (e.g., Hardt et al. (2016); Milli et al. (2019); Liu et al. (2020); see also Related Work). However, these existing works assume standard models of decision making, where agents are fully rational when responding to algorithms; yet, humans exhibit different forms of cognitive biases in decision making (Kahnemann and Tversky, 1979). Motivated by this, we explore the impacts behavioral biases on agents' strategic responses to algorithms. We begin by proposing an extension of existing models of strategic classification to account for behavioral biases.