othello-gpt
Does ChatGPT Have a Mind?
Goldstein, Simon, Levinstein, Benjamin A.
This paper examines the question of whether Large Language Models (LLMs) like ChatGPT possess minds, focusing specifically on whether they have a genuine folk psychology encompassing beliefs, desires, and intentions. We approach this question by investigating two key aspects: internal representations and dispositions to act. First, we survey various philosophical theories of representation, including informational, causal, structural, and teleosemantic accounts, arguing that LLMs satisfy key conditions proposed by each. We draw on recent interpretability research in machine learning to support these claims. Second, we explore whether LLMs exhibit robust dispositions to perform actions, a necessary component of folk psychology. We consider two prominent philosophical traditions, interpretationism and representationalism, to assess LLM action dispositions. While we find evidence suggesting LLMs may satisfy some criteria for having a mind, particularly in game-theoretic environments, we conclude that the data remains inconclusive. Additionally, we reply to several skeptical challenges to LLM folk psychology, including issues of sensory grounding, the "stochastic parrots" argument, and concerns about memorization. Our paper has three main upshots. First, LLMs do have robust internal representations. Second, there is an open question to answer about whether LLMs have robust action dispositions. Third, existing skeptical challenges to LLM representation do not survive philosophical scrutiny.
Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT
Hazineh, Dean S., Zhang, Zechen, Chiu, Jeffery
Foundation models exhibit significant capabilities in decision-making and logical deductions. Nonetheless, a continuing discourse persists regarding their genuine understanding of the world as opposed to mere stochastic mimicry. This paper meticulously examines a simple transformer trained for Othello, extending prior research to enhance comprehension of the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encapsulates a linear representation of opposing pieces, a factor that causally steers its decision-making process. This paper further elucidates the interplay between the linear world representation and causal decision-making, and their dependence on layer depth and model complexity. We have made the code public.
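To illustrate what a "linear representation" means here, the following is a minimal sketch of a linear probe: a logistic-regression classifier trained on hidden activations to read off one board feature. It is not the authors' code. The activations are synthetic stand-ins rather than real Othello-GPT hidden states, and every name in it (`make_synthetic_activations`, `train_linear_probe`, `probe_accuracy`, the 16-dimensional width) is a hypothetical choice made for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_activations(n, direction, rng, noise=0.5):
    """ASSUMPTION: synthetic stand-in for hidden states.

    Label 1 means "this square holds an opposing piece", label 0 means it
    does not. The label is encoded along a single direction plus noise,
    which is what a linear representation would look like.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        activation = [sign * d + rng.gauss(0, noise) for d in direction]
        data.append((activation, label))
    return data


def train_linear_probe(data, epochs=50, lr=0.1):
    """Fit weights w and bias b by SGD on the logistic loss."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the logistic loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples whose label the probe recovers."""
    correct = 0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        correct += int((p >= 0.5) == (y == 1))
    return correct / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    direction = [rng.gauss(0, 1) for _ in range(16)]
    train = make_synthetic_activations(200, direction, rng)
    held_out = make_synthetic_activations(200, direction, rng)
    w, b = train_linear_probe(train)
    print("held-out probe accuracy:", probe_accuracy(w, b, held_out))
```

High held-out accuracy from a probe this simple suggests the feature is linearly decodable. It does not by itself show that the model uses the feature, which is why the paper also reports causal interventions.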
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
Li, Kenneth, Hopkins, Aspen K., Bau, David, Viégas, Fernanda, Pfister, Hanspeter, Wattenberg, Martin
Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question in a synthetic setting by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. Although the network has no a priori knowledge of the game or its rules, we uncover evidence of an emergent nonlinear internal representation of the board state. Interventional experiments indicate this representation can be used to control the output of the network. By leveraging these intervention techniques, we produce "latent saliency maps" that help explain predictions. Recent language models have shown an intriguing range of capabilities. Networks trained on a simple "next-word" prediction task are apparently capable of many other things, such as solving logic puzzles or writing basic code. Yet how this type of performance emerges from sequence predictions remains a subject of current debate. Some have suggested that training on a sequence modeling task is inherently limiting. The arguments range from philosophical (Bender & Koller, 2020) to mathematical (Merrill et al., 2021). A common theme is that seemingly good performance might result from memorizing "surface statistics," i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence. This issue is of practical concern, since relying on spurious correlations may lead to problems on out-of-distribution data (Bender et al., 2021; Floridi & Chiriatti, 2020). On the other hand, some tantalizing clues suggest language models may do more than collect spurious correlations, instead building interpretable world models--that is, understandable models of the process producing the sequences they are trained on.
Stochastic parrot or world model? How large language models learn
Large language models show impressive capabilities. Are they just superficial statistics – or is there more to them? Systems such as OpenAI's GPT-3 have shown that large language models have capabilities that can make them useful tools in areas as diverse as text processing and programming. With ChatGPT the company has released a model that puts these capabilities in the hands of the general public, creating new challenges for educational institutions, for example. Impressive capabilities quickly lead to the overestimation of AI systems like ChatGPT.
Large Language Model: world models or surface statistics?
Large Language Models (LLMs) are on fire, capturing public attention with their ability to provide seemingly impressive completions to user prompts (NYT coverage). They are a delicate combination of a radically simplistic algorithm with massive amounts of data and computing power. They are trained by playing a guess-the-next-word game with themselves over and over again. Each time, the model looks at a partial sentence and guesses the following word. If it guesses correctly, it updates its parameters to reinforce its confidence; otherwise, it learns from the error and gives a better guess next time.
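The guess-the-next-word game can be sketched in a few lines. The example below is a deliberately tiny stand-in, not how production LLMs are built: it assumes a bigram model (the guess depends only on the current word) trained with plain gradient descent on the cross-entropy loss, and the names `train_bigram`, `next_word_probs`, and `predict_next` are hypothetical. The single update rule covers both cases from the description: a correct guess gets its confidence reinforced, and a wrong guess shifts probability toward the word that actually came next.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(tokens, vocab, epochs=200, lr=0.5, seed=0):
    """ASSUMPTION: a toy bigram model; W[i][j] scores word j following word i."""
    rng = random.Random(seed)
    idx = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    W = [[rng.uniform(-0.01, 0.01) for _ in range(size)] for _ in range(size)]
    pairs = [(idx[a], idx[b]) for a, b in zip(tokens, tokens[1:])]
    for _ in range(epochs):
        for cur, nxt in pairs:
            probs = softmax(W[cur])  # the model's guess
            for j in range(size):
                # Cross-entropy gradient: raise the true next word's score,
                # lower every other word's score in proportion to its probability.
                target = 1.0 if j == nxt else 0.0
                W[cur][j] -= lr * (probs[j] - target)
    return W, idx


def next_word_probs(W, idx, vocab, word):
    """Probability of each vocabulary word following `word`."""
    return dict(zip(vocab, softmax(W[idx[word]])))


def predict_next(W, idx, vocab, word):
    """Most likely next word under the trained model."""
    probs = next_word_probs(W, idx, vocab, word)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    vocab = sorted(set(tokens))
    W, idx = train_bigram(tokens, vocab)
    print(predict_next(W, idx, vocab, "cat"))
    print(next_word_probs(W, idx, vocab, "the"))
```

Because "the" is followed by both "cat" and "mat" in the toy corpus, the model ends up splitting its probability between them. This is the kind of surface statistic the post asks about.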