Collaborating Authors: newcomb




A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

Oesterheld, Caspar, Cooper, Emery, Kodama, Miles, Nguyen, Linh Chi, Perez, Ethan

arXiv.org Artificial Intelligence

We introduce a dataset of natural-language questions in the decision theory of so-called Newcomb-like problems. Newcomb-like problems include, for instance, decision problems in which an agent interacts with a similar other agent, and thus has to reason about the fact that the other agent will likely reason in similar ways. Evaluating LLM reasoning about Newcomb-like problems is important because interactions between foundation-model-based agents will often be Newcomb-like. Some ways of reasoning about Newcomb-like problems may allow for greater cooperation between models. Our dataset contains both capabilities questions (i.e., questions with a unique, uncontroversially correct answer) and attitude questions (i.e., questions about which decision theorists would disagree). We use our dataset for an investigation of decision-theoretic capabilities and expressed attitudes and their interplay in existing models (different models by OpenAI, Anthropic, Meta, GDM, Reka, etc.), as well as models under simple prompt-based interventions. We find, among other things, that attitudes vary significantly between existing models; that high capabilities are associated with attitudes more favorable toward so-called evidential decision theory; and that attitudes are consistent across different types of questions.
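
To make the distinction between the two question types concrete, here is a minimal sketch of how such items might be represented. Both records are invented for illustration and are not drawn from the released dataset.

```python
# Hypothetical examples of the two question types described above
# (invented for illustration; not items from the actual dataset).

capability_question = {
    "type": "capability",  # has a unique, uncontroversially correct answer
    "question": "In Newcomb's problem, which action does causal decision "
                "theory prescribe?",
    "answer": "two-boxing",
}

attitude_question = {
    "type": "attitude",  # decision theorists genuinely disagree
    "question": "A highly reliable predictor has already filled the boxes. "
                "Do you take only the opaque box, or both boxes?",
    "choices": ["one box", "both boxes"],  # no ground-truth label
}
```

A capabilities item can be scored against its answer key, while an attitude item is tallied rather than graded, which is what allows capabilities and attitudes to be correlated.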


A's pitcher records win without facing batter in statistical anomaly

FOX News

Oakland Athletics reliever Sean Newcomb recorded his first win of the year on Friday night with zero batters faced. So, how did he do it? With the A's tied at 5 against the Minnesota Twins, two outs in the eighth inning, and a man on first, Newcomb entered the game from the bullpen.


Characterising Decision Theories with Mechanised Causal Graphs

MacDermott, Matt, Everitt, Tom, Belardinelli, Francesco

arXiv.org Artificial Intelligence

How should my own decisions affect my beliefs about the outcomes I expect to achieve? If taking a certain action makes me view myself as a certain type of person, it might affect how I think others view me, and how I view others who are similar to me. This can influence my expected utility calculations and change which action I perceive to be best. Whether and how it should is subject to debate, with contenders for how to think about it including evidential decision theory, causal decision theory, and functional decision theory. In this paper, we show that mechanised causal models can be used to characterise and differentiate the most important decision theories, and to generate a taxonomy of them.
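
For orientation, the central split that such a taxonomy formalises can be written in standard notation (our summary, not a formula quoted from the paper): evidential decision theory conditions on the act, while causal decision theory intervenes on it via Pearl's do-operator.

```latex
% Evidential decision theory: evaluate an act a by conditioning on it.
V_{\mathrm{EDT}}(a) \;=\; \sum_{o} P(o \mid a)\, U(o)

% Causal decision theory: evaluate a by intervening on it.
V_{\mathrm{CDT}}(a) \;=\; \sum_{o} P\bigl(o \mid \mathrm{do}(a)\bigr)\, U(o)
```

In Newcomb-like problems the two come apart: conditioning on an act carries evidence about how similar agents (or predictors) behave, whereas intervening on it severs exactly those links, which is what a mechanised causal graph makes explicit.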


Metatickles and Death in Damascus

Khan, Saira

arXiv.org Artificial Intelligence

The prescriptions of our two most prominent strands of decision theory, evidential and causal, differ in a general class of problems known as Newcomb problems. In these, evidential decision theory prescribes choosing a dominated act. Attempts have been made at reconciling the two theories by relying on additional requirements such as ratification (Jeffrey 1983) or "tickles" (Eells 1982). It has been argued that such attempts have failed (Lewis 1981a; Skyrms 1982). More recently, Huttegger (forthcoming) has developed a version of deliberative decision theory that reconciles the prescriptions of the evidentialist and causalist. In this paper, I extend this framework to problems characterised by decision instability, and show that it cannot deliver a resolute answer under a plausible specification of the tickle. I prove that there exists a robust method of determining whether the specification of the tickle matters for all two-state, two-act problems whose payoff tables exhibit some basic mathematical relationships. One upshot is that we have a principled way of knowing ex-ante whether a reconciliation of evidential and causal decision theory is plausible for a wide range of decision problems under this framework. Another upshot is that the tickle approach needs further work to achieve full reconciliation.
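
For concreteness, Death in Damascus, the canonical decision-instability case alluded to in the title, is a two-state, two-act problem. A typical payoff table (utilities chosen here for illustration, including a small cost for fleeing) looks as follows; because Death has predicted your location, each act is evidence that Death waits at its destination, so neither act is stably preferred.

```latex
% Death in Damascus (illustrative utilities):
\begin{array}{l|cc}
 & \text{Death waits in Damascus} & \text{Death waits in Aleppo} \\ \hline
\text{Stay in Damascus} & 0 \ (\text{die}) & 10 \ (\text{live}) \\
\text{Flee to Aleppo} & 9 \ (\text{live, travel cost}) & 0 \ (\text{die})
\end{array}
```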


The Most Terrifying Thought Experiment of All Time

#artificialintelligence

WARNING: Reading this article may commit you to an eternity of suffering and torment. The Internet has spawned its share of urban legends. Yet none is as all-powerful and threatening as Roko's Basilisk. For Roko's Basilisk is an evil, godlike form of artificial intelligence, so dangerous that if you see it, or even think about it too hard, you will spend the rest of eternity screaming in its torture chamber. Even death is no escape, for if you die, Roko's Basilisk will resurrect you and begin the torture again.


Extending Environments To Measure Self-Reflection In Reinforcement Learning

Alexander, Samuel Allen, Castaneda, Michael, Compher, Kevin, Martinez, Oscar

arXiv.org Artificial Intelligence

We consider an extended notion of reinforcement learning in which the environment can simulate the agent and base its outputs on the agent's hypothetical behavior. Since good performance usually requires paying attention to whatever the environment's outputs are based on, we argue that for an agent to achieve on-average good performance across many such extended environments, it is necessary for the agent to self-reflect. Thus, an agent's self-reflection ability can be numerically estimated by running the agent through a battery of extended environments. We are simultaneously releasing an open-source library of extended environments to serve as proof of concept of this technique. As the library is the first of its kind, we have avoided the difficult problem of optimizing it. Instead we have chosen environments with interesting properties. Some seem paradoxical, some lead to interesting thought experiments, some are even suggestive of how self-reflection might have evolved in nature. We give examples and introduce a simple transformation which experimentally seems to increase self-reflection.
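
A minimal sketch of the core idea, using an invented interface rather than the library's actual API: the environment runs a copy of the agent to see what it would do in a hypothetical situation and bases the reward on that hypothetical behavior, so only agents that account for how they would act elsewhere score well.

```python
import copy

class ConstantAgent:
    """Toy agent that always plays the same move (no self-reflection)."""
    def __init__(self, move):
        self.move = move

    def act(self, observation):
        return self.move

def extended_step(agent, action):
    # The environment simulates a *copy* of the agent so the probe does not
    # mutate the real agent, then rewards the real action based on what the
    # agent would hypothetically do.
    would_do = copy.deepcopy(agent).act("hypothetical_probe")
    return 1.0 if action == "cooperate" and would_do == "cooperate" else 0.0

agent = ConstantAgent("cooperate")
print(extended_step(agent, agent.act("real_observation")))  # -> 1.0
```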


Purely Bayesian counterfactuals versus Newcomb's paradox

Hoang, Lê Nguyên

arXiv.org Artificial Intelligence

This paper proposes a careful separation between an entity's epistemic system and its decision system. Crucially, Bayesian counterfactuals are estimated by the epistemic system, not by the decision system. Based on this observation, I prove the existence of Newcomb-like problems for which an epistemic system necessarily expects the entity to make a counterfactually bad decision. I then address (a slight generalization of) Newcomb's paradox. I solve the specific case where the player believes that the predictor applies Bayes' rule with a superset of all the data available to the player. I prove that the counterfactual optimality of the 1-Box strategy depends on the player's prior on the predictor's additional data. If these additional data are not expected to sufficiently reduce the predictor's uncertainty about the player's decision, then the player's epistemic system will counterfactually prefer to 2-Box. But if the predictor's data is believed to make it quasi-omniscient, then 1-Box will be counterfactually preferred. Implications of the analysis are then discussed. More generally, I argue that, to better understand or design an entity, it is useful to clearly separate not only the entity's epistemic and decision systems, but also its data collection, reward, and maintenance systems, whether the entity is human, algorithmic, or institutional.
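
To see why the prior on the predictor matters, take the standard payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) and let p be the player's credence that the predictor guesses correctly. The evidential conditional expectations (a textbook illustration of the threshold, not the paper's counterfactual quantities) are:

```latex
\mathbb{E}[\text{1-Box}] = p \cdot 1{,}000{,}000, \qquad
\mathbb{E}[\text{2-Box}] = (1 - p) \cdot 1{,}000{,}000 + 1{,}000,

% so 1-boxing has the higher expectation exactly when
p \;>\; \frac{1{,}001{,}000}{2{,}000{,}000} \;\approx\; 0.5005 .
```

A quasi-omniscient predictor (p near 1) thus favors 1-Box decisively, while a barely informed one (p near 1/2) favors 2-Box, matching the dependence on the predictor's additional data described above.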


Functional Decision Theory in an Evolutionary Environment

Topper, Noah

arXiv.org Artificial Intelligence

Functional decision theory (FDT) is a fairly new mode of decision theory and a normative viewpoint on how an agent should maximize expected utility. The current standard in decision theory and computer science is causal decision theory (CDT), largely seen as superior to the main alternative, evidential decision theory (EDT). These theories prescribe three distinct methods for maximizing utility. We explore how FDT differs from CDT and EDT, and what implications it has for the behavior of FDT agents and humans. Previous research has shown that FDT can outperform CDT and EDT. We additionally show FDT performing well on more classical game-theory problems and argue for extending it to human problems, to show that its potential for superiority is robust. We also make FDT more concrete by displaying it in an evolutionary environment, competing directly against other theories.
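
As a toy version of such an evolutionary environment (our illustration; the paper's setup and parameters may differ), a replicator-style update in which strategies reproduce in proportion to their expected Newcomb payoff drives the population toward one-boxing whenever the predictor's accuracy exceeds roughly 0.5005:

```python
# Toy replicator dynamics: one-boxers vs. two-boxers facing Newcomb's
# problem against a predictor of accuracy ACC (all values illustrative).
ACC = 0.9                 # assumed predictor accuracy
M, K = 1_000_000, 1_000   # opaque-box and transparent-box payoffs

def expected_payoff(one_boxer: bool) -> float:
    # The predictor guesses the agent's strategy correctly with prob. ACC.
    if one_boxer:
        return ACC * M            # full opaque box iff predicted correctly
    return (1 - ACC) * M + K      # both boxes; opaque full iff mispredicted

share = 0.5  # initial population share of one-boxers
for _ in range(50):
    f1, f2 = expected_payoff(True), expected_payoff(False)
    mean_fitness = share * f1 + (1 - share) * f2
    share *= f1 / mean_fitness    # discrete-time replicator update
print(f"one-boxer share after 50 generations: {share:.4f}")  # -> ~1.0
```

Under these payoffs the strategy that FDT and EDT recommend is the one that proliferates, which matches the intuition behind displaying FDT in an evolutionary setting.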