Conitzer, Vincent
Why should we ever automate moral decision making?
Conitzer, Vincent
While people generally trust AI to make decisions in various aspects of their lives, concerns arise when AI is involved in decisions with significant moral implications. The absence of a precise mathematical framework for moral reasoning intensifies these concerns, as ethics often defies simplistic mathematical models. Unlike fields such as logical reasoning, reasoning under uncertainty, and strategic decision-making, which have well-defined mathematical frameworks, moral reasoning lacks a broadly accepted framework. This absence raises questions about the confidence we can place in AI's moral decision-making capabilities. The environments in which AI systems are typically trained today seem insufficiently rich for such a system to learn ethics from scratch, and even if we had an appropriate environment, it is unclear how we might bring about such learning. An alternative approach involves AI learning from human moral decisions. This learning process can involve aggregating curated human judgments or demonstrations in specific domains, or leveraging a foundation model fed with a wide range of data. Still, concerns persist, given the imperfections in human moral decision making. Given this, why should we ever automate moral decision making -- is it not better to leave all moral decision making to humans? This paper lays out a number of reasons why we should expect AI systems to engage in decisions with a moral component, with brief discussions of the associated risks.
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Conitzer, Vincent, Freedman, Rachel, Heitzig, Jobst, Holliday, Wesley H., Jacobs, Bob M., Lambert, Nathan, Mossรฉ, Milan, Pacuit, Eric, Russell, Stuart, Schoelkopf, Hailey, Tewolde, Emanuel, Zwicker, William S.
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
Recursive Joint Simulation in Games
Kovarik, Vojtech, Oesterheld, Caspar, Conitzer, Vincent
Game-theoretic dynamics between AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to accurately simulate an AI agent, for example because its source code is known. Our aim is to explore ways of leveraging this possibility to achieve more cooperative outcomes in strategic settings. In this paper, we study an interaction between AI agents where the agents run a recursive joint simulation. That is, the agents first jointly observe a simulation of the situation they face. This simulation in turn recursively includes additional simulations (with a small chance of failure, to avoid infinite recursion), and the results of all these nested simulations are observed before an action is chosen. We show that the resulting interaction is strategically equivalent to an infinitely repeated version of the original game, allowing a direct transfer of existing results such as the various folk theorems.
Similarity-based cooperative equilibrium
Oesterheld, Caspar, Treutlein, Johannes, Grosse, Roger, Conitzer, Vincent, Foerster, Jakob
As machine learning agents act more autonomously in the world, they will increasingly interact with each other. Unfortunately, in many social dilemmas like the one-shot Prisoner's Dilemma, standard game theory predicts that ML agents will fail to cooperate with each other. Prior work has shown that one way to enable cooperative outcomes in the one-shot Prisoner's Dilemma is to make the agents mutually transparent to each other, i.e., to allow them to access one another's source code (Rubinstein 1998, Tennenholtz 2004) -- or weights in the case of ML agents. However, full transparency is often unrealistic, whereas partial transparency is commonplace. Moreover, it is challenging for agents to learn their way to cooperation in the full transparency setting. In this paper, we introduce a more realistic setting in which agents only observe a single number indicating how similar they are to each other. We prove that this allows for the same set of cooperative outcomes as the full transparency setting. We also demonstrate experimentally that cooperation can be learned using simple ML methods.
A Theory of Bounded Inductive Rationality
Oesterheld, Caspar, Demski, Abram, Conitzer, Vincent
The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies bounded rational inductive agents can converge to.
The Computational Complexity of Single-Player Imperfect-Recall Games
Tewolde, Emanuel, Oesterheld, Caspar, Conitzer, Vincent, Goldberg, Paul W.
It turns out there are a number of reasons why imperfect recall is relevant for AI agents; moreover, in cases where it is We study single-player extensive-form games with relevant, it is clear what the agent will and will not remember imperfect recall, such as the Sleeping Beauty problem - unlike in the case of human memory, which is harder to predict or the Absentminded Driver game. For such and consequently to model in standard representations of games, two natural equilibrium concepts have been imperfect recall. Imperfect-recall games already appear in the proposed as alternative solution concepts to ex-ante AI literature in the context of solving very large games such optimality. One equilibrium concept uses generalized as poker: one technique for solving such games is abstraction double halving (GDH) as a belief system and - i.e., reducing the game to a smaller, simplified one to solve evidential decision theory (EDT), and another one instead - and this process can give rise to imperfect recall in uses generalized thirding (GT) as a belief system the abstracted game [Waugh et al., 2009; Lanctot et al., 2012; and causal decision theory (CDT).
A Dataset on Malicious Paper Bidding in Peer Review
Jecmen, Steven, Yoon, Minji, Conitzer, Vincent, Shah, Nihar B., Fang, Fei
In conference peer review, reviewers are often asked to provide "bids" on each submitted paper that express their interest in reviewing that paper. A paper assignment algorithm then uses these bids (along with other data) to compute a high-quality assignment of reviewers to papers. However, this process has been exploited by malicious reviewers who strategically bid in order to unethically manipulate the paper assignment, crucially undermining the peer review process. For example, these reviewers may aim to get assigned to a friend's paper as part of a quid-pro-quo deal. A critical impediment towards creating and evaluating methods to mitigate this issue is the lack of any publicly-available data on malicious paper bidding. In this work, we collect and publicly release a novel dataset to fill this gap, collected from a mock conference activity where participants were instructed to bid either honestly or maliciously. We further provide a descriptive analysis of the bidding behavior, including our categorization of different strategies employed by participants. Finally, we evaluate the ability of each strategy to manipulate the assignment, and also evaluate the performance of some simple algorithms meant to detect malicious bidding. The performance of these detection algorithms can be taken as a baseline for future research on detecting malicious bidding.
Tradeoffs in Preventing Manipulation in Paper Bidding for Reviewer Assignment
Jecmen, Steven, Shah, Nihar B., Fang, Fei, Conitzer, Vincent
Many conferences rely on paper bidding as a key component of their reviewer assignment procedure. These bids are then taken into account when assigning reviewers to help ensure that each reviewer is assigned to suitable papers. However, despite the benefits of using bids, reliance on paper bidding can allow malicious reviewers to manipulate the paper assignment for unethical purposes (e.g., getting assigned to a friend's paper). Several different approaches to preventing this manipulation have been proposed and deployed. In this paper, we enumerate certain desirable properties that algorithms for addressing bid manipulation should satisfy. We then offer a high-level analysis of various approaches along with directions for future investigation.
Near-Optimal Reviewer Splitting in Two-Phase Paper Reviewing and Conference Experiment Design
Jecmen, Steven, Zhang, Hanrui, Liu, Ryan, Fang, Fei, Conitzer, Vincent, Shah, Nihar B.
Many scientific conferences employ a two-phase paper review process, where some papers are assigned additional reviewers after the initial reviews are submitted. Many conferences also design and run experiments on their paper review process, where some papers are assigned reviewers who provide reviews under an experimental condition. In this paper, we consider the question: how should reviewers be divided between phases or conditions in order to maximize total assignment similarity? We make several contributions towards answering this question. First, we prove that when the set of papers requiring additional review is unknown, a simplified variant of this problem is NP-hard. Second, we empirically show that across several datasets pertaining to real conference data, dividing reviewers between phases/conditions uniformly at random allows an assignment that is nearly as good as the oracle optimal assignment. This uniformly random choice is practical for both the two-phase and conference experiment design settings. Third, we provide explanations of this phenomenon by providing theoretical bounds on the suboptimality of this random strategy under certain natural conditions. From these easily-interpretable conditions, we provide actionable insights to conference program chairs about whether a random reviewer split is suitable for their conference.
Ethical Implementation of Artificial Intelligence to Select Embryos in In Vitro Fertilization
Afnan, Michael Anis Mihdi, Rudin, Cynthia, Conitzer, Vincent, Savulescu, Julian, Mishra, Abhishek, Liu, Yanhe, Afnan, Masoud
AI has the potential to revolutionize many areas of healthcare. Radiology, dermatology, and ophthalmology are some of the areas most likely to be impacted in the near future, and they have received significant attention from the broader research community. But AI techniques are now also starting to be used in in vitro fertilization (IVF), in particular for selecting which embryos to transfer to the woman. The contribution of AI to IVF is potentially significant, but must be done carefully and transparently, as the ethical issues are significant, in part because this field involves creating new people. We first give a brief introduction to IVF and review the use of AI for embryo selection. We discuss concerns with the interpretation of the reported results from scientific and practical perspectives. We then consider the broader ethical issues involved. We discuss in detail the problems that result from the use of black-box methods in this context and advocate strongly for the use of interpretable models. Importantly, there have been no published trials of clinical effectiveness, a problem in both the AI and IVF communities, and we therefore argue that clinical implementation at this point would be premature. Finally, we discuss ways for the broader AI community to become involved to ensure scientifically sound and ethically responsible development of AI in IVF.