Credence


Are LLM Belief Updates Consistent with Bayes' Theorem?

Imran, Sohaib, Kendiukhov, Ihor, Broerman, Matthew, Thomas, Aditya, Campanella, Riccardo, Lamb, Rob, Atkinson, Peter M.

arXiv.org Artificial Intelligence

Do larger and more capable language models learn to update their "beliefs" about propositions more consistently with Bayes' theorem when presented with evidence in-context? To test this, we formulate a Bayesian Coherence Coefficient (BCC) metric and generate a dataset with which to measure the BCC. We measure BCC for multiple pre-trained-only language models across five model families, comparing against the number of model parameters, the amount of training data, and model scores on common benchmarks. Our results provide evidence for our hypothesis that larger and more capable pre-trained language models assign credences that are more coherent with Bayes' theorem. These results have important implications for our understanding and governance of LLMs.
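The abstract does not spell out how the BCC is computed, so the following is a minimal sketch of one way to score Bayes-coherence: elicit a model's prior, likelihoods, and stated posterior for each item, compute the posterior implied by Bayes' theorem, and correlate the two. The scoring choice and the toy numbers are illustrative assumptions, not the paper's definition.

```python
# A minimal sketch (not the paper's exact BCC): quantify how closely a set of
# stated posterior credences tracks the posteriors implied by Bayes' theorem
# from the same agent's priors and likelihoods.
import numpy as np

def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) implied by Bayes' theorem."""
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1.0 - prior))

def bayesian_coherence(credences):
    """credences: iterable of (prior, P(E|H), P(E|~H), stated posterior)
    tuples elicited from a model. Returns a correlation-style score;
    a perfectly Bayesian agent scores 1."""
    stated, implied = [], []
    for prior, p_e_h, p_e_nh, posterior in credences:
        stated.append(posterior)
        implied.append(bayes_posterior(prior, p_e_h, p_e_nh))
    return float(np.corrcoef(stated, implied)[0, 1])

# Toy check: a perfectly Bayesian "model" scores 1.0.
toy = [(0.3, 0.8, 0.2, bayes_posterior(0.3, 0.8, 0.2)),
       (0.5, 0.6, 0.4, bayes_posterior(0.5, 0.6, 0.4)),
       (0.1, 0.9, 0.3, bayes_posterior(0.1, 0.9, 0.3))]
print(bayesian_coherence(toy))  # -> 1.0 (up to floating point)
```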


The Problem of the Priors, or Posteriors?

Lin, Hanti

arXiv.org Artificial Intelligence

The problem of the priors is well known: it concerns the challenge of identifying norms that govern one's prior credences. I argue that a key to addressing this problem lies in considering what I call the problem of the posteriors -- the challenge of identifying norms that directly govern one's posterior credences, which then induce constraints on the priors via the diachronic requirement of conditionalization. This forward-looking approach can be summarized as: Think ahead, work backward. Although this idea can be traced to Freedman (1963), Carnap (1963), and Shimony (1970), it has received little attention in philosophy. In this paper, I initiate a systematic defense of forward-looking Bayesianism, addressing potential objections from more traditional views (both subjectivist and objectivist) and arguing for its advantages. In particular, I develop a specific approach to forward-looking Bayesianism -- one that treats the convergence of posterior credences to the truth as a fundamental rather than derived normative requirement. This approach, called convergentist Bayesianism, is argued to be crucial for a Bayesian foundation of Ockham's razor and related inference methods in statistics and machine learning.
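For readers unfamiliar with the diachronic requirement invoked here, the standard statement of conditionalization is given below; this is a common textbook formulation, not necessarily the paper's own notation.

```latex
% Conditionalization: upon learning evidence E and nothing stronger,
% the new credence must equal the old conditional credence.
\[
  \mathrm{Cr}_{\text{new}}(H) \;=\; \mathrm{Cr}_{\text{old}}(H \mid E)
  \;=\; \frac{\mathrm{Cr}_{\text{old}}(E \mid H)\,\mathrm{Cr}_{\text{old}}(H)}{\mathrm{Cr}_{\text{old}}(E)}.
\]
% "Think ahead, work backward": a norm imposed on the posteriors Cr_new
% constrains the admissible priors Cr_old through this identity.
```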


Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

Wen, Jiaxin, Hebbar, Vivek, Larson, Caleb, Bhatt, Aryan, Radhakrishnan, Ansh, Sharma, Mrinank, Sleight, Henry, Feng, Shi, He, He, Perez, Ethan, Shlegeris, Buck, Khan, Akbir

arXiv.org Artificial Intelligence

As large language models (LLMs) become increasingly capable, it is prudent to assess whether safety measures remain effective even if LLMs intentionally try to bypass them. Previous work introduced control evaluations, an adversarial framework for testing deployment strategies of untrusted models (i.e., models which might be trying to bypass safety measures). While prior work treats a single failure as unacceptable, we perform control evaluations in a "distributed threat setting" -- a setting where no single action is catastrophic and no single action provides overwhelming evidence of misalignment. We approach this problem with a two-level deployment framework that uses an adaptive macro-protocol to choose between micro-protocols. Micro-protocols operate on a single task, using a less capable, but extensively tested (trusted) model to harness and monitor the untrusted model. Meanwhile, the macro-protocol maintains an adaptive credence on the untrusted model's alignment based on its past actions, using it to pick between safer and riskier micro-protocols. We evaluate our method in a code generation testbed where a red team attempts to generate subtly backdoored code with an LLM whose deployment is safeguarded by a blue team. We plot Pareto frontiers of safety (# of non-backdoored solutions) and usefulness (# of correct solutions). At a given level of usefulness, our adaptive deployment strategy reduces the number of backdoors by 80% compared to non-adaptive baselines.
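As a rough illustration of the two-level idea, the sketch below keeps a Bayesian credence in the untrusted model's alignment and switches micro-protocols on a threshold. The monitor likelihoods, threshold, and protocol names are hypothetical placeholders, not the paper's actual protocols or numbers.

```python
# Minimal sketch of the macro/micro-protocol control flow (hypothetical
# numbers and monitor; not the paper's actual protocols). The macro-protocol
# maintains a Bayesian credence that the untrusted model is aligned and falls
# back to a safer micro-protocol when that credence drops.
def update_alignment_credence(credence, suspicious,
                              p_susp_if_aligned=0.05, p_susp_if_misaligned=0.4):
    """Bayes update of P(aligned) on one monitored action (suspicious: bool)."""
    p_obs_aligned = p_susp_if_aligned if suspicious else 1 - p_susp_if_aligned
    p_obs_misaligned = p_susp_if_misaligned if suspicious else 1 - p_susp_if_misaligned
    num = p_obs_aligned * credence
    return num / (num + p_obs_misaligned * (1 - credence))

def choose_micro_protocol(credence, threshold=0.5):
    """Riskier but more useful protocol only while alignment credence is high."""
    return "query_untrusted_model" if credence >= threshold else "fall_back_to_trusted_model"

# Toy run over a stream of monitor verdicts.
credence = 0.9
for suspicious in [False, False, True, True, False]:
    credence = update_alignment_credence(credence, suspicious)
    print(round(credence, 3), choose_micro_protocol(credence))
```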


Stunned by Sleeping Beauty: How Prince Probability updates his forecast upon their fateful encounter

Walleghem, Laurens

arXiv.org Artificial Intelligence

The Sleeping Beauty problem is a puzzle in probability theory that has gained much attention since Elga's discussion of it [Elga, Adam, Analysis 60 (2), p.143-147 (2000)]. Sleeping Beauty is put to sleep, and a coin is tossed. If the outcome of the coin toss is Tails, Sleeping Beauty is woken up on Monday, put back to sleep, and woken up again on Tuesday (with no recollection of having woken up on Monday). If the outcome is Heads, Sleeping Beauty is woken up on Monday only. Each time Sleeping Beauty is woken up, she is asked what her belief is that the outcome was Heads. What should Sleeping Beauty reply? In the literature, arguments have been given for both 1/3 and 1/2 as the correct answer. In this short note we use simple Bayesian probability theory to argue that 1/3 is the right answer, not 1/2. Briefly, when Sleeping Beauty awakens, her being awake is nontrivial extra information that leads her to update her belief in Heads to 1/3. We strengthen our claim by considering an additional observer, Prince Probability, who may or may not meet Sleeping Beauty. If he meets Sleeping Beauty while she is awake, he lowers his credence in Heads to 1/3. We also briefly consider the credence in Heads of a Sleeping Beauty who knows that she is dreaming (and thus asleep).
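One common way to make the thirder update explicit is to weight each coin outcome by its number of awakenings, treating every awakening as equally likely to be the current one. The sketch below follows that standard formalization; it is not necessarily the paper's exact derivation.

```latex
% Thirder update, sketched: weight each outcome by its number of awakenings
% (one under Heads, two under Tails), then normalize.
\[
  P(\text{Heads} \mid \text{awake}) \;\propto\; \tfrac12 \cdot 1, \qquad
  P(\text{Tails} \mid \text{awake}) \;\propto\; \tfrac12 \cdot 2,
\]
\[
  \Rightarrow\quad
  P(\text{Heads} \mid \text{awake}) = \frac{\tfrac12}{\tfrac12 + 1} = \frac13, \qquad
  P(\text{Tails} \mid \text{awake}) = \frac23 .
\]
```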


Moral Uncertainty and the Problem of Fanaticism

Szabo, Jazon, Such, Jose, Criado, Natalia, Modgil, Sanjay

arXiv.org Artificial Intelligence

While there is universal agreement that agents ought to act ethically, there is no agreement as to what constitutes ethical behaviour. To address this problem, recent philosophical approaches to `moral uncertainty' propose aggregation of multiple ethical theories to guide agent behaviour. However, one of the foundational proposals for aggregation - Maximising Expected Choiceworthiness (MEC) - has been criticised as being vulnerable to fanaticism: the problem of an ethical theory dominating agent behaviour despite low credence (confidence) in said theory. Fanaticism thus undermines the `democratic' motivation for accommodating multiple ethical perspectives. The problem of fanaticism has not yet been mathematically defined. Representing moral uncertainty as an instance of social welfare aggregation, this paper contributes to the field of moral uncertainty by 1) formalising the problem of fanaticism as a property of social welfare functionals and 2) providing non-fanatical alternatives to MEC, i.e., Highest k-trimmed Mean and Highest Median.
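A toy numerical example can make the fanaticism worry concrete. The sketch below uses a simplified reading of MEC (credence-weighted expected choiceworthiness) and of the Highest Median rule (credence-weighted median of choiceworthiness); the credences and choiceworthiness values are illustrative only, and the paper's formal definitions may differ.

```python
# Toy illustration of fanaticism (illustrative numbers; simplified readings of
# MEC and Highest Median, not the paper's formal definitions).
import numpy as np

credences = np.array([0.05, 0.475, 0.475])   # theory T0 has low credence
# Choiceworthiness of two options under each theory: T0 assigns option A an
# extreme value, while the two mainstream theories mildly prefer option B.
choiceworthiness = {
    "A": np.array([1000.0, 0.0, 0.0]),
    "B": np.array([0.0, 1.0, 1.0]),
}

def mec(option):
    """Maximising Expected Choiceworthiness: credence-weighted mean."""
    return float(credences @ choiceworthiness[option])

def highest_median(option):
    """Credence-weighted median of choiceworthiness."""
    vals = choiceworthiness[option]
    order = np.argsort(vals)
    cum = np.cumsum(credences[order])
    return float(vals[order][np.searchsorted(cum, 0.5)])

for opt in ["A", "B"]:
    print(opt, "MEC:", mec(opt), "median:", highest_median(opt))
# MEC is dominated by the low-credence extreme theory (it picks A);
# the median rule picks B.
```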


Could a Large Language Model be Conscious?

Chalmers, David J.

arXiv.org Artificial Intelligence

There has recently been widespread discussion of whether large language models might be sentient or conscious. Should we take this idea seriously? I will break down the strongest reasons for and against. Given mainstream assumptions in the science of consciousness, there are significant obstacles to consciousness in current models: for example, their lack of recurrent processing, a global workspace, and unified agency. At the same time, it is quite possible that these obstacles will be overcome in the next decade or so. I conclude that while it is somewhat unlikely that current large language models are conscious, we should take seriously the possibility that successors to large language models may be conscious in the not-too-distant future.


Constrained Submodular Optimization for Vaccine Design

Dai, Zheng, Gifford, David

arXiv.org Artificial Intelligence

Advances in machine learning have enabled the prediction of immune system responses to prophylactic and therapeutic vaccines. However, the engineering task of designing vaccines remains a challenge. In particular, the genetic variability of the human immune system makes it difficult to design peptide vaccines that provide widespread immunity in vaccinated populations. We introduce a framework for evaluating and designing peptide vaccines that uses probabilistic machine learning models, and demonstrate its ability to produce designs for a SARS-CoV-2 vaccine that outperform previous designs. We provide a theoretical analysis of the approximability, scalability, and complexity of our framework.
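The abstract does not state the objective, so the following is a sketch under a common assumption for peptide vaccine design: greedily select peptides to maximize the expected fraction of genotypes predicted to display at least one selected peptide, subject to a cardinality budget. The display probabilities here are random placeholders, and the paper's actual objective and constraints may differ.

```python
# Minimal sketch of greedy constrained submodular selection for peptide
# vaccine design (assumed objective; placeholder display probabilities).
import numpy as np

rng = np.random.default_rng(0)
n_peptides, n_genotypes = 50, 200
# display[i, j]: predicted probability that genotype j displays peptide i
display = rng.uniform(0.0, 0.3, size=(n_peptides, n_genotypes))

def coverage(selected):
    """Expected fraction of genotypes displaying at least one selected peptide."""
    if not selected:
        return 0.0
    p_none = np.prod(1.0 - display[list(selected)], axis=0)
    return float(np.mean(1.0 - p_none))

def greedy_select(budget):
    """Greedy maximization; coverage is monotone submodular, so this achieves
    a (1 - 1/e) approximation under a cardinality constraint."""
    selected = set()
    while len(selected) < budget:
        gains = {i: coverage(selected | {i}) - coverage(selected)
                 for i in range(n_peptides) if i not in selected}
        selected.add(max(gains, key=gains.get))
    return selected

chosen = greedy_select(budget=10)
print(sorted(chosen), round(coverage(chosen), 3))
```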


Validating Causal Inference Methods

Parikh, Harsh, Varjao, Carlos, Xu, Louise, Tchetgen, Eric Tchetgen

arXiv.org Artificial Intelligence

The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based methods, and doubly robust methods. Unfortunately for applied researchers, there is no `one-size-fits-all' causal method that performs optimally in all settings. In practice, causal methods are primarily evaluated quantitatively on handcrafted simulated data. Such data-generating procedures can be of limited value because they are typically stylized models of reality: simplified for tractability and lacking the complexities of real-world data. For applied researchers, it is critical to understand how well a method performs for the data at hand. Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. The framework's novelty stems from its ability to generate synthetic data anchored at the empirical distribution of the observed sample, and therefore virtually indistinguishable from the latter. The approach allows the user to specify ground truth for the form and magnitude of causal effects and confounding bias as functions of covariates. The simulated data sets are thus used to evaluate the potential performance of various causal estimation methods when applied to data similar to the observed sample. We demonstrate Credence's ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and in two real-world data applications from the Lalonde and Project STAR studies.
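As a rough illustration of the validation loop such a framework enables (not Credence's actual generative model), the sketch below bootstraps observed covariates as a crude stand-in for a fitted generator, imposes a user-specified ground-truth effect and confounding structure, and compares two candidate estimators against the known truth. All functions and numbers are hypothetical.

```python
# Minimal sketch of estimator validation with a known ground truth (a crude
# stand-in for Credence: bootstrapped covariates instead of a fitted deep
# generative model; user-specified effect and confounding).
import numpy as np

rng = np.random.default_rng(1)
X_obs = rng.normal(size=(500, 2))          # stand-in for observed covariates

def simulate(X, true_ate=2.0, confounding=1.5):
    """Synthetic treatment and outcome with a known average treatment effect."""
    propensity = 1 / (1 + np.exp(-confounding * X[:, 0]))
    T = rng.binomial(1, propensity)
    Y = true_ate * T + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=len(X))
    return T, Y

def diff_in_means(T, Y):
    return Y[T == 1].mean() - Y[T == 0].mean()

def regression_adjusted(X, T, Y):
    """OLS of Y on (1, T, X); the coefficient on T estimates the ATE here."""
    Z = np.column_stack([np.ones(len(X)), T, X])
    beta = np.linalg.lstsq(Z, Y, rcond=None)[0]
    return beta[1]

# Evaluate candidate estimators against the known truth on bootstrapped data.
errors = {"diff_in_means": [], "regression_adjusted": []}
for _ in range(200):
    Xb = X_obs[rng.integers(0, len(X_obs), len(X_obs))]
    T, Y = simulate(Xb)
    errors["diff_in_means"].append(abs(diff_in_means(T, Y) - 2.0))
    errors["regression_adjusted"].append(abs(regression_adjusted(Xb, T, Y) - 2.0))
print({k: round(float(np.mean(v)), 3) for k, v in errors.items()})
```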


The end of Sleeping Beauty's nightmare

Groisman, Berry

arXiv.org Artificial Intelligence

The way a rational agent changes her belief in certain propositions/hypotheses in the light of new evidence lies at the heart of Bayesian inference. The basic natural assumption, as summarized in van Fraassen's Reflection Principle (1984), would be that in the absence of new evidence the belief should not change. Yet, there are examples that are claimed to violate this assumption. The apparent paradox presented by such examples, if not settled, would demonstrate the inconsistency and/or incompleteness of the Bayesian approach, and without eliminating this inconsistency the approach cannot be regarded as scientific. The Sleeping Beauty Problem is just such an example. The existing attempts to solve the problem fall into three categories. The first two share the view that new evidence is absent, but differ about whether Sleeping Beauty should change her belief, and why. The third category is characterized by the view that, after all, new evidence (although hidden from the initial view) is involved. My solution is radically different and does not fall into any of these categories. I deflate the paradox by arguing that the two different degrees of belief presented in the Sleeping Beauty Problem are in fact beliefs in two different propositions, i.e., there is no need to explain the (un)change of belief.