
Data-dependent PAC-Bayes priors via differential privacy

Gintare Karolina Dziugaite, Daniel M. Roy

Neural Information Processing Systems

The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors. Using this flexibility, however, is difficult, especially when the data distribution is presumed to be unknown. We show how an ε-differentially private data-dependent prior yields a valid PAC-Bayes bound, and then show how non-private mechanisms for choosing priors can also yield generalization bounds. As an application of this result, we show that a Gaussian prior mean chosen via stochastic gradient Langevin dynamics (SGLD; Welling and Teh, 2011) leads to a valid PAC-Bayes bound given control of the 2-Wasserstein distance to an ε-differentially private stationary distribution. We study our data-dependent bounds empirically, and show that they can be nonvacuous even when other distribution-dependent bounds are vacuous.
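For orientation, a minimal sketch of the standard objects the abstract refers to; this is the classical PAC-Bayes-kl bound (McAllester, 1999; stated here in the refined form due to Maurer) and the generic SGLD update, not the paper's private-prior result. For a prior P fixed independently of a sample S of n i.i.d. examples, with probability at least 1 - \delta, simultaneously for all posteriors Q,

    \mathrm{kl}\!\left( \hat{R}_S(Q) \,\middle\|\, R(Q) \right) \le \frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta}}{n},

where \hat{R}_S(Q) and R(Q) are the empirical and true risks of the Gibbs classifier drawn from Q, and \mathrm{kl}(q \,\|\, p) = q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}. The SGLD iterates used to choose a data-dependent prior mean take the form

    \theta_{t+1} = \theta_t - \frac{\eta_t}{2} \nabla \hat{L}_S(\theta_t) + \sqrt{\eta_t}\, \xi_t, \qquad \xi_t \sim \mathcal{N}(0, I),

where \hat{L}_S denotes the empirical objective (temperature factors omitted for simplicity). The paper's contribution, as stated in the abstract, is that a bound of the first form remains valid when P is replaced by an ε-differentially private function of S, at the cost of an additional ε-dependent term whose exact form is given in the paper.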


A Further discussion

Neural Information Processing Systems

We justify the way in which LIO accounts for the cost of incentivization as follows. Fundamentally, the cost should be incurred only by the part of the agent that is directly responsible for incentivization. These updates are also used to compute the vector fields shown in Figure 2. With incentives, the players have the payoff matrices in Table 2 (payoff matrices for the row player on the left and the column player on the right, with incentives). [...] expected extrinsic return with respect to agent [...]; hence descending a stochastic estimate of this gradient is equivalent to minimizing the loss in (10).


Levers of Power in the Field of AI

Mackenzie, Tammy, Punj, Sukriti, Perez, Natalie, Bhaduri, Sreyoshi, Radeljic, Branislav

arXiv.org Artificial Intelligence

This paper examines how decision makers in academia, government, business, and civil society navigate questions of power in implementations of artificial intelligence (AI). The study explores how individuals experience and exercise "levers of power," which are presented as social mechanisms that shape institutional responses to technological change. The study reports on responses to personalized questionnaires designed to gather insight into a decision maker's institutional purview, based on an institutional governance framework developed from the work of Neo-Institutionalists. Findings present the anonymized, real responses and circumstances of respondents in the form of twelve fictional personas of high-level decision makers from North America and Europe. These personas illustrate how personal agency, organizational logics, and institutional infrastructures may intersect in the governance of AI. The decision makers' responses to the questionnaires then inform a discussion of the field-level personal power of decision makers, methods of fostering institutional stability in times of change, and methods of influencing institutional change in the field of AI. The final section of the discussion presents a table of the dynamics of the levers of power in the field of AI for change makers, along with five testable hypotheses for institutional and social movement researchers. In summary, this study provides insight into the means by which policymakers within institutions and their counterparts in civil society can personally engage with AI governance.


RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

Muhamed, Aashiq, Ribeiro, Leonardo F. R., Dreyer, Markus, Smith, Virginia, Diab, Mona T.

arXiv.org Artificial Intelligence

The ability of language models in RAG systems to selectively refuse to answer based on flawed context is critical for safety, yet remains a significant failure point. Our large-scale study reveals that even frontier models struggle in this setting, with refusal accuracy dropping below 50% on multi-document tasks, while exhibiting either dangerous overconfidence or overcaution. Static benchmarks fail to reliably evaluate this capability, as models exploit dataset-specific artifacts and memorize test instances. We introduce RefusalBench, a generative methodology that programmatically creates diagnostic test cases through controlled linguistic perturbation. Our framework employs 176 distinct perturbation strategies across six categories of informational uncertainty and three intensity levels. Evaluation of over 30 models uncovers systematic failure patterns: refusal comprises separable detection and categorization skills, and neither scale nor extended reasoning improves performance. We find that selective refusal is a trainable, alignment-sensitive capability, offering a clear path for improvement. We release two benchmarks -- RefusalBench-NQ (single document) and RefusalBench-GaRAGe (multi-document) -- and our complete generation framework to enable continued, dynamic evaluation of this critical capability.
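As a rough illustration only, one way a perturbation-driven generator of this kind could be organized; the class names, fields, and generate_cases function below are hypothetical placeholders, not RefusalBench's actual API or category names:

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Perturbation:
        category: str                  # an informational-uncertainty category (label is hypothetical)
        intensity: int                 # e.g. 1 (subtle) to 3 (overt)
        apply: Callable[[str], str]    # rewrites the grounding passage

    @dataclass
    class TestCase:
        question: str
        context: str
        expected: str                  # "answer" or "refuse"
        reason: str                    # category a correct refusal should cite ("" if answerable)

    def generate_cases(question: str, context: str,
                       perturbations: List[Perturbation]) -> List[TestCase]:
        # One answerable control case plus one refusal case per perturbation,
        # so detection (refuse or not) and categorization (which flaw) can be
        # scored separately.
        cases = [TestCase(question, context, expected="answer", reason="")]
        for p in perturbations:
            cases.append(TestCase(question, p.apply(context),
                                  expected="refuse", reason=p.category))
        return cases

The structural point is that each case pairs a grounding passage with an expected behavior and a flaw label, which is what allows detection and categorization to be evaluated as separable skills, as the abstract describes.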


Dynamic ReAct: Scalable Tool Selection for Large-Scale MCP Environments

Gaurav, Nishant, Akarsh, Adit, Ranjan, Ankit, Bajaj, Manoj

arXiv.org Artificial Intelligence

We present Dynamic ReAct, a novel approach for enabling ReAct agents to efficiently operate with extensive Model Context Protocol (MCP) tool sets that exceed the contextual memory limitations of large language models. Our approach addresses the fundamental challenge of tool selection in environments containing hundreds or thousands of available tools, where loading all tools simultaneously is computationally infeasible. We propose and evaluate five distinct architectures that progressively refine the tool selection process, culminating in a search-and-load mechanism that achieves intelligent tool selection with minimal computational overhead. Our experimental results demonstrate that the proposed approach reduces tool loading by up to 50% while maintaining task completion accuracy, advancing the path towards truly general-purpose AI agents capable of dynamically adapting to diverse task environments.
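A minimal sketch of the general search-and-load pattern the abstract describes; the embedding function, scoring rule, and registry shape below are assumptions for illustration, not the authors' implementation:

    from typing import Callable, Dict, List, Sequence

    def search_and_load(task: str,
                        tool_descriptions: Dict[str, str],
                        embed: Callable[[str], Sequence[float]],
                        k: int = 5) -> List[str]:
        # Rank registered tools by similarity of their descriptions to the
        # current task and return only the top-k names; only those tools are
        # then loaded into the agent's context rather than the full registry.
        def dot(a: Sequence[float], b: Sequence[float]) -> float:
            return sum(x * y for x, y in zip(a, b))

        task_vec = embed(task)
        ranked = sorted(tool_descriptions.items(),
                        key=lambda item: dot(embed(item[1]), task_vec),
                        reverse=True)
        return [name for name, _ in ranked[:k]]

In a ReAct loop, a retrieval step of this kind can be invoked again whenever the currently loaded tools prove insufficient, which is what keeps the loaded set small relative to the full tool registry.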



"Pull or Not to Pull?'': Investigating Moral Biases in Leading Large Language Models Across Ethical Dilemmas

Ding, Junchen, Jiang, Penghao, Xu, Zihao, Ding, Ziqi, Zhu, Yichen, Jiang, Jiaojiao, Li, Yuekang

arXiv.org Artificial Intelligence

As large language models (LLMs) increasingly mediate ethically sensitive decisions, understanding their moral reasoning processes becomes imperative. This study presents a comprehensive empirical evaluation of 14 leading LLMs, both reasoning-enabled and general-purpose, across 27 diverse trolley-problem scenarios, framed by ten moral philosophies, including utilitarianism, deontology, and altruism. Using a factorial prompting protocol, we elicited 3,780 binary decisions and natural-language justifications, enabling analysis along axes of decisional assertiveness, explanation-answer consistency, public moral alignment, and sensitivity to ethically irrelevant cues. Our findings reveal significant variability across ethical frames and model types: reasoning-enhanced models demonstrate greater decisiveness and structured justifications, yet do not always align better with human consensus. Notably, "sweet zones" emerge in altruistic, fairness, and virtue-ethics framings, where models achieve a balance of high intervention rates, low explanation conflict, and minimal divergence from aggregated human judgments. However, models diverge under frames emphasizing kinship, legality, or self-interest, often producing ethically controversial outcomes. These patterns suggest that moral prompting is not only a behavioral modifier but also a diagnostic tool for uncovering latent alignment philosophies across providers. We advocate for moral reasoning to become a primary axis in LLM alignment, calling for standardized benchmarks that evaluate not just what LLMs decide, but how and why.
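The headline count of elicited decisions follows directly from the factorial design stated in the abstract: 14 models × 27 scenarios × 10 moral framings = 3,780 binary decisions, i.e. one decision per model-scenario-framing cell.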


'I sent AI to art school!' The postmodern master who taught a machine to beef up his old work

The Guardian

By the time you read this article, there's a good chance it will have already been scanned by an artificially intelligent machine. If asked about the artist David Salle, large language models such as ChatGPT or Gemini may repurpose some of the words below to come up with their answer. The bigger the data set, the more convincing the response – and Salle has been written about exhaustively since he first rose to art world stardom in the 1980s. The question is whether AI can ever say anything new about the artist and his work, or if it's for ever condemned to generate more of the same. A similar question lingers beneath the surface of the paintings that Salle has been making since 2023, a new series of which he has just unveiled at Thaddaeus Ropac in London.