Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Dalrymple, David "davidad"; Skalse, Joar; Bengio, Yoshua; Russell, Stuart; Tegmark, Max; Seshia, Sanjit; Omohundro, Steve; Szegedy, Christian; Goldhaber, Ben; Ammann, Nora; Abate, Alessandro; Halpern, Joe; Barrett, Clark; Zhao, Ding; Zhi-Xuan, Tan; Wing, Jeannette; Tenenbaum, Joshua

arXiv.org Artificial Intelligence 

We introduce and define a family of approaches to AI safety, collectively referred to as guaranteed safe (GS) AI. These approaches aim to provide high-assurance quantitative guarantees about the safety of an AI system's behaviour through the use of three core components: a formal safety specification, a world model, and a verifier. We will argue that this strategy is both promising and underexplored, and contrast it with other ongoing efforts in AI safety. We will also outline several ongoing avenues of research within the broader GS research agenda, identify some of their core difficulties, and discuss approaches for overcoming these difficulties. Central examples of agendas which fall under the GS AI family include Szegedy (2020); Wing (2021); Seshia et al. (2022); Russell (2022); Tegmark & Omohundro (2023); 'davidad' Dalrymple (2024); Bengio (2024).

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this position paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical ...
