Emerging Practices in Frontier AI Safety Frameworks

Marie Davidsen Buhl, Ben Bucknall, Tammy Masterson

Feb-5-2025–arXiv.org Artificial Intelligence

At the AI Seoul Summit in 2024, a number of AI developers signed on to the Frontier AI Safety Commitments, agreeing to develop a safety framework outlining how they will manage severe risks that their frontier AI systems may pose (DSIT, 2024). Since then, a research field has begun to emerge, with a diverse array of researchers from companies, governments, academia and other third-party research organisations publishing work on how to write and implement an effective safety framework. Signatories to the commitments are due to publish safety frameworks shortly, in time for the Paris AI Action Summit. This paper summarises emerging practices for safety frameworks, meaning practices that appear promising and are gaining expert recognition, as identified by this new research field. We draw on the safety frameworks published so far, literature and standards on frontier AI risk management (as well as risk management more broadly), internal research by the UK AI Safety Institute, and the Frontier AI Safety Commitments themselves.

mitigation, safety framework, threshold

Country:
- Oceania > Papua New Guinea
  - Gulf Province > Kerema (0.04)
- North America > United States
  - New York (0.04)
  - Florida > Palm Beach County
    - Boca Raton (0.04)
- Europe > Latvia
  - Lubāna Municipality > Lubāna (0.04)
- Asia
  - South Korea > Seoul
    - Seoul (0.24)
  - India > Tamil Nadu
    - Chennai (0.04)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)
- Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.72)
  - Machine Learning > Neural Networks
    - Deep Learning (0.33)
