Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models I: The Task-Query Architecture

Ackerman, Gary; Behlendorf, Brandon; Kallenborn, Zachary; Almakki, Sheriff; Clifford, Doug; LaTourette, Jenna; Peterson, Hayley; Sheinbaum, Noah; Shoemaker, Olivia; Wetzel, Anna

arXiv.org Artificial Intelligence 

The potential for rapidly evolving frontier artificial intelligence (AI) models, especially large language models (LLMs), to facilitate bioterrorism or access to biological weapons has generated significant policy, academic, and public concern. Both model developers and policymakers seek to quantify and mitigate that risk, with an important element of such efforts being the development of model benchmarks that can assess the biosecurity risk posed by a particular model. This paper describes the first component of a novel Biothreat Benchmark Generation (BBG) Framework. The BBG is designed to help model developers and evaluators reliably measure and assess the biosecurity risk uplift and general harm potential of existing and future AI models, while accounting for key aspects of the threat itself that are often overlooked in other benchmarking efforts, including different actor capability levels and operational (in addition to purely technical) risk factors. To accomplish this, the BBG is built upon a hierarchical structure of biothreat categories, elements, and tasks, which then serves as the basis for the development of task-aligned queries. As a pilot, the BBG is first being developed to address bacterial biological threats only. This paper outlines the development of this biothreat task-query architecture, which we have named the Bacterial Biothreat Schema, while future papers will describe follow-on efforts to turn queries into model prompts, as well as metrics for determining the diagnosticity of these prompts for use as benchmarks and how the resulting benchmarks can be implemented for model evaluation.
Overall, the BBG Framework, including the Bacterial Biothreat Schema, seeks to offer a robust, reusable structure for evaluating bacterial biological risks arising from LLMs, a structure that allows for multiple levels of aggregation, captures the full scope of technical and operational requirements for biological adversaries, and accounts for a wide spectrum of biological adversary capabilities.
