Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models I: The Task-Query Architecture
