lab software
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Götting, Jasper, Medeiros, Pedro, Sanders, Jon G, Li, Nathaniel, Phan, Long, Elabd, Karam, Justen, Lennart, Hendrycks, Dan, Donoughe, Seth
We present the Virology Capabilities Test (VCT), a large language model (LLM) benchmark that measures the capability to troubleshoot complex virology laboratory protocols. Constructed from the inputs of dozens of PhD-level expert virologists, VCT consists of $322$ multimodal questions covering fundamental, tacit, and visual knowledge that is essential for practical work in virology laboratories. VCT is difficult: expert virologists with access to the internet score an average of $22.1\%$ on questions specifically in their sub-areas of expertise. However, the most performant LLM, OpenAI's o3, reaches $43.8\%$ accuracy, outperforming $94\%$ of expert virologists even within their sub-areas of specialization. The ability to provide expert-level virology troubleshooting is inherently dual-use: it is useful for beneficial research, but it can also be misused. Therefore, the fact that publicly available models outperform virologists on VCT raises pressing governance considerations. We propose that the capability of LLMs to provide expert-level troubleshooting of dual-use virology work should be integrated into existing frameworks for handling dual-use technologies in the life sciences.
AI can detect DNA that unlocks backdoors in lab software
The scenario involves a backdoor hidden in lab software that activates when the software is fed a specially crafted digital DNA sample. Typically, such a backdoor would be introduced in a supply-chain attack, as we saw with the compromised SolarWinds monitoring tools. When the lab analysis software processes a digital sample of genetic material with the trigger encoded in it, the backdoor in the application activates. The trigger could include an IP address and network port to covertly connect to, or other instructions to carry out, allowing spies to snoop on and interfere with the DNA processing pipeline. The technique could be used to infiltrate national health institutions, research organizations, and healthcare companies, because few have recognized the potential of biological matter as the carrier or trigger of malware. Just as DNA in living bacteria can be used to hold information, that storage can be weaponized against the applications processing the data.
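To make the idea of a sequence carrying an IP address and network port concrete, here is a minimal sketch of how six bytes of data could be represented as nucleotides. The article does not say which encoding the trigger uses; the two-bits-per-base mapping (A=00, C=01, G=10, T=11) and all function names below are assumptions chosen for illustration, and the address is a reserved documentation address.

```python
import ipaddress
import struct
from typing import Tuple

# Assumed mapping: each base carries two bits (A=00, C=01, G=10, T=11).
# This is a common textbook scheme, not the encoding described in the article.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Decode a base string back into bytes (inverse of bytes_to_dna)."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError on any character outside ACGT.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def encode_endpoint(ip: str, port: int) -> str:
    """Pack an IPv4 address (4 bytes) and a port (2 bytes) into 24 bases."""
    payload = ipaddress.IPv4Address(ip).packed + struct.pack("!H", port)
    return bytes_to_dna(payload)


def decode_endpoint(seq: str) -> Tuple[str, int]:
    """Recover the (ip, port) pair from a 24-base sequence."""
    raw = dna_to_bytes(seq)
    if len(raw) != 6:
        raise ValueError("expected exactly 24 bases (6 bytes)")
    ip = str(ipaddress.IPv4Address(raw[:4]))
    (port,) = struct.unpack("!H", raw[4:])
    return ip, port


if __name__ == "__main__":
    seq = encode_endpoint("192.0.2.1", 443)
    print(seq)
    print(decode_endpoint(seq))
```

Under this assumed mapping an IPv4 address and port fit in only 24 bases, which suggests why such a trigger would be hard to spot by eye inside a much longer sequence.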