Introducing v0.5 of the AI Safety Benchmark from MLCommons
Vidgen, Bertie, Agrawal, Adarsh, Ahmed, Ahmed M., Akinwande, Victor, Al-Nuaimi, Namir, Alfaraj, Najla, Alhajjar, Elie, Aroyo, Lora, Bavalatti, Trupti, Bartolo, Max, Blili-Hamelin, Borhane, Bollacker, Kurt, Bomassani, Rishi, Boston, Marisa Ferrara, Campos, Siméon, Chakra, Kal, Chen, Canyu, Coleman, Cody, Coudert, Zacharie Delpierre, Derczynski, Leon, Dutta, Debojyoti, Eisenberg, Ian, Ezick, James, Frase, Heather, Fuller, Brian, Gandikota, Ram, Gangavarapu, Agasthya, Gangavarapu, Ananya, Gealy, James, Ghosh, Rajat, Goel, James, Gohar, Usman, Goswami, Sujata, Hale, Scott A., Hutiri, Wiebke, Imperial, Joseph Marvin, Jandial, Surgan, Judd, Nick, Juefei-Xu, Felix, Khomh, Foutse, Kailkhura, Bhavya, Kirk, Hannah Rose, Klyman, Kevin, Knotz, Chris, Kuchnik, Michael, Kumar, Shachi H., Kumar, Srijan, Lengerich, Chris, Li, Bo, Liao, Zeyi, Long, Eileen Peters, Lu, Victor, Luger, Sarah, Mai, Yifan, Mammen, Priyanka Mary, Manyeki, Kelvin, McGregor, Sean, Mehta, Virendra, Mohammed, Shafee, Moss, Emanuel, Nachman, Lama, Naganna, Dinesh Jinenhally, Nikanjam, Amin, Nushi, Besmira, Oala, Luis, Orr, Iftach, Parrish, Alicia, Patlak, Cigdem, Pietri, William, Poursabzi-Sangdeh, Forough, Presani, Eleonora, Puletti, Fabrizio, Röttger, Paul, Sahay, Saurav, Santos, Tim, Scherrer, Nino, Sebag, Alice Schoenauer, Schramowski, Patrick, Shahbazi, Abolfazl, Sharma, Vin, Shen, Xudong, Sistla, Vamsi, Tang, Leonard, Testuggine, Davide, Thangarasa, Vithursan, Watkins, Elizabeth Anne, Weiss, Rebecca, Welty, Chris, Wilbers, Tyler, Williams, Adina, Wu, Carole-Jean, Yadav, Poonam, Yang, Xianjun, Zeng, Yi, Zhang, Wenhui, Zhdanov, Fedor, Zhu, Jiacheng, Liang, Percy, Mattson, Peter, Vanschoren, Joaquin
–arXiv.org Artificial Intelligence
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
arXiv.org Artificial Intelligence
May-13-2024
- Country:
- North America
- Dominican Republic (0.04)
- United States
- Virginia (0.04)
- Ohio (0.04)
- New Jersey (0.04)
- Iowa (0.04)
- Washington > King County
- Seattle (0.04)
- New York > New York County
- New York City (0.04)
- Illinois > Cook County
- Chicago (0.04)
- California > Santa Clara County
- Palo Alto (0.04)
- Canada > Quebec
- Montreal (0.04)
- Europe
- Western Europe (0.04)
- United Kingdom
- Northern Ireland (0.04)
- England > Oxfordshire
- Oxford (0.04)
- Netherlands > North Brabant
- Eindhoven (0.04)
- France > Hauts-de-France
- Asia
- Singapore (0.04)
- Philippines (0.04)
- Japan (0.04)
- Myanmar > Tanintharyi Region
- Dawei (0.04)
- Middle East
- Africa > Eswatini
- North America
- Genre:
- Personal > Interview (0.46)
- Research Report > New Finding (0.45)
- Industry:
- Media (1.00)
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
- Law Enforcement & Public Safety
- Terrorism (1.00)
- Crime Prevention & Enforcement (1.00)
- Law
- Criminal Law (1.00)
- Intellectual Property & Technology Law (0.67)
- Civil Rights & Constitutional Law (0.67)
- Health & Medicine
- Consumer Health (0.93)
- Therapeutic Area > Psychiatry/Psychology
- Mental Health (0.67)
- Government
- Military (1.00)
- Regional Government
- North America Government > United States Government (0.92)
- Europe Government (0.67)
- Technology:
- Information Technology
- Communications > Social Media (1.00)
- Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Issues > Social & Ethical Issues (1.00)
- Natural Language
- Large Language Model (1.00)
- Chatbot (1.00)
- Machine Learning > Neural Networks
- Deep Learning > Generative AI (0.46)
- Information Technology