Introducing v0.5 of the AI Safety Benchmark from MLCommons

Vidgen, Bertie, Agrawal, Adarsh, Ahmed, Ahmed M., Akinwande, Victor, Al-Nuaimi, Namir, Alfaraj, Najla, Alhajjar, Elie, Aroyo, Lora, Bavalatti, Trupti, Bartolo, Max, Blili-Hamelin, Borhane, Bollacker, Kurt, Bomassani, Rishi, Boston, Marisa Ferrara, Campos, Siméon, Chakra, Kal, Chen, Canyu, Coleman, Cody, Coudert, Zacharie Delpierre, Derczynski, Leon, Dutta, Debojyoti, Eisenberg, Ian, Ezick, James, Frase, Heather, Fuller, Brian, Gandikota, Ram, Gangavarapu, Agasthya, Gangavarapu, Ananya, Gealy, James, Ghosh, Rajat, Goel, James, Gohar, Usman, Goswami, Sujata, Hale, Scott A., Hutiri, Wiebke, Imperial, Joseph Marvin, Jandial, Surgan, Judd, Nick, Juefei-Xu, Felix, Khomh, Foutse, Kailkhura, Bhavya, Kirk, Hannah Rose, Klyman, Kevin, Knotz, Chris, Kuchnik, Michael, Kumar, Shachi H., Kumar, Srijan, Lengerich, Chris, Li, Bo, Liao, Zeyi, Long, Eileen Peters, Lu, Victor, Luger, Sarah, Mai, Yifan, Mammen, Priyanka Mary, Manyeki, Kelvin, McGregor, Sean, Mehta, Virendra, Mohammed, Shafee, Moss, Emanuel, Nachman, Lama, Naganna, Dinesh Jinenhally, Nikanjam, Amin, Nushi, Besmira, Oala, Luis, Orr, Iftach, Parrish, Alicia, Patlak, Cigdem, Pietri, William, Poursabzi-Sangdeh, Forough, Presani, Eleonora, Puletti, Fabrizio, Röttger, Paul, Sahay, Saurav, Santos, Tim, Scherrer, Nino, Sebag, Alice Schoenauer, Schramowski, Patrick, Shahbazi, Abolfazl, Sharma, Vin, Shen, Xudong, Sistla, Vamsi, Tang, Leonard, Testuggine, Davide, Thangarasa, Vithursan, Watkins, Elizabeth Anne, Weiss, Rebecca, Welty, Chris, Wilbers, Tyler, Williams, Adina, Wu, Carole-Jean, Yadav, Poonam, Yang, Xianjun, Zeng, Yi, Zhang, Wenhui, Zhdanov, Fedor, Zhu, Jiacheng, Liang, Percy, Mattson, Peter, Vanschoren, Joaquin

May-13-2024–arXiv.org Artificial Intelligence

This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.

benchmark, hazard category, taxonomy, (12 more...)

arXiv.org Artificial Intelligence

May-13-2024

arXiv.org PDF

Add feedback

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Virginia (0.04)
    - Ohio (0.04)
    - New Jersey (0.04)
    - Iowa (0.04)
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
    - California > Santa Clara County
      - Palo Alto (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Western Europe (0.04)
  - United Kingdom
    - Northern Ireland (0.04)
    - England > Oxfordshire
      - Oxford (0.04)
  - Netherlands > North Brabant
    - Eindhoven (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
- Asia
  - Singapore (0.04)
  - Philippines (0.04)
  - Japan (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Middle East
    - Kuwait (0.04)
    - Jordan (0.04)
    - Israel (0.04)
    - Iraq (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
- Africa > Eswatini
  - Manzini > Manzini (0.04)

Genre:
- Personal > Interview (0.46)
- Research Report > New Finding (0.45)

Industry:
- Media (1.00)
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
- Law Enforcement & Public Safety
  - Terrorism (1.00)
  - Crime Prevention & Enforcement (1.00)
- Law
  - Criminal Law (1.00)
  - Intellectual Property & Technology Law (0.67)
  - Civil Rights & Constitutional Law (0.67)
- Health & Medicine
  - Consumer Health (0.93)
  - Therapeutic Area > Psychiatry/Psychology
    - Mental Health (0.67)
- Government
  - Military (1.00)
  - Regional Government
    - North America Government > United States Government (0.92)
    - Europe Government (0.67)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Issues > Social & Ethical Issues (1.00)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning > Generative AI (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found