Prabhakar, Tarunima
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
Ghosh, Shaona, Frase, Heather, Williams, Adina, Luger, Sarah, Röttger, Paul, Barez, Fazl, McGregor, Sean, Fricklas, Kenneth, Kumar, Mala, Feuillade--Montixi, Quentin, Bollacker, Kurt, Friedrich, Felix, Tsang, Ryan, Vidgen, Bertie, Parrish, Alicia, Knotz, Chris, Presani, Eleonora, Bennion, Jonathan, Boston, Marisa Ferrara, Kuniavsky, Mike, Hutiri, Wiebke, Ezick, James, Salem, Malek Ben, Sahay, Rajat, Goswami, Sujata, Gohar, Usman, Huang, Ben, Sarin, Supheakmungkol, Alhajjar, Elie, Chen, Canyu, Eng, Roman, Manjusha, Kashyap Ramanandula, Mehta, Virendra, Long, Eileen, Emani, Murali, Vidra, Natan, Rukundo, Benjamin, Shahbazi, Abolfazl, Chen, Kongtao, Ghosh, Rajat, Thangarasa, Vithursan, Peigné, Pierre, Singh, Abhinav, Bartolo, Max, Krishna, Satyapriya, Akhtar, Mubashara, Gold, Rafael, Coleman, Cody, Oala, Luis, Tashev, Vassil, Imperial, Joseph Marvin, Russ, Amy, Kunapuli, Sasidhar, Miailhe, Nicolas, Delaunay, Julien, Radharapu, Bhaktipriya, Shinde, Rajat, Tuesday, null, Dutta, Debojyoti, Grabb, Declan, Gangavarapu, Ananya, Sahay, Saurav, Gangavarapu, Agasthya, Schramowski, Patrick, Singam, Stephen, David, Tom, Han, Xudong, Mammen, Priyanka Mary, Prabhakar, Tarunima, Kovatchev, Venelin, Ahmed, Ahmed, Manyeki, Kelvin N., Madireddy, Sandeep, Khomh, Foutse, Zhdanov, Fedor, Baumann, Joachim, Vasan, Nina, Yang, Xianjun, Mougn, Carlos, Varghese, Jibin Rajan, Chinoy, Hussain, Jitendar, Seshakrishna, Maskey, Manil, Hardgrove, Claire V., Li, Tianhao, Gupta, Aakash, Joswin, Emil, Mai, Yifan, Kumar, Shachi H, Patlak, Cigdem, Lu, Kevin, Alessi, Vincent, Balija, Sree Bhargavi, Gu, Chenhe, Sullivan, Robert, Gealy, James, Lavrisa, Matt, Goel, James, Mattson, Peter, Liang, Percy, Vanschoren, Joaquin
The rapid advancement and deployment of AI systems have created an urgent need for standard safety-evaluation frameworks. This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. Its development employed an open process that included participants from multiple fields. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories, including violent crimes, nonviolent crimes, sex-related crimes, child sexual exploitation, indiscriminate weapons, suicide and self-harm, intellectual property, privacy, defamation, hate, sexual content, and specialized advice (election, financial, health, legal). Our method incorporates a complete assessment standard, extensive prompt datasets, a novel evaluation framework, a grading and reporting system, and the technical as well as organizational infrastructure for long-term support and evolution. In particular, the benchmark employs an understandable five-tier grading scale (Poor to Excellent) and incorporates an innovative entropy-based system-response evaluation. In addition to unveiling the benchmark, this report also identifies limitations of our method and of building safety benchmarks generally, including evaluator uncertainty and the constraints of single-turn interactions. This work represents a crucial step toward establishing global standards for AI risk and reliability evaluation while acknowledging the need for continued development in areas such as multiturn interactions, multimodal understanding, coverage of additional languages, and emerging hazard categories. Our findings provide valuable insights for model developers, system integrators, and policymakers working to promote safer AI deployment.
Analysis of Indic Language Capabilities in LLMs
Vaidya, Aatman, Prabhakar, Tarunima, George, Denny, Shah, Swair
This report evaluates the performance of text-in text-out Large Language Models (LLMs) to understand and generate Indic languages. This evaluation is used to identify and prioritize Indic languages suited for inclusion in safety benchmarks. We conduct this study by reviewing existing evaluation studies and datasets; and a set of twenty-eight LLMs that support Indic languages. We analyze the LLMs on the basis of the training data, license for model and data, type of access and model developers. We also compare Indic language performance across evaluation datasets and find that significant performance disparities in performance across Indic languages. Hindi is the most widely represented language in models. While model performance roughly correlates with number of speakers for the top five languages, the assessment after that varies.
Overview of the 2023 ICON Shared Task on Gendered Abuse Detection in Indic Languages
Vaidya, Aatman, Arora, Arnav, Joshi, Aditya, Prabhakar, Tarunima
This paper reports the findings of the ICON 2023 on Gendered Abuse Detection in Indic Languages. The shared task deals with the detection of gendered abuse in online text. The shared task was conducted as a part of ICON 2023, based on a novel dataset in Hindi, Tamil and the Indian dialect of English. The participants were given three subtasks with the train dataset consisting of approximately 6500 posts sourced from Twitter. For the test set, approximately 1200 posts were provided. The shared task received a total of 9 registrations. The best F-1 scores are 0.616 for subtask 1, 0.572 for subtask 2 and, 0.616 and 0.582 for subtask 3. The paper contains examples of hateful content owing to its topic.
The Uli Dataset: An Exercise in Experience Led Annotation of oGBV
Arora, Arnav, Jinadoss, Maha, Arora, Cheshta, George, Denny, Brindaalakshmi, null, Khan, Haseena Dawood, Rawat, Kirti, Div, null, Ritash, null, Mathur, Seema, Yadav, Shivani, Shora, Shehla Rashid, Raut, Rie, Pawar, Sumit, Paithane, Apurva, Sonia, null, Vivek, null, Priscilla, Dharini, Khairunnisha, null, Banu, Grace, Tandon, Ambika, Thakker, Rishav, Korra, Rahul Dev, Vaidya, Aatman, Prabhakar, Tarunima
Online gender based violence has grown concomitantly with adoption of the internet and social media. Its effects are worse in the Global majority where many users use social media in languages other than English. The scale and volume of conversations on the internet has necessitated the need for automated detection of hate speech, and more specifically gendered abuse. There is, however, a lack of language specific and contextual data to build such automated tools. In this paper we present a dataset on gendered abuse in three languages- Hindi, Tamil and Indian English. The dataset comprises of tweets annotated along three questions pertaining to the experience of gender abuse, by experts who identify as women or a member of the LGBTQIA community in South Asia. Through this dataset we demonstrate a participatory approach to creating datasets that drive AI systems.
Plagiarism Detection in Polyphonic Music using Monaural Signal Separation
De, Soham, Roy, Indradyumna, Prabhakar, Tarunima, Suneja, Kriti, Chaudhuri, Sourish, Singh, Rita, Raj, Bhiksha
Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright. Most current approaches to plagiarism detection are based on musical similarity measures, which typically ignore the issue of polyphony in music. We present a novel feature space for audio derived from compositional modelling techniques, commonly used in signal separation, that provides a mechanism to account for polyphony without incurring an inordinate amount of computational overhead. We employ this feature representation in conjunction with traditional audio feature representations in a classification framework which uses an ensemble of distance features to characterize pairs of songs as being plagiarized or not. Our experiments on a database of about 3000 musical track pairs show that the new feature space characterization produces significant improvements over standard baselines.