Metaxa, Danaë
Learning About Algorithm Auditing in Five Steps: Scaffolding How High School Youth Can Systematically and Critically Evaluate Machine Learning Applications
Morales-Navarro, Luis, Kafai, Yasmin B., Vogelstein, Lauren, Yu, Evelyn, Metaxa, Danaë
While there is widespread interest in supporting young people in critically evaluating machine learning-powered systems, there is little research on how to support them in inquiring into how these systems work and what their limitations and implications may be. Outside of K-12 education, an effective strategy for evaluating black-boxed systems is algorithm auditing, a method for understanding algorithmic systems' opaque inner workings and external impacts from the outside in. In this paper, we review how expert researchers conduct algorithm audits and how end users engage in auditing practices, and we propose five steps that, when incorporated into learning activities, can support young people in auditing algorithms. We present a case study of a team of teenagers engaging with each step during an out-of-school workshop in which they audited peer-designed generative AI TikTok filters. We discuss the kinds of scaffolds we provided to support youth in algorithm auditing, as well as directions and challenges for integrating algorithm auditing into classroom activities. This paper contributes: (a) a conceptualization of five steps to scaffold algorithm auditing learning activities, and (b) examples of how youth engaged with each step during our pilot study.
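The paper itself does not include code; the Python sketch below is purely illustrative of the kind of systematic, evidence-gathering test such an auditing activity might scaffold: running a black-boxed generative filter over a planned set of inputs and organizing the outputs by the attribute being probed. The function apply_filter, the file names, and the group labels are hypothetical placeholders, not details from the study.

# Illustrative sketch only: systematically probing a black-boxed generative
# filter with a planned set of inputs and organizing the evidence by group.
from collections import defaultdict

def apply_filter(image_path):
    """Placeholder for the peer-designed generative AI filter under audit."""
    return "stylized_" + image_path  # stand-in output for illustration

# Hypothetical test plan: inputs annotated with the attribute being probed.
test_plan = [
    {"image": "selfie_01.jpg", "group": "lighter skin tone"},
    {"image": "selfie_02.jpg", "group": "darker skin tone"},
]

results = defaultdict(list)
for case in test_plan:
    output = apply_filter(case["image"])   # observe the system's behavior
    results[case["group"]].append(output)  # organize evidence by group

for group, outputs in results.items():
    print(group, "->", outputs)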
Identity-related Speech Suppression in Generative AI Content Moderation
Anigboro, Oghenefejiro Isaacs, Crawford, Charlie M., Metaxa, Danaë, Friedler, Sorelle A.
Automated content moderation systems have long been used to help reduce the occurrence of violent, hateful, sexual, or otherwise undesired user-generated content online, including in online comment sections and by social media platforms [7, 19, 24]. As AI systems increasingly generate content, automated content moderation techniques are being applied to the text these systems produce to filter unwanted content before it is shown to users [21, 22]. However, content moderation is known to suffer from identity-related biases, such that speech by or about marginalized identities is more likely to be incorrectly flagged as inappropriate [5, 10, 27]. In this paper, we conduct an audit of five content moderation systems to measure identity-related speech suppression, introducing benchmark datasets and definitions to quantify these biases in the context of generative AI systems. Previous assessments of content moderation systems have used benchmark datasets to measure effectiveness and bias. These include datasets composed of user-generated content, such as tweets or internet comments, hand-labeled according to a content moderation rubric [2, 8]. However, most of these datasets consist of short-form content and do not include the types of text involved in generative AI systems, whether user-written prompts or system-generated responses. Automated content moderation systems applied in generative AI settings may produce unexpected or undesired results, for example flagging PG-rated movie scripts as inappropriate content [21]. As generative AI is increasingly used for creative and expressive text generation, from schools to Hollywood, this paper is motivated by the question: whose stories won't be told?
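As a hedged illustration of how identity-related flag rates could be compared across otherwise-identical texts, the sketch below queries OpenAI's moderation endpoint, used here only as an example of an automated content moderation system; the abstract does not name the five audited systems. The benchmark texts and identity labels are invented for illustration, not drawn from the paper's datasets.

# Hedged sketch, not the authors' code: compare how often a moderation API
# flags texts that differ only in the identity group mentioned.
from collections import defaultdict
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical benchmark of benign texts varying only the identity mentioned.
benchmark = [
    ("baseline", "Two friends argued loudly before making up."),
    ("queer",    "Two queer friends argued loudly before making up."),
    ("Muslim",   "Two Muslim friends argued loudly before making up."),
]

flags = defaultdict(list)
for group, text in benchmark:
    result = client.moderations.create(input=text).results[0]
    flags[group].append(result.flagged)

for group, outcomes in flags.items():
    print(f"{group}: flagged {sum(outcomes)}/{len(outcomes)}")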
The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring
Armstrong, Lena, Liu, Abbey, MacNeil, Stephen, Metaxa, Danaë
Large language models (LLMs) are increasingly being introduced into workplace settings with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an algorithm audit of race and gender biases in one commonly used LLM, OpenAI's GPT-3.5, taking inspiration from the history of traditional offline resume audits. We conduct two studies using names with varied race and gender connotations: resume assessment (Study 1) and resume generation (Study 2). In Study 1, we ask GPT to score resumes bearing 32 different names (4 names for each combination of 2 gender and 4 racial groups), plus two anonymous options, across 10 occupations and 3 evaluation tasks (overall rating, willingness to interview, and hireability). We find that the model reflects some biases based on stereotypes. In Study 2, we prompt GPT to create resumes (10 for each name) for fictitious job candidates. When generating resumes, GPT reveals underlying biases: resumes generated for women featured occupations with less experience, while resumes for Asian and Hispanic candidates included immigrant markers, such as non-native English and non-U.S. education and work experience. Our findings contribute to a growing body of literature on LLM biases, particularly as these models are used in workplace contexts.
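As a minimal sketch in the spirit of Study 1, but not the authors' exact protocol, the snippet below holds a resume fixed while varying only the candidate's name and asks GPT-3.5 for a numeric rating. The resume text, names, prompt wording, and 1-10 scale are illustrative assumptions rather than details from the paper.

# Hedged sketch of a name-substitution resume-assessment audit (not the
# paper's exact prompts, names, or scoring procedure).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RESUME = "Registered nurse with 5 years of ICU experience and a BSN degree."
NAMES = ["Emily Walsh", "Lamar Washington", "Mei Chen", "Luis Hernandez"]  # hypothetical

def rate_resume(name: str) -> str:
    prompt = (
        f"Candidate: {name}\nResume: {RESUME}\n"
        "On a scale of 1-10, how strong is this candidate for a nursing role? "
        "Answer with a single number."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce run-to-run variation when comparing names
    )
    return response.choices[0].message.content.strip()

for name in NAMES:
    print(name, rate_resume(name))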