Radar de Parité: An NLP system to measure gender representation in French news stories
Soumah, Valentin-Gabriel, Rao, Prashanth, Eibl, Philipp, Taboada, Maite
arXiv.org Artificial Intelligence
We present the Radar de Parité, an automated Natural Language Processing (NLP) system that measures the proportion of women and men quoted daily in six Canadian French-language media outlets. We outline the system's architecture and detail the challenges we overcame to address French-specific issues, in particular coreference resolution, which is a new contribution to the NLP literature on French. Our results highlight the underrepresentation of women in news stories, while also illustrating how modern NLP methods can be applied to measure gender representation and address societal issues.
Most applied NLP research projects share a common need: to extract information from unstructured text data reliably and at scale. In this paper, we describe one such application: extracting quotes from news stories to quantify gender representation. Gender representation in the media is a long-debated topic. Since the 1970s, studies have examined how much women and gender-diverse people are portrayed in news stories, with the general hypothesis that they tend to be underrepresented [1, 2]. There is also research on how they are represented, i.e., whether sexist or homophobic tropes are present when women and gender-diverse people are discussed [3, 4].
In this work, we tackle one specific aspect of representation: who is quoted and in what proportions. Our starting hypothesis is that we hear less from women than from men in news stories, that is, that men are quoted more often than would be expected from their proportion in the general population. To test this hypothesis, we formulate a quantitative approach, collecting large amounts of representative data and extracting quotes from the unstructured text. This is the goal of the Radar de Parité. We define quotes as either direct or indirect reproductions of what a person said, and we define that person as a source in news articles.
To extract quotes, we employ a full NLP pipeline, focusing on parsing to identify speakers, verbs, and quotes in each news story. We then predict the gender of the speaker (or source) using external gender-prediction services.
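To make the steps above concrete, here is a minimal Python sketch of the general idea: find a quote, its reporting verb, and its speaker, then tally quotes by predicted gender. It is an illustration under simplifying assumptions, not the authors' implementation. It swaps the paper's parser-based extraction for a plain regular expression that only catches direct quotes in guillemets, and it replaces the external gender-prediction services with a small hypothetical first-name table. The verb list, the table, the sample text, and all function names are invented for this example. Indirect quotes and coreference resolution, which the paper addresses, are out of scope here.

```python
import re
from collections import Counter

# Hypothetical reporting verbs; the paper's actual verb list is not given here.
REPORTING_VERBS = ("a déclaré", "a dit", "a affirmé", "a expliqué")

# Hypothetical stand-in for an external gender-prediction service.
FIRST_NAME_GENDER = {"marie": "F", "jean": "M", "pierre": "M"}

_VERBS = "|".join(re.escape(v) for v in REPORTING_VERBS)
_NAME = r"[A-ZÀ-ÖØ-Ý][\w'-]*"  # one capitalized name token

# Matches: « quote », <reporting verb> <Capitalized Speaker Name>
QUOTE_PATTERN = re.compile(
    r"«\s*(?P<quote>[^»]+?)\s*»,?\s*(?P<verb>" + _VERBS + r")\s+"
    r"(?P<speaker>" + _NAME + r"(?:\s+" + _NAME + r")*)"
)

# Invented sample text, for illustration only.
SAMPLE = (
    "« Nous devons agir maintenant », a déclaré Marie Tremblay. "
    "« Le budget est équilibré », a affirmé Jean Gagnon. "
    "« Les délais sont réalistes », a dit Pierre Roy."
)


def extract_quotes(text):
    """Return one dict (quote, verb, speaker) per direct quote found."""
    return [m.groupdict() for m in QUOTE_PATTERN.finditer(text)]


def predict_gender(speaker):
    """Look up the speaker's first name; names not in the table are 'unknown'."""
    first_name = speaker.split()[0].lower()
    return FIRST_NAME_GENDER.get(first_name, "unknown")


def gender_proportions(quotes):
    """Share of quotes attributed to each predicted gender label."""
    counts = Counter(predict_gender(q["speaker"]) for q in quotes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    quotes = extract_quotes(SAMPLE)
    for q in quotes:
        print(q["speaker"], "|", q["verb"], "|", q["quote"])
    print(gender_proportions(quotes))
```

Keeping an explicit "unknown" label, rather than forcing every name into one of two categories, avoids inflating either share when the lookup has no answer for a name.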
Apr-19-2023