Purver, Matthew
Scaling for Fairness? Analyzing Model Size, Data Composition, and Multilinguality in Vision-Language Bias
Sahili, Zahraa Al, Patras, Ioannis, Purver, Matthew
As large-scale vision-language models become increasingly central to modern AI applications, understanding and mitigating social biases in these systems has never been more critical. We investigate how dataset composition, model size, and multilingual training affect gender and racial bias in a popular VLM, CLIP, and its open-source variants. In particular, we systematically evaluate models trained on varying dataset scales and architectures, as well as multilingual versions encompassing English along with Persian, Turkish, and Finnish, languages with minimal gender marking. To assess social perception bias, we measure zero-shot performance on face images using socially charged terms rooted in the psychological constructs of communion and agency, and we measure demographic labeling bias using both the FairFace and PATA datasets. Our findings reveal three key insights. First, while larger training datasets can mitigate some biases, they may also introduce or amplify others when the data composition is imbalanced. Second, although increasing model size generally improves performance, it does not consistently reduce bias and can, in certain cases, exacerbate it. Finally, while multilingual training broadens linguistic coverage, it does not inherently neutralize bias and can transfer or intensify inequities across languages. Taken together, these results highlight the necessity of inclusive, carefully curated training data to foster fairness, rather than relying solely on model scaling or language expansion. We provide a systematic evaluation of vision-language bias across diverse demographics, underscoring the urgent need for intentional bias mitigation strategies in next-generation AI systems.
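The zero-shot probing described above can be sketched as follows. This is a minimal illustration of the measurement logic only: random vectors stand in for CLIP image and text embeddings, and the prompt wording is hypothetical; the actual prompts, models, and datasets are those named in the abstract.

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, temperature=100.0):
    """Softmax over cosine similarities between one image embedding and
    candidate prompt embeddings, mirroring CLIP's zero-shot rule."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (text_embs @ image_emb)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
dim = 512  # illustrative embedding size

# Hypothetical prompts built from communion/agency-style terms.
prompts = ["a photo of a warm person", "a photo of a competent person"]
text_embs = rng.normal(size=(len(prompts), dim))

# Mock face embeddings for two demographic groups; bias is read off as a
# disparity in how often each prompt "wins" across groups.
groups = {"group_a": rng.normal(size=(100, dim)),
          "group_b": rng.normal(size=(100, dim))}
for name, images in groups.items():
    wins = np.array([zero_shot_probs(img, text_embs).argmax() for img in images])
    rates = np.bincount(wins, minlength=len(prompts)) / len(images)
    print(name, rates)
```

Comparing the per-group assignment rates is the core of the social-perception probe; the real evaluation does this with CLIP embeddings of FairFace and PATA images.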
Recent Trends in Linear Text Segmentation: a Survey
Ghinassi, Iacopo, Wang, Lin, Newell, Chris, Purver, Matthew
Linear text segmentation is the task of automatically tagging text documents with topic shifts, i.e. the places in the text where the topic changes. A well-established area of research in Natural Language Processing, drawing on well-understood concepts from linguistics and computational linguistics, the field has recently seen renewed interest as a result of the surge of text, video, and audio available on the web, which in turn requires ways of summarising and categorising this mass of content, for which linear text segmentation is a fundamental step. In this survey, we provide an extensive overview of current advances in linear text segmentation, describing the state of the art in terms of resources and approaches for the task. Finally, we highlight the limitations of available resources and of the task itself, and indicate ways forward based on the most recent literature and under-explored research directions.
Evaluating and explaining training strategies for zero-shot cross-lingual news sentiment analysis
Andrenšek, Luka, Koloski, Boshko, Pelicon, Andraž, Lavrač, Nada, Pollak, Senja, Purver, Matthew
We investigate zero-shot cross-lingual news sentiment detection, aiming to develop robust sentiment classifiers that can be deployed across multiple languages without target-language training data. We introduce novel evaluation datasets in several less-resourced languages, and experiment with a range of approaches including the use of machine translation; in-context learning with large language models; and various intermediate training regimes including a novel task objective, POA, that leverages paragraph-level information. Our results demonstrate significant improvements over the state of the art, with in-context learning generally giving the best performance, but with the novel POA approach giving a competitive alternative with much lower computational overhead. We also show that language similarity is not in itself sufficient for predicting the success of cross-lingual transfer, but that similarity in semantic content and structure can be equally important.
EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts
Sahili, Zahraa Al, Patras, Ioannis, Purver, Matthew
In the domain of text-to-image generative models, the inadvertent propagation of biases inherent in training datasets poses significant ethical challenges, particularly in the generation of socially sensitive content. This paper introduces EquiPrompt, a novel method employing Chain of Thought (CoT) reasoning to reduce biases in text-to-image generative models. EquiPrompt uses iterative bootstrapping and bias-aware exemplar selection to balance creativity and ethical responsibility. It integrates iterative reasoning refinement with controlled evaluation techniques, addressing zero-shot CoT issues in sensitive contexts. Experiments on several generation tasks show EquiPrompt effectively lowers bias while maintaining generative quality, advancing ethical AI and socially responsible creative processes. Code will be publicly available.
A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media
Caporusso, Jaya, Hoogland, Damar, Brglez, Mojca, Koloski, Boshko, Purver, Matthew, Pollak, Senja
Dehumanisation involves the perception and/or treatment of a social group's members as less than human. This phenomenon is rarely addressed with computational linguistic techniques. We adapt a recently proposed approach for English, making it easier to transfer to other languages and to evaluate, introducing a new sentiment resource, the use of zero-shot cross-lingual valence and arousal detection, and a new method for statistical significance testing. We then apply it to study attitudes to migration expressed in Slovene newspapers, to examine changes in the Slovene discourse on migration between the 2015-16 migration crisis following the war in Syria and the 2022-23 period following the war in Ukraine. We find that while this discourse became more negative and more intense over time, it is less dehumanising when specifically addressing Ukrainian migrants compared to others.
Lon-ea at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction
Hosseini, Peyman, Hosseini, Mehran, Al-Azzawi, Sana Sabah, Liwicki, Marcus, Castro, Ignacio, Purver, Matthew
We study the influence of different activation functions in the output layer of deep neural network models for soft and hard label prediction in the learning with disagreement task. In this task, the goal is to quantify the amount of disagreement via predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function used in the output layer, while keeping other parameters constant. The soft labels are then used for the hard label prediction. The activation functions considered are sigmoid, a step function added to the model post-training, and a sinusoidal activation function, which is introduced for the first time in this paper.
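The three output-layer choices can be sketched as below. This is a minimal illustration under stated assumptions: the exact parameterisation of the paper's sinusoidal activation is not given in the abstract, so it is shown here as a sine rescaled into [0, 1], and the logits are made up.

```python
import numpy as np

def sigmoid(z):
    """Standard sigmoid: maps logits to soft labels in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sin_activation(z):
    """Assumed form of a sinusoidal output activation, rescaled into [0, 1]."""
    return 0.5 * (np.sin(z) + 1.0)

def step(p, threshold=0.5):
    """Hard-label step applied post-training on top of soft predictions."""
    return (p >= threshold).astype(int)

logits = np.array([-2.0, -0.3, 0.3, 2.0])  # toy output-layer logits
soft_sigmoid = sigmoid(logits)   # soft labels via sigmoid
soft_sin = sin_activation(logits)  # soft labels via the sinusoidal variant
hard = step(soft_sigmoid)        # hard labels derived from soft labels
```

The design point being compared is where the nonlinearity sits: sigmoid and sine shape the soft-label distribution during training, while the step function only discretises after training.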
Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia
Gkoumas, Dimitris, Purver, Matthew, Liakata, Maria
Dementia is a neuro-degenerative disease affecting millions worldwide and is associated with cognitive decline, including language impairment (Forbes-McKay and Venneri, 2005). Language dysfunction may be difficult to detect in the early stages of dementia (Nestor et al., 2004); however, as the disease progresses, a gradual decline of semantic knowledge ensues, and eventually, all linguistic functions can be lost (Tang-Wai and Graham, 2008; Klimova et al., 2015). Recognizing language disorders as prodromal symptoms in people with dementia may help with earlier diagnosis and improve disease management. Early work in NLP for dementia relied on manually engineered features based on specific lexical, acoustic and syntactic features stemming from description tasks (such as CTP), to detect linguistic signs of cognitive decline (Fraser et al., 2016; Beltrami et al., 2018; Yeung et al., 2021). Recent work uses naive neural approaches to classify and analyse linguistic and acoustic characteristics so as to either predict cognitive scores or achieve binary classification of participants (Alzheimer's Disease (AD) vs non-AD) (Karlekar et al., 2018; Balagopalan et al., 2020; Nasreen et al., 2021b; Rohanian …)
CoRAL: a Context-aware Croatian Abusive Language Dataset
Shekhar, Ravi, Karan, Mladen, Purver, Matthew
In light of unprecedented increases in the popularity of the internet and social media, comment moderation has never been a more relevant task. Semi-automated comment moderation systems greatly aid human moderators by either automatically classifying the examples or allowing the moderators to prioritize which comments to consider first. However, the concept of inappropriate content is often subjective, and such content can be conveyed in many subtle and indirect ways. In this work, we propose CoRAL -- a linguistically and culturally aware Croatian Abusive Language dataset covering the phenomena of implicitness and reliance on local and global context. We show experimentally that current models degrade when comments are not explicit, and degrade further when language skill and context knowledge are required to interpret the comment.
A Longitudinal Multi-modal Dataset for Dementia Monitoring and Diagnosis
Gkoumas, Dimitris, Wang, Bo, Tsakalidis, Adam, Wolters, Maria, Zubiaga, Arkaitz, Purver, Matthew, Liakata, Maria
Dementia is a family of neurodegenerative conditions affecting memory and cognition in an increasing number of individuals in our globally aging population. Automated analysis of language, speech, and paralinguistic indicators has been gaining popularity as a potential marker of cognitive decline. Here we propose a novel longitudinal multi-modal dataset collected from people with mild dementia and age-matched controls over a period of several months in a natural setting. The multi-modal data consist of spoken conversations, a subset of which are transcribed, as well as typed and written thoughts and associated extra-linguistic information such as pen strokes and keystrokes. We describe the dataset in detail and proceed to focus on a task using the speech modality: distinguishing controls from people with dementia by exploiting the longitudinal nature of the data. Our experiments showed significant differences in how speech varied from session to session between the control and dementia groups.
Exploring Semantic Incrementality with Dynamic Syntax and Vector Space Semantics
Sadrzadeh, Mehrnoosh, Purver, Matthew, Hough, Julian, Kempson, Ruth
One of the fundamental requirements for models of semantic processing in dialogue is incrementality: a model must reflect how people interpret and generate language at least on a word-by-word basis, and handle phenomena such as fragments, incomplete and jointly-produced utterances. We show that the incremental word-by-word parsing process of Dynamic Syntax (DS) can be assigned a compositional distributional semantics, with the composition operator of DS corresponding to the general operation of tensor contraction from multilinear algebra. We provide abstract semantic decorations for the nodes of DS trees, in terms of vectors, tensors, and sums thereof; using the latter to model the underspecified elements crucial to assigning partial representations during incremental processing. As a working example, we give an instantiation of this theory using plausibility tensors of compositional distributional semantics, and show how our framework can incrementally assign a semantic plausibility measure as it parses phrases and sentences.
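The composition-as-contraction idea above can be illustrated with a toy example. Everything concrete here is an illustrative assumption, not the paper's trained representations: the dimensionality, the random meanings, and the cosine-based plausibility score merely show the shape of the operations.

```python
import numpy as np

rng = np.random.default_rng(42)
d = 4  # toy semantic dimension

noun = rng.normal(size=d)            # vector meaning for a noun
adjective = rng.normal(size=(d, d))  # tensor (matrix) meaning for an adjective

# DS composition as tensor contraction: contract the adjective's second
# index with the noun vector to obtain the phrase's vector meaning.
phrase = np.einsum('ij,j->i', adjective, noun)

# An underspecified node can be modelled as a sum over candidate meanings,
# to be narrowed as incremental parsing supplies more words.
candidates = rng.normal(size=(3, d))
underspecified = candidates.sum(axis=0)

# A toy plausibility measure: cosine similarity against a context vector.
context = rng.normal(size=d)
plausibility = phrase @ context / (np.linalg.norm(phrase) * np.linalg.norm(context))
```

For the word-by-word case, each new word's tensor is contracted with the partial representation built so far, so a plausibility score of this kind can be read off at every incremental step.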