narrative element
From Individuals to Interactions: Benchmarking Gender Bias in Multimodal Large Language Models from the Lens of Social Relationship
Multimodal large language models (MLLMs) have shown impressive capabilities across tasks involving both visual and textual modalities. However, growing concerns remain about their potential to encode and amplify gender bias, particularly in socially sensitive applications. Existing benchmarks predominantly evaluate bias in isolated scenarios, overlooking how bias may emerge subtly through interpersonal interactions. We fill this gap by going beyond single-entity evaluation and instead focusing on a deeper examination of relational and contextual gender bias in dual-individual interactions. We introduce Genres, a novel benchmark designed to evaluate gender bias in MLLMs through the lens of social relationships in generated narratives. Genres assesses gender bias through a dual-character profile and narrative generation task that captures rich interpersonal dynamics and supports a fine-grained bias evaluation suite across multiple dimensions. Experiments on both open- and closed-source MLLMs reveal persistent, context-sensitive gender biases that are not evident in single-character settings. Our findings underscore the importance of relationship-aware benchmarks for diagnosing subtle, interaction-driven gender bias in MLLMs and provide actionable insights for future bias mitigation.
- Media (1.00)
- Health & Medicine (1.00)
- Leisure & Entertainment (0.93)
- Government (0.67)
From Model to Classroom: Evaluating Generated MCQs for Portuguese with Narrative and Difficulty Concerns
Leite, Bernardo, Cardoso, Henrique Lopes, Pinto, Pedro, Ferreira, Abel, Abreu, Luís, Rangel, Isabel, Monteiro, Sandra
While MCQs are valuable for learning and evaluation, manually creating them with varying difficulty levels and targeted reading skills remains a time-consuming and costly task. Recent advances in generative AI provide an opportunity to automate MCQ generation efficiently. However, assessing the actual quality and reliability of generated MCQs has received limited attention -- particularly regarding cases where generation fails. This aspect becomes particularly important when the generated MCQs are meant to be applied in real-world settings. Additionally, most MCQ generation studies focus on English, leaving other languages underexplored. This paper investigates the capabilities of current generative models in producing MCQs for reading comprehension in Portuguese, a morphologically rich language. Our study focuses on generating MCQs that align with curriculum-relevant narrative elements and span different difficulty levels. We evaluate these MCQs through expert review and by analyzing the psychometric properties extracted from student responses to assess their suitability for elementary school students. Our results show that current models can generate MCQs of comparable quality to human-authored ones. However, we identify issues related to semantic clarity and answerability. Also, challenges remain in generating distractors that engage students and meet established criteria for high-quality MCQ option design.
- Europe > Portugal (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Education > Assessment & Standards > Student Performance (0.67)
- Education > Educational Setting > K-12 Education (0.66)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
A Decomposition-Based Approach for Evaluating and Analyzing Inter-Annotator Disagreement
We propose a novel method to conceptually decompose an existing annotation into separate levels, allowing the analysis of inter-annotator disagreement in each level separately. We suggest two distinct strategies to realize this approach: a theoretically driven one, in which the researcher defines a decomposition based on prior knowledge of the annotation task, and an exploration-based one, in which many possible decompositions are inductively computed and presented to the researcher for interpretation and evaluation. Using a recently constructed dataset for narrative analysis as our use case, we apply each of the two strategies to demonstrate the potential of our approach in testing hypotheses regarding the sources of annotation disagreements, as well as revealing latent structures and relations within the annotation task. We conclude by suggesting how to extend and generalize our approach, as well as use it for other purposes.
- North America > United States (0.28)
- Europe (0.28)
Advancing Question Generation with Joint Narrative and Difficulty Control
Leite, Bernardo, Cardoso, Henrique Lopes
Question Generation (QG), the task of automatically generating questions from a source input, has seen significant progress in recent years. Difficulty-controllable QG (DCQG) enables control over the difficulty level of generated questions while considering the learner's ability. Additionally, narrative-controllable QG (NCQG) allows control over the narrative aspects embedded in the questions. However, research in QG lacks a focus on combining these two types of control, which is important for generating questions tailored to educational purposes. To address this gap, we propose a strategy for Joint Narrative and Difficulty Control, enabling simultaneous control over these two attributes in the generation of reading comprehension questions. Our evaluation provides preliminary evidence that this approach is feasible, though it is not effective across all instances. Our findings highlight the conditions under which the strategy performs well and discuss the trade-offs associated with its application.
- Europe (0.93)
- North America > United States > Minnesota (0.28)
HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs
Shen, Jocelyn, Mire, Joel, Park, Hae Won, Breazeal, Cynthia, Sap, Maarten
Empathy serves as a cornerstone in enabling prosocial behaviors, and can be evoked through sharing of personal experiences in stories. While empathy is influenced by narrative content, intuitively, people respond to the way a story is told as well, through narrative style. Yet the relationship between empathy and narrative style is not fully understood. In this work, we empirically examine and quantify this relationship between style and empathy using LLMs and large-scale crowdsourcing studies. We introduce a novel, theory-based taxonomy, HEART (Human Empathy and Narrative Taxonomy), that delineates elements of narrative style that can lead to empathy with the narrator of a story. We establish the performance of LLMs in extracting narrative elements from HEART, showing that prompting with our taxonomy leads to reasonable, human-level annotations beyond what prior lexicon-based methods can do. To show empirical use of our taxonomy, we collect a dataset of empathy judgments of stories via a large-scale crowdsourcing study with N=2,624 participants. We show that narrative elements extracted via LLMs, in particular vividness of emotions and plot volume, can elucidate the pathways by which narrative style cultivates empathy towards personal stories. Our work suggests that such models can be used for narrative analyses that lead to human-centered social and behavioral insights.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Arizona (0.04)
- Research Report (1.00)
- Questionnaire & Opinion Survey (1.00)
- Leisure & Entertainment (1.00)
- Education (0.93)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.67)
On Few-Shot Prompting for Controllable Question-Answer Generation in Narrative Comprehension
Leite, Bernardo, Cardoso, Henrique Lopes
Question Generation aims to automatically generate questions based on a given input provided as context. A controllable question generation scheme focuses on generating questions with specific attributes, allowing better control. In this study, we propose a few-shot prompting strategy for controlling the generation of question-answer pairs from children's narrative texts. We aim to control two attributes: the question's explicitness and underlying narrative elements. With empirical evaluation, we show the effectiveness of controlling the generation process by employing few-shot prompting side by side with a reference model. Our experiments highlight instances where the few-shot strategy surpasses the reference model, particularly in scenarios such as semantic closeness evaluation and the diversity and coherency of question-answer pairs. However, these improvements are not always statistically significant. The code is publicly available at github.com/bernardoleite/few-shot-prompting-qg-control.
- Europe > United Kingdom > Scotland (0.04)
- Europe > Switzerland (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Framing Analysis of Health-Related Narratives: Conspiracy versus Mainstream Media
Reiter-Haas, Markus, Klösch, Beate, Hadler, Markus, Lex, Elisabeth
Understanding how online media frame issues is crucial due to their impact on public opinion. Research on framing using natural language processing techniques mainly focuses on specific content features in messages and neglects their narrative elements. Also, the distinction between framing in different sources remains an understudied problem. We address those issues and investigate how the framing of health-related topics, such as COVID-19 and other diseases, differs between conspiracy and mainstream websites. We incorporate narrative information into the framing analysis by introducing a novel frame extraction approach based on semantic graphs. We find that health-related narratives in conspiracy media are predominantly framed in terms of beliefs, while mainstream media tend to present them in terms of science. We hope our work offers new ways for a more nuanced frame analysis.
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Asia > Middle East > Iran (0.04)
ARN: A Comprehensive Framework and Benchmark for Analogical Reasoning on Narratives
Sourati, Zhivar, Ilievski, Filip, Sommerauer, Pia, Jiang, Yifan
Analogical reasoning is one of the prime abilities of humans and is linked to creativity and scientific discoveries. This ability has been studied extensively in natural language processing (NLP) and in cognitive psychology. NLP benchmarks often focus on proportional analogies, while the ones in cognitive psychology investigate longer pieces of text too. Yet, although studies that focus on analogical reasoning in an involved setting utilize narratives as their evaluation medium, analogical reasoning on narratives has not been studied extensively. We create an extensive evaluation framework for analogical reasoning on narratives that utilizes narrative elements to create lower-order and higher-order mappings. This framework leads to the development of the Analogical Reasoning on Narratives (ARN) benchmark, which covers four categories of far (cross-domain)/near (within-domain) analogies and far/near disanalogies, allowing us to study analogical reasoning in LLMs in distinct scenarios. Our results demonstrate that LLMs struggle to recognize higher-order mappings when they are not accompanied by lower-order mappings (far analogies) and show better performance when all mappings are formed simultaneously (near analogies). We observe that in all the scenarios, the analogical reasoning abilities of LLMs can be easily impaired by lower-order mappings in near disanalogies.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Analogical Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
CompRes: A Dataset for Narrative Structure in News
Levi, Effi, Mor, Guy, Shenhav, Shaul, Sheafer, Tamir
This paper addresses the task of automatically detecting narrative structures in raw texts. Previous works have utilized the oral narrative theory by Labov and Waletzky to identify various narrative elements in personal story texts. Instead, we direct our focus to news articles, motivated by their growing social impact as well as their role in creating and shaping public opinion. We introduce CompRes -- the first dataset for narrative structure in news media. We describe the process by which the dataset was constructed: first, we designed a new narrative annotation scheme, better suited for news media, by adapting elements from the narrative theory of Labov and Waletzky (Complication and Resolution) and adding a new narrative element of our own (Success); then, we used that scheme to annotate a set of 29 English news articles (containing 1,099 sentences) collected from news and partisan websites. We use the annotated dataset to train several supervised models to identify the different narrative elements, achieving an $F_1$ score of up to 0.7. We conclude by suggesting several promising directions for future work.
- Asia > South Korea (0.29)
- North America > United States > Utah (0.04)
- North America > United States > Texas (0.04)
- Government (1.00)
- Media > News (0.70)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
- Health & Medicine > Therapeutic Area > Immunology (0.68)
Towards Enriched Controllability for Educational Question Generation
Leite, Bernardo, Cardoso, Henrique Lopes
Question Generation (QG) is a task within Natural Language Processing (NLP) that involves automatically generating questions given an input, typically composed of a text and a target answer. Recent work on QG aims to control the type of generated questions so that they meet educational needs. A remarkable example of controllability in educational QG is the generation of questions underlying certain narrative elements, e.g., causal relationship, outcome resolution, or prediction. This study aims to enrich controllability in QG by introducing a new guidance attribute: question explicitness. We propose to control the generation of explicit and implicit (wh)-questions from children-friendly stories. We show preliminary evidence of controlling QG via question explicitness alone and simultaneously with another target attribute: the question's narrative element.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- Europe > Portugal > Porto > Porto (0.05)
- North America > United States > Washington > King County > Seattle (0.04)