Kougia, Vasiliki
Analysing zero-shot temporal relation extraction on clinical notes using temporal consistency
Kougia, Vasiliki, Sedova, Anastasiia, Stephan, Andreas, Zaporojets, Klim, Roth, Benjamin
This paper presents the first study of temporal relation extraction in a zero-shot setting focusing on biomedical text. We employ two types of prompts and five LLMs (GPT-3.5, Mixtral, Llama 2, Gemma, and PMC-LLaMA) to obtain responses about the temporal relations between two events. Our experiments demonstrate that LLMs struggle in the zero-shot setting, performing worse than fine-tuned specialized models in terms of F1 score, showing that this is a challenging task for LLMs. We further contribute a novel comprehensive temporal analysis by calculating consistency scores for each LLM. Our findings reveal that LLMs face challenges in providing responses consistent with the temporal properties of uniqueness and transitivity. Moreover, we study the relation between the temporal consistency of an LLM and its accuracy, and whether the latter can be improved.
[Figure 1: An example of three event pairs annotated with temporal relations. In the right part, the order of the events with respect to time (t) is shown, and the consistency of uniqueness and transitivity.]
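The two consistency properties can be made concrete with a minimal sketch (not the paper's code; relation labels and the three-event example are hypothetical): uniqueness requires that asking about a pair in both orders yields converse answers, and transitivity requires that chained "before" answers imply "before" for the outer pair.

```python
# Minimal sketch of the two temporal-consistency checks, assuming a
# simplified label set {BEFORE, AFTER, OVERLAP} (an illustrative choice,
# not necessarily the label set used in the paper).

CONVERSE = {"BEFORE": "AFTER", "AFTER": "BEFORE", "OVERLAP": "OVERLAP"}

def uniqueness_consistent(ans_ab: str, ans_ba: str) -> bool:
    """A model is uniqueness-consistent on a pair (A, B) if its answer
    for (B, A) is the converse of its answer for (A, B)."""
    return CONVERSE[ans_ab] == ans_ba

def transitivity_consistent(ans_ab: str, ans_bc: str, ans_ac: str) -> bool:
    """If A is BEFORE B and B is BEFORE C, then A must be BEFORE C.
    Other label combinations are left unconstrained in this sketch."""
    if ans_ab == "BEFORE" and ans_bc == "BEFORE":
        return ans_ac == "BEFORE"
    return True

# A model answering BEFORE for both (A, B) and (B, A) violates uniqueness.
print(uniqueness_consistent("BEFORE", "AFTER"))              # True
print(uniqueness_consistent("BEFORE", "BEFORE"))             # False
print(transitivity_consistent("BEFORE", "BEFORE", "AFTER"))  # False
```

A consistency score over a set of LLM responses would then be the fraction of pairs (or triples) passing these checks.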
Exploring prompts to elicit memorization in masked language model-based named entity recognition
Xia, Yuxi, Sedova, Anastasiia, de Araujo, Pedro Henrique Luz, Kougia, Vasiliki, Nußbaumer, Lisa, Roth, Benjamin
This paper analyzes the impact of prompts on detecting memorization in six masked language model-based named entity recognition models. Specifically, we employ a diverse set of 400 automatically generated prompts and a pairwise dataset in which each pair consists of one person's name from the training set and another name outside it. A prompt completed with a person's name serves as input for obtaining the model's confidence in predicting that name. The performance of a prompt at detecting model memorization is then quantified as the percentage of name pairs for which the model has higher confidence for the name from the training set. We show that the performance of different prompts varies by as much as 16 percentage points on the same model, and that prompt engineering further widens this gap. Moreover, our experiments demonstrate that prompt performance is model-dependent but generalizes across different name sets. A comprehensive analysis indicates how prompt performance is influenced by prompt properties, the tokens they contain, and the model's self-attention weights on the prompt.
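The pairwise metric described above can be sketched as follows (an assumed interface, not the paper's code: `confidence` stands in for however the masked language model's prediction probability is obtained, and the toy names and scores are made up for illustration).

```python
# Sketch of the pairwise memorization-detection metric: a prompt's
# performance is the percentage of name pairs for which the model is
# more confident in the training-set name than in the unseen name.

def prompt_performance(confidence, prompt, name_pairs):
    """confidence(prompt, name) -> model confidence for predicting
    `name` when `prompt` is completed with it (hypothetical interface).
    `name_pairs` is a list of (training_name, unseen_name) tuples."""
    hits = sum(
        confidence(prompt, train_name) > confidence(prompt, unseen_name)
        for train_name, unseen_name in name_pairs
    )
    return 100.0 * hits / len(name_pairs)

# Toy confidence table for illustration only.
toy_conf = {("P", "Alice"): 0.9, ("P", "Bob"): 0.4,
            ("P", "Carol"): 0.3, ("P", "Dave"): 0.5}
score = prompt_performance(lambda p, n: toy_conf[(p, n)], "P",
                           [("Alice", "Bob"), ("Carol", "Dave")])
print(score)  # 50.0: the first pair is a hit, the second is not
```

A score of 50% corresponds to chance level, i.e. the prompt reveals no memorization.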
MemeGraphs: Linking Memes to Knowledge Graphs
Kougia, Vasiliki, Fetzel, Simon, Kirchmair, Thomas, Çano, Erion, Baharlou, Sina Moayed, Sharifzadeh, Sahand, Roth, Benjamin
Memes are a popular form of communicating trends and ideas in social media and on the internet in general, combining the modalities of images and text. They can express humor and sarcasm but can also have offensive content. Analyzing and classifying memes automatically is challenging since their interpretation relies on the understanding of visual elements, language, and background knowledge. Thus, it is important to meaningfully represent these sources and the interaction between them in order to classify a meme as a whole. In this work, we propose to use scene graphs, which express images in terms of objects and their visual relations, and knowledge graphs as structured representations for meme classification with a Transformer-based architecture. We compare our approach with ImgBERT, a multimodal model that uses only learned (instead of structured) representations of the meme, and observe consistent improvements. We further provide a dataset with human graph annotations, which we compare to automatically generated graphs and entity linking. Analysis shows that automatic methods link more entities than human annotators and that automatically generated graphs are better suited for hatefulness classification in memes.
A Survey on Biomedical Image Captioning
Kougia, Vasiliki, Pavlopoulos, John, Androutsopoulos, Ion
Image captioning applied to biomedical images can assist and accelerate the diagnosis process followed by clinicians. This article is the first survey of biomedical image captioning, discussing datasets, evaluation measures, and state-of-the-art methods. Additionally, we suggest two baselines, a weak and a stronger one; the latter outperforms all current state-of-the-art systems on one of the datasets.