Collaborating Authors

 Leeuwarden



Spoken in Jest, Detected in Earnest: A Systematic Review of Sarcasm Recognition -- Multimodal Fusion, Challenges, and Future Prospects

Gao, Xiyuan, Nayak, Shekhar, Coler, Matt

arXiv.org Artificial Intelligence

Sarcasm, a common feature of human communication, poses challenges in interpersonal interactions and human-machine interactions. Linguistic research has highlighted the importance of prosodic cues, such as variations in pitch, speaking rate, and intonation, in conveying sarcastic intent. Although previous work has focused on text-based sarcasm detection, the role of speech data in recognizing sarcasm has been underexplored. Recent advancements in speech technology emphasize the growing importance of leveraging speech data for automatic sarcasm recognition, which can enhance social interactions for individuals with neurodegenerative conditions and improve machine understanding of complex human language use, leading to more nuanced interactions. This systematic review is the first to focus on speech-based sarcasm recognition, charting the evolution from unimodal to multimodal approaches. It covers datasets, feature extraction, and classification methods, and aims to bridge gaps across diverse research domains. The findings include limitations in datasets for sarcasm recognition in speech, the evolution of feature extraction techniques from traditional acoustic features to deep learning-based representations, and the progression of classification methods from unimodal approaches to multimodal fusion techniques. In so doing, we identify the need for greater emphasis on cross-cultural and multilingual sarcasm recognition, as well as the importance of addressing sarcasm as a multimodal phenomenon, rather than a text-based challenge.



Inferring Adjective Hypernyms with Language Models to Increase the Connectivity of Open English Wordnet

Augello, Lorenzo, McCrae, John P.

arXiv.org Artificial Intelligence

Open English Wordnet is a key resource published in OntoLex-lemon as part of the linguistic linked open data cloud. There are, however, many links missing in the resource, and in this paper, we look at how we can establish hypernymy between adjectives. We present a theoretical discussion of the hypernymy relation and how it differs for adjectives in contrast to nouns and verbs. We develop a new resource for adjective hypernymy and fine-tune large language models to predict adjective hypernymy, showing that the methodology of TaxoLLaMa can be adapted to this task.


AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation

Gao, Xiyuan, Bansal, Shubhi, Gowda, Kushaan, Li, Zhu, Nayak, Shekhar, Kumar, Nagendra, Coler, Matt

arXiv.org Artificial Intelligence

Detecting sarcasm effectively requires a nuanced understanding of context, including vocal tones and facial expressions. The progression towards multimodal computational methods in sarcasm detection, however, faces challenges due to the scarcity of data. To address this, we present AMuSeD (Attentive deep neural network for MUltimodal Sarcasm dEtection incorporating bi-modal Data augmentation). This approach utilizes the Multimodal Sarcasm Detection Dataset (MUStARD) and introduces a two-phase bimodal data augmentation strategy. The first phase involves generating varied text samples through Back Translation from several secondary languages. The second phase involves the refinement of a FastSpeech 2-based speech synthesis system, tailored specifically for sarcasm to retain sarcastic intonations. Alongside a cloud-based Text-to-Speech (TTS) service, this Fine-tuned FastSpeech 2 system produces corresponding audio for the text augmentations. We also investigate various attention mechanisms for effectively merging text and audio data, finding self-attention to be the most efficient for bimodal integration. Our experiments reveal that this combined augmentation and attention approach achieves a significant F1-score of 81.0% in text-audio modalities, surpassing even models that use three modalities from the MUStARD dataset.


Explainable Contextual Anomaly Detection using Quantile Regression Forests

Li, Zhong, van Leeuwen, Matthijs

arXiv.org Artificial Intelligence

Chandola et al (2009) subdivided anomalies into three types: point anomalies (an object is considered anomalous when compared against the rest of the objects), contextual anomalies (an object is anomalous in a specific context), and collective anomalies (a collection of objects is anomalous with respect to the entire dataset). The analysis of anomalies has a wide range of applications, such as in network security (Ahmed et al, 2016a), bioinformatics (Spinosa and Carvalho, 2005), fraud detection (Ahmed et al, 2016b), and fault detection and isolation (Hwang et al, 2009). Anomaly analysis consists of two equally important tasks: anomaly detection and anomaly explanation. A wealth of 'shallow' machine learning based methods, i.e., methods not based on deep learning, have been proposed to detect anomalies (Chandola et al, 2009). More recently, many deep learning based anomaly detection methods have also been developed (Pang et al, 2021). However, deep learning based anomaly detection methods are notoriously hard to interpret: the model itself is generally non-transparent, and the resulting anomaly scores are challenging to interpret without the use of a post-hoc explainer.
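The difference between a point anomaly and a contextual anomaly can be illustrated with a small sketch. It is not the quantile regression forest method named in the title: it assumes a hypothetical temperature history keyed by season and uses a plain interquartile-range rule, applied once to all values (point view) and once within a single context (contextual view). All names and data values are illustrative assumptions.

```python
from statistics import quantiles

# Hypothetical temperature readings (degrees Celsius), grouped by context.
HISTORY = {
    "winter": [0, 1, 2, 3, 4, 2, 1, 3],
    "summer": [24, 25, 26, 27, 28, 26, 25, 27],
}


def iqr_bounds(values, k=1.5):
    """Return the (low, high) interval of the standard IQR outlier rule."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def is_outlier(value, reference):
    """True when value falls outside the IQR interval of the reference data."""
    low, high = iqr_bounds(reference)
    return not (low <= value <= high)


def point_anomaly(value):
    """Compare the value against all objects, ignoring context."""
    all_values = [v for vs in HISTORY.values() for v in vs]
    return is_outlier(value, all_values)


def contextual_anomaly(context, value):
    """Compare the value only against objects that share its context."""
    return is_outlier(value, HISTORY[context])


if __name__ == "__main__":
    print(point_anomaly(26))                 # 26 degrees against all readings
    print(contextual_anomaly("summer", 26))  # 26 degrees in summer
    print(contextual_anomaly("winter", 26))  # 26 degrees in winter
```

In this toy data a reading of 26 is unremarkable overall and in summer, but it is flagged once it is compared only against winter readings.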


Improving Toponym Resolution with Better Candidate Generation, Transformer-based Reranking, and Two-Stage Resolution

Zhang, Zeyu, Bethard, Steven

arXiv.org Artificial Intelligence

Geocoding is the task of converting location mentions in text into structured data that encodes the geospatial semantics. We propose a new architecture for geocoding, GeoNorm. GeoNorm first uses information retrieval techniques to generate a list of candidate entries from the geospatial ontology. Then it reranks the candidate entries using a transformer-based neural network that incorporates information from the ontology such as the entry's population. This generate-and-rerank process is applied twice: first to resolve the less ambiguous countries, states, and counties, and second to resolve the remaining location mentions, using the identified countries, states, and counties as context. Our proposed toponym resolution framework achieves state-of-the-art performance on multiple datasets. Code and models are available at https://github.com/clulab/geonorm.
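The two-stage generate-and-rerank idea can be sketched on a toy scale. This is not GeoNorm's code: the gazetteer entries, the exact-match candidate generator, and the heuristic reranker (context match first, then population) are all hypothetical stand-ins for the information retrieval step and the transformer-based reranker described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    id: str
    name: str
    kind: str        # "country", "state", "county", or "city"
    parent: str      # name of the containing region ("" for countries)
    population: int


# Hypothetical toy gazetteer; populations are rough illustrative values.
GAZETTEER = [
    Entry("c1", "Paris", "city", "France", 2_100_000),
    Entry("c2", "Paris", "city", "Texas", 25_000),
    Entry("s1", "Texas", "state", "United States", 29_000_000),
    Entry("n1", "France", "country", "", 67_000_000),
]

ADMIN_KINDS = {"country", "state", "county"}


def generate(mention, kinds):
    """Candidate generation: exact case-insensitive name match (stand-in for IR)."""
    return [e for e in GAZETTEER
            if e.name.lower() == mention.lower() and e.kind in kinds]


def rerank(candidates, context):
    """Heuristic stand-in for a neural reranker: context match, then population."""
    return sorted(candidates,
                  key=lambda e: (e.parent in context, e.population),
                  reverse=True)


def resolve(mentions):
    resolved = {}
    # Stage 1: resolve the less ambiguous administrative regions, without context.
    for m in mentions:
        candidates = rerank(generate(m, ADMIN_KINDS), set())
        if candidates:
            resolved[m] = candidates[0]
    context = {e.name for e in resolved.values()}
    # Stage 2: resolve the remaining mentions, using stage-1 results as context.
    for m in mentions:
        if m not in resolved:
            candidates = rerank(generate(m, {"city"}), context)
            if candidates:
                resolved[m] = candidates[0]
    return resolved


if __name__ == "__main__":
    print(resolve(["Paris", "Texas"])["Paris"])
    print(resolve(["Paris"])["Paris"])
```

With "Texas" resolved in the first pass, the second pass prefers the Paris entry whose parent is Texas; without that context, the more populous Paris wins.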


ADDSL: Hand Gesture Detection and Sign Language Recognition on Annotated Danish Sign Language

Jain, Sanyam

arXiv.org Artificial Intelligence

For a long time, detecting hand gestures and recognizing them as letters or numbers has been a challenging task. This creates communication barriers for individuals with disabilities. This paper introduces a new dataset, the Annotated Dataset for Danish Sign Language (ADDSL). Annotations for the dataset were made using the open-source tool LabelImg in the YOLO format. Using this dataset, a one-stage object detector model (YOLOv5) was trained with the CSP-DarkNet53 backbone and YOLOv3 head to recognize letters (A-Z) and numbers (0-9) using only seven unique images per class (without augmentation). Five models were trained for 350 epochs, resulting in an average inference time of 9.02ms per image and a best accuracy of 92% when compared to previous research. Our results show that the modified model is efficient and more accurate than existing work in the same field. The code repository for our model is available at the GitHub repository https://github.com/s4nyam/pvt-addsl.


An Evaluation on Large Language Model Outputs: Discourse and Memorization

de Wynter, Adrian, Wang, Xun, Sokolov, Alex, Gu, Qilong, Chen, Si-Qing

arXiv.org Artificial Intelligence

We present an empirical evaluation of various outputs generated by nine of the most widely-available large language models (LLMs). Our analysis is done with off-the-shelf, readily-available tools. We find a correlation between percentage of memorized text, percentage of unique text, and overall output quality, when measured with respect to output pathologies such as counterfactual and logically-flawed statements, and general failures like not staying on topic. Overall, 80.0% of the outputs evaluated contained memorized data, but outputs containing the most memorized content were also more likely to be considered of high quality. We discuss and evaluate mitigation strategies, showing that, in the models evaluated, the rate of memorized text being output is reduced. We conclude with a discussion on potential implications around what it means to learn, to memorize, and to evaluate quality text.


A pragmatic approach to estimating average treatment effects from EHR data: the effect of prone positioning on mechanically ventilated COVID-19 patients

Izdebski, Adam, Thoral, Patrick J, Lalisang, Robbert C A, McHugh, Dean M, Entjes, Robert, van der Meer, Nardo J M, Dongelmans, Dave A, Boelens, Age D, Rigter, Sander, Hendriks, Stefaan H A, de Jong, Remko, Kamps, Marlijn J A, Peters, Marco, Karakus, A, Gommers, Diederik, Ramnarain, Dharmanand, Wils, Evert-Jan, Achterberg, Sefanja, Nowitzky, Ralph, Tempel, Walter van den, de Jager, Cornelis P C, Nooteboom, Fleur G C A, Oostdijk, Evelien, Koetsier, Peter, Cornet, Alexander D, Reidinga, Auke C, de Ruijter, Wouter, Bosman, Rob J, Frenzel, Tim, Urlings-Strop, Louise C, de Jong, Paul, Smit, Ellen G M, Cremer, Olaf L, van Osch, Frits H M, Faber, Harald J, Lens, Judith, Brunnekreef, Gert B, Festen-Spanjer, Barbara, Dormans, Tom, Simons, Bram, Rijkeboer, A A, Dijkstra, Annemieke, Arbous, Sesmu, Aries, Marcel, Beukema, Menno, van Raalte, Rutger, van Tellingen, Martijn, Oever, Niels C Gritters van den, Elbers, Paul W G, Cinà, Giovanni

arXiv.org Artificial Intelligence

Despite the recent progress in the field of causal inference, to date there is no agreed-upon methodology to glean treatment effect estimation from observational data. The consequence for clinical practice is that, when lacking results from a randomized trial, medical personnel are left without guidance on what seems to be effective in a real-world scenario. This article showcases a pragmatic methodology to obtain preliminary estimation of treatment effect from observational studies. Our approach was tested on the estimation of treatment effect of the proning maneuver on oxygenation levels, on a cohort of COVID-19 Intensive Care patients. We modeled our study design on a recent RCT for proning (the PROSEVA trial). Linear regression, propensity score models such as blocking and DR-IPW, BART and two versions of Counterfactual Regression were employed to provide estimates on observational data comprising first wave COVID-19 ICU patient data from 25 Dutch hospitals. 6371 data points, from 745 mechanically ventilated patients, were included in the study. Estimates for the early effect of proning -- P/F ratio from 2 to 8 hours after proning -- ranged between 14.54 and 20.11 mm Hg depending on the model. Estimates for the late effect of proning -- oxygenation from 12 to 24 hours after proning -- ranged between 13.53 and 15.26 mm Hg. All confidence intervals were strictly above zero, indicating that the effect of proning on oxygenation for COVID-19 patients was positive and comparable in magnitude to the effect on non-COVID-19 patients. These results provide further evidence on the effectiveness of proning on the treatment of COVID-19 patients. This study, along with the accompanying open-source code, provides a blueprint for treatment effect estimation in scenarios where RCT data is lacking. Funding: SIDN fund, CovidPredict consortium, Pacmed.
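To make the propensity-score idea concrete, here is a minimal sketch of plain inverse probability weighting (IPW) on synthetic data. It is not the study's code and it is simpler than the DR-IPW estimator the abstract mentions: it assumes a single discrete confounder, estimates the propensity score as the treated fraction within each confounder level, and uses made-up outcomes generated as y = 20*x + 10*t, so the true effect is 10 by construction.

```python
from collections import defaultdict


def naive_difference(rows):
    """Difference in mean outcome between treated and control, ignoring x."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(rows):
    """IPW estimate of the average treatment effect.

    rows: list of (x, t, y) with a discrete confounder x, binary treatment t,
    and numeric outcome y. Every level of x must contain both treated and
    control units, otherwise the weights are undefined.
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, t, _ in rows:
        counts[x][0] += t
        counts[x][1] += 1

    n = len(rows)
    treated_sum = control_sum = 0.0
    for x, t, y in rows:
        e = counts[x][0] / counts[x][1]  # estimated propensity P(t=1 | x)
        if t == 1:
            treated_sum += y / e
        else:
            control_sum += y / (1 - e)
    return treated_sum / n - control_sum / n


# Synthetic data: units with x = 1 are treated more often and have higher
# baseline outcomes, which biases the naive comparison.
ROWS = (
    [(0, 1, 10)] + [(0, 0, 0)] * 3 +
    [(1, 1, 30)] * 3 + [(1, 0, 20)]
)

if __name__ == "__main__":
    print(naive_difference(ROWS))
    print(ipw_ate(ROWS))
```

On this toy data the naive comparison overstates the effect because treatment is more common where baseline outcomes are higher, while the weighted estimate recovers the effect built into the data.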