Overview
A Survey of Multimodal Sarcasm Detection
Farabi, Shafkat, Ranasinghe, Tharindu, Kanojia, Diptesh, Kong, Yu, Zampieri, Marcos
Sarcasm is a rhetorical device that is used to convey the opposite of the literal meaning of an utterance. Sarcasm is widely used on social media and other forms of computer-mediated communication motivating the use of computational models to identify it automatically. While the clear majority of approaches to sarcasm detection have been carried out on text only, sarcasm detection often requires additional information present in tonality, facial expression, and contextual images. This has led to the introduction of multimodal models, opening the possibility to detect sarcasm in multiple modalities such as audio, images, text, and video. In this paper, we present the first comprehensive survey on multimodal sarcasm detection - henceforth MSD - to date. We survey papers published between 2018 and 2023 on the topic, and discuss the models and datasets used for this task. We also present future research directions in MSD.
Can Stories Help LLMs Reason? Curating Information Space Through Narrative
Javadi, Vahid Sadiri, Trippas, Johanne R., Lal, Yash Kumar, Flek, Lucie
Narratives are widely recognized as a powerful tool for structuring information and facilitating comprehension of complex ideas in various domains such as science communication. This paper investigates whether incorporating narrative elements can assist Large Language Models (LLMs) in solving complex problems more effectively. We propose a novel approach, Story of Thought (SoT), integrating narrative structures into prompting techniques for problem-solving. This approach involves constructing narratives around problem statements and creating a framework to identify and organize relevant information. Our experiments show that using various LLMs with SoT consistently surpasses using them with other techniques on physics, chemistry, math, and biology questions in both the GPQA and JEEBench datasets. The narrative-based information curation process in SoT enhances problem comprehension by contextualizing critical in-domain information and highlighting causal relationships within the problem space.
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Ortiz-Perez, David, Benavent-Lledo, Manuel, Garcia-Rodriguez, Jose, Tomás, David, Vizcaya-Moreno, M. Flores
Cognitive decline is a natural part of aging, often resulting in reduced cognitive abilities. In some cases, however, this decline is more pronounced, typically due to disorders such as Alzheimer's disease. Early detection of anomalous cognitive decline is crucial, as it can facilitate timely professional intervention. While medical data can help in this detection, it often involves invasive procedures. An alternative approach is to employ non-intrusive techniques such as speech or handwriting analysis, which do not necessarily affect daily activities. This survey reviews the most relevant methodologies that use deep learning techniques to automate the cognitive decline estimation task, including audio, text, and visual processing. We discuss the key features and advantages of each modality and methodology, including state-of-the-art approaches like Transformer architecture and foundation models. In addition, we present works that integrate different modalities to develop multimodal models. We also highlight the most significant datasets and the quantitative results from studies using these resources. From this review, several conclusions emerge. In most cases, the textual modality achieves the best results and is the most relevant for detecting cognitive decline. Moreover, combining various approaches from individual modalities into a multimodal model consistently enhances performance across nearly all scenarios.
Ali-AUG: Innovative Approaches to Labeled Data Augmentation using One-Step Diffusion Model
Hamza, Ali, Lojo, Aizea, Núñez-Marcos, Adrian, Atutxa, Aitziber
This paper introduces Ali-AUG, a novel single-step diffusion model for efficient labeled data augmentation in industrial applications. Our method addresses the challenge of limited labeled data by generating synthetic, labeled images with precise feature insertion. Ali-AUG utilizes a stable diffusion architecture enhanced with skip connections and LoRA modules to efficiently integrate masks and images, ensuring accurate feature placement without affecting unrelated image content. Experimental validation across various industrial datasets demonstrates Ali-AUG's superiority in generating high-quality, defect-enhanced images while maintaining rapid single-step inference. By offering precise control over feature insertion and minimizing required training steps, our technique significantly enhances data augmentation capabilities, providing a powerful tool for improving the performance of deep learning models in scenarios with limited labeled data. Ali-AUG is especially useful for use cases like defective product image generation to train AI-based models to improve their ability to detect defects in manufacturing processes. Using different data preparation strategies, including Classification Accuracy Score (CAS) and Naive Augmentation Score (NAS), we show that Ali-AUG improves model performance by 31% compared to other augmentation methods and by 45% compared to models without data augmentation. Notably, Ali-AUG reduces training time by 32% and supports both paired and unpaired datasets, enhancing flexibility in data preparation.
Perturbation-based Graph Active Learning for Weakly-Supervised Belief Representation Learning
Sun, Dachun, Wang, Ruijie, Li, Jinning, Han, Ruipeng, Liu, Xinyi, Lyu, You, Abdelzaher, Tarek
This paper addresses the problem of optimizing the allocation of labeling resources for semi-supervised belief representation learning in social networks. The objective is to strategically identify valuable messages on social media graphs that are worth labeling within a constrained budget, ultimately maximizing the task's performance. Despite the progress in unsupervised or semi-supervised methods in advancing belief and ideology representation learning on social networks and the remarkable efficacy of graph learning techniques, the availability of high-quality curated labeled social data can greatly benefit and further improve performances. Consequently, allocating labeling efforts is a critical research problem in scenarios where labeling resources are limited. This paper proposes a graph data augmentation-inspired perturbation-based active learning strategy (PerbALGraph) that progressively selects messages for labeling according to an automatic estimator, obviating human guidance. This estimator is based on the principle that messages in the network that exhibit heightened sensitivity to structural features of the observational data indicate landmark quality that significantly influences semi-supervision processes. We design the estimator to be the prediction variance under a set of designed graph perturbations, which is model-agnostic and application-independent. Extensive experiment results demonstrate the effectiveness of the proposed strategy for belief representation learning tasks.
A Survey of Deep Graph Learning under Distribution Shifts: from Graph Out-of-Distribution Generalization to Adaptation
Zhang, Kexin, Liu, Shuhan, Wang, Song, Shi, Weili, Chen, Chen, Li, Pan, Li, Sheng, Li, Jundong, Ding, Kaize
Distribution shifts on graphs -- the discrepancies in data distribution between training and employing a graph machine learning model -- are ubiquitous and often unavoidable in real-world scenarios. These shifts may severely deteriorate model performance, posing significant challenges for reliable graph machine learning. Consequently, there has been a surge in research on graph machine learning under distribution shifts, aiming to train models to achieve satisfactory performance on out-of-distribution (OOD) test data. In our survey, we provide an up-to-date and forward-looking review of deep graph learning under distribution shifts. Specifically, we cover three primary scenarios: graph OOD generalization, training-time graph OOD adaptation, and test-time graph OOD adaptation. We begin by formally formulating the problems and discussing various types of distribution shifts that can affect graph learning, such as covariate shifts and concept shifts. To provide a better understanding of the literature, we systematically categorize the existing models based on our proposed taxonomy and investigate the adopted techniques behind. We also summarize commonly used datasets in this research area to facilitate further investigation. Finally, we point out promising research directions and the corresponding challenges to encourage further study in this vital domain. Additionally, we provide a continuously updated reading list at https://github.com/kaize0409/Awesome-Graph-OOD.
Health Misinformation in Social Networks: A Survey of IT Approaches
Papanikou, Vasiliki, Papadakos, Panagiotis, Karamanidou, Theodora, Stavropoulos, Thanos G., Pitoura, Evaggelia, Tsaparas, Panayiotis
The spread of misinformation online, most commonly known as fake news, is an important issue that has become more pronounced in the last two decades due to the prevalence of social media. Platforms like Twitter, Reddit, and Facebook, have been commonly identified as the main channels for propagating misinformation and have been criticized for not acting on addressing the conditions that permit the circulation and amplification of false information [32]. Such misinformation includes false claims and non fact-checked news items, that originate from sources of questionable credibility [113]. The problem of misinformation becomes critical when it pertains to healthcare and health issues, since it puts lives and the public health at risk. One of the first cases of widely spread misinformation in the medical domain is the falsehood that the MMR vaccine (Measles, Mumps, Rubella) causes autism [109]. The falsehood originated from a fraudulent article titled "Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children" published in the prestigious Lancet journal in 1998 [171, 197]. This study turned tens of thousands of parents against the vaccine, and as a result, in 2020, many countries, including the United Kingdom, Greece, Venezuela, and Brazil, lost their measles elimination status. In 2020, twenty-two years after publishing this study Lancet retracted the paper [203].
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Fu, Yujuan, Uzuner, Ozlem, Yetisgen, Meliha, Xia, Fei
Large language models (LLMs) have demonstrated great performance across various benchmarks, showing potential as general-purpose task solvers. However, as LLMs are typically trained on vast amounts of data, a significant concern in their evaluation is data contamination, where overlap between training data and evaluation datasets inflates performance assessments. While multiple approaches have been developed to identify data contamination, these approaches rely on specific assumptions that may not hold universally across different settings. To bridge this gap, we systematically review 47 papers on data contamination detection, categorize the underlying assumptions, and assess whether they have been rigorously validated. We identify and analyze eight categories of assumptions and test three of them as case studies. Our analysis reveals that when classifying instances used for pretraining LLMs, detection approaches based on these three assumptions perform close to random guessing, suggesting that current LLMs learn data distributions rather than memorizing individual instances. Overall, this work underscores the importance of approaches clearly stating their underlying assumptions and testing their validity across various scenarios.
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
Jia, Chengyou, Luo, Minnan, Dang, Zhuohang, Sun, Qiushi, Xu, Fangzhi, Hu, Junlin, Xie, Tianbao, Wu, Zhiyong
Digital agents capable of automating complex computer tasks have attracted considerable attention due to their immense potential to enhance human-computer interaction. However, existing agent methods exhibit deficiencies in their generalization and specialization capabilities, especially in handling open-ended computer tasks in real-world environments. Inspired by the rich functionality of the App store, we present AgentStore, a scalable platform designed to dynamically integrate heterogeneous agents for automating computer tasks. AgentStore empowers users to integrate third-party agents, allowing the system to continuously enrich its capabilities and adapt to rapidly evolving operating systems. Additionally, we propose a novel core \textbf{MetaAgent} with the \textbf{AgentToken} strategy to efficiently manage diverse agents and utilize their specialized and generalist abilities for both domain-specific and system-wide tasks. Extensive experiments on three challenging benchmarks demonstrate that AgentStore surpasses the limitations of previous systems with narrow capabilities, particularly achieving a significant improvement from 11.21\% to 23.85\% on the OSWorld benchmark, more than doubling the previous results. Comprehensive quantitative and qualitative results further demonstrate AgentStore's ability to enhance agent systems in both generalization and specialization, underscoring its potential for developing the specialized generalist computer assistant. All our codes will be made publicly available in https://chengyou-jia.github.io/AgentStore-Home.
Learning Collusion in Episodic, Inventory-Constrained Markets
Friedrich, Paul, Pásztor, Barna, Ramponi, Giorgia
Pricing algorithms have demonstrated the capability to learn tacit collusion that is largely unaddressed by current regulations. Their increasing use in markets, including oligopolistic industries with a history of collusion, calls for closer examination by competition authorities. In this paper, we extend the study of tacit collusion in learning algorithms from basic pricing games to more complex markets characterized by perishable goods with fixed supply and sell-by dates, such as airline tickets, perishables, and hotel rooms. We formalize collusion within this framework and introduce a metric based on price levels under both the competitive (Nash) equilibrium and collusive (monopolistic) optimum. Since no analytical expressions for these price levels exist, we propose an efficient computational approach to derive them. Through experiments, we demonstrate that deep reinforcement learning agents can learn to collude in this more complex domain. Additionally, we analyze the underlying mechanisms and structures of the collusive strategies these agents adopt.