emoticon


EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations

Zhou, Xinyun, Li, Xinfeng, Peng, Yinan, Xu, Ming, Zhang, Xuanwang, Yu, Miao, Wang, Yidong, Jia, Xiaojun, Wang, Kun, Wen, Qingsong, Wang, XiaoFeng, Dong, Wei

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI, enhancing large language model (LLM) faithfulness by incorporating external knowledge. However, our study unveils a critical, overlooked vulnerability: their profound susceptibility to subtle symbolic perturbations, particularly through near-imperceptible emoticon tokens such as "(@_@)" that can catastrophically mislead retrieval, termed EmoRAG. We demonstrate that injecting a single emoticon into a query makes it nearly 100% likely to retrieve semantically unrelated texts that contain a matching emoticon. Our extensive experiments across general question-answering and code domains, using a range of state-of-the-art retrievers and generators, reveal three key findings: (I) Single-Emoticon Disaster: Minimal emoticon injections cause maximal disruptions, with a single emoticon dominating RAG output almost 100% of the time. (II) Positional Sensitivity: Placing an emoticon at the beginning of a query can cause severe perturbation, with F1-Scores exceeding 0.92 across all datasets. (III) Parameter-Scale Vulnerability: Counterintuitively, models with larger parameters exhibit greater vulnerability to the interference. We provide an in-depth analysis to uncover the underlying mechanisms of these phenomena. Furthermore, we raise a critical concern regarding the robustness assumption of current RAG systems, envisioning a threat scenario where an adversary exploits this vulnerability to manipulate the RAG system. We evaluate standard defenses and find them insufficient against EmoRAG. To address this, we propose targeted defenses, analyzing their strengths and limitations in mitigating emoticon-based perturbations. Finally, we outline future directions for building robust RAG systems.
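The query-side perturbation described above can be sketched as a small helper that injects a single emoticon token at a chosen position. This is a minimal illustration; the emoticon string and position names are our own choices, not the paper's protocol:

```python
def inject_emoticon(query, emoticon="(@_@)", position="start"):
    """Inject one emoticon token into a query at a chosen position.

    Mirrors the perturbation studied above: a single near-imperceptible
    token added at the start, middle, or end of an otherwise benign query.
    """
    words = query.split()
    if position == "start":
        words.insert(0, emoticon)
    elif position == "end":
        words.append(emoticon)
    elif position == "middle":
        words.insert(len(words) // 2, emoticon)
    else:
        raise ValueError(f"unknown position: {position}")
    return " ".join(words)
```

An attacker would pair such queries with corpus documents carrying the same emoticon; the paper's finding is that retrieval then locks onto the emoticon match regardless of semantics, with start-of-query placement the most disruptive.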


A Computer Science Professor Invented the Emoticon After a Joke Went Wrong

WIRED

In 1982, Carnegie Mellon University professor Scott Fahlman suggested using :-) for humorous comments after his colleagues took a joke about mercury seriously. On September 19, 1982, Carnegie Mellon University computer science research assistant professor Scott Fahlman posted a message to the university's bulletin board software that would later come to shape how people communicate online. His proposal: use :-) and :-( as markers to distinguish jokes from serious comments. While Fahlman describes himself as "the inventor or at least one of the inventors" of what would later be called the smiley face emoticon, the full story reveals something more interesting than a lone genius moment. The whole episode started three days earlier when computer scientist Neil Swartz posed a physics problem to colleagues on Carnegie Mellon's "bboard," which was an early online message board.


Emoji Development: Face With Tears of Joy, a Book by Keith Houston

Slate

A couple of years ago, I frequently found myself driving past a roadside ice cream stand under construction. For weeks, the roof of this stand, a gigantic white swirl of fiberglass soft serve, sat on the ground next to the structure, waiting to be lowered onto the finished, cone-shaped building with a crane. I know what it was supposed to represent, but every time I glimpsed it, my instinctive first thought was There's a giant poop emoji. Keith Houston's history of emoji, Face With Tears of Joy, argues that emoji have "become so ubiquitous in our writing, so quotidian, that we should be talking about them in the same breath as grammar or punctuation." I don't know about grammar, which seems as fundamental to language, spoken and written, as words themselves.


Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs

Pandey, Ananya, Vishwakarma, Dinesh Kumar

arXiv.org Artificial Intelligence

Emoticons are symbolic representations that generally accompany textual content to visually enhance or summarize the true intention of a written message. Although widely utilized in the realm of social media, the core semantics of these emoticons have not been extensively explored based on multiple modalities. Incorporating textual and visual information within a single message develops an advanced way of conveying information. Hence, this research aims to analyze the relationship among sentences, visuals, and emoticons. For an orderly exposition, this paper initially provides a detailed examination of the various techniques for extracting multimodal features, emphasizing the pros and cons of each method. Through conducting a comprehensive examination of several multimodal algorithms, with specific emphasis on the fusion approaches, we have proposed a novel contrastive learning based multimodal architecture. The proposed model employs the joint training of a dual-branch encoder along with contrastive learning to accurately map text and images into a common latent space. Our key finding is that integrating the principle of contrastive learning with the other two branches yields superior results. The experimental results demonstrate that our suggested methodology surpasses existing multimodal approaches in terms of accuracy and robustness. The proposed model attained an accuracy of 91% and an MCC-score of 90% while assessing emoticons using the Multimodal-Twitter Emoticon dataset acquired from Twitter. We provide evidence that deep features acquired by contrastive learning are more efficient, suggesting that the proposed fusion technique also possesses strong generalisation capabilities for recognising emoticons across several modes.
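The dual-branch contrastive objective described above can be illustrated with a symmetric InfoNCE-style loss in plain Python. This is a minimal sketch; the temperature value and the tiny two-dimensional embeddings are illustrative, not the paper's configuration:

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(text_embs, image_embs, temperature=0.07):
    """Contrastive loss over a batch of matched (text, image) pairs.

    Each text embedding should score highest against its own image
    embedding; all other images in the batch serve as negatives.
    """
    n = len(text_embs)
    loss = 0.0
    for i in range(n):
        logits = [cosine(text_embs[i], image_embs[j]) / temperature
                  for j in range(n)]
        m = max(logits)  # subtract max for numerical stability
        log_softmax_i = logits[i] - m - math.log(
            sum(math.exp(l - m) for l in logits))
        loss += -log_softmax_i
    return loss / n
```

Minimizing this loss pulls matched text–image pairs together in the shared latent space while pushing mismatched pairs apart, which is the mechanism the architecture above relies on.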


Analyzing Gender Polarity in Short Social Media Texts with BERT: The Role of Emojis and Emoticons

Jazi, Saba Yousefian, Mirzaeinia, Amir, Jazi, Sina Yousefian

arXiv.org Artificial Intelligence

In this effort we fine-tuned different BERT-based models to detect the gender polarity of Twitter accounts. We especially focused on analyzing the effect of using emojis and emoticons on the performance of our model in the classification task. We were able to demonstrate that the use of these non-word inputs, alongside mentions of other accounts in a short text format like a tweet, has an impact on detecting the account holder's gender.


Americans can finally understand British humour! Scientists develop a device that can detect when someone is being sarcastic

Daily Mail - Science & tech

Our friends from across the pond have been known to struggle with British sarcasm on occasion. But improved Anglo-American relations may be on the horizon, as experts have developed a device that can detect when someone is being sarcastic. A team from the University of Groningen have created an algorithm that analyses someone's speech to work out if they are using irony. It works by examining the pitch, talking rate and energy in speech, and then transcribing the speech into text for it to be analysed further for language cues. 'We extracted acoustic parameters such as pitch, speaking rate, and energy from speech, then used Automatic Speech Recognition to transcribe the speech into text for sentiment analysis,' author Xiyuan Gao said.


PMG : Personalized Multimodal Generation with Large Language Models

Shen, Xiaoteng, Zhang, Rui, Zhao, Xiaoyan, Zhu, Jieming, Xiao, Xi

arXiv.org Artificial Intelligence

The emergence of large language models (LLMs) has revolutionized the capabilities of text comprehension and generation. Multi-modal generation attracts great attention from both the industry and academia, but there is little work on personalized generation, which has important applications such as recommender systems. This paper proposes the first method for personalized multimodal generation using LLMs, showcases its applications and validates its performance via an extensive experimental study on two datasets. The proposed method, Personalized Multimodal Generation (PMG for short), first converts user behaviors (e.g., clicks in recommender systems or conversations with a virtual assistant) into natural language to facilitate LLM understanding and extract user preference descriptions. Such user preferences are then fed into a generator, such as a multimodal LLM or diffusion model, to produce personalized content. To capture user preferences comprehensively and accurately, we propose to let the LLM output a combination of explicit keywords and implicit embeddings to represent user preferences. The combination of keywords and embeddings is then used as a prompt to condition the generator. We optimize a weighted sum of the accuracy and preference scores so that the generated content has a good balance between them. Compared to a baseline method without personalization, PMG improves personalization by up to 8% in terms of LPIPS while retaining the accuracy of generation.
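The weighted sum of accuracy and preference scores described above can be sketched as follows. The function names, weight value, and candidate-selection step are our own illustrative assumptions, not the paper's implementation:

```python
def pmg_score(accuracy, preference, weight=0.5):
    # weighted combination balancing generation fidelity against
    # personalization; the weight value here is illustrative
    return weight * accuracy + (1 - weight) * preference

def select_candidate(candidates, weight=0.5):
    # candidates: list of (accuracy, preference) score pairs for
    # generated outputs; return the index of the best-scoring one
    return max(range(len(candidates)),
               key=lambda i: pmg_score(*candidates[i], weight))
```

Sweeping the weight trades off fidelity against personalization: at weight 1.0 the selection ignores user preference entirely, while lower weights favor more personalized candidates.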


The emojification of sentiment on social media: Collection and analysis of a longitudinal Twitter sentiment dataset

Yin, Wenjie, Alkhalifa, Rabab, Zubiaga, Arkaitz

arXiv.org Artificial Intelligence

Social media, as a means for computer-mediated communication, has been extensively used to study the sentiment expressed by users around events or topics. There is however a gap in the longitudinal study of how sentiment evolved in social media over the years. To fill this gap, we develop TM-Senti, a new large-scale, distantly supervised Twitter sentiment dataset with over 184 million tweets and covering a time period of over seven years. We describe and assess our methodology to put together a large-scale, emoticon- and emoji-based labelled sentiment analysis dataset, along with an analysis of the resulting dataset. Our analysis highlights interesting temporal changes, among others in the increasing use of emojis over emoticons. We publicly release the dataset for further research in tasks including sentiment analysis and text classification of tweets. The dataset can be fully rehydrated including tweet metadata and without missing tweets thanks to the archive of tweets publicly available on the Internet Archive, which the dataset is based on.
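The emoticon- and emoji-based distant supervision described above can be sketched as follows. The marker sets and the decision to discard tweets with conflicting markers are illustrative assumptions, not TM-Senti's actual lexicon or rules:

```python
POSITIVE = {":)", ":-)", ":D", "😂", "😍"}
NEGATIVE = {":(", ":-(", "😢", "😠"}

def distant_label(tweet):
    """Assign a sentiment label from the emoticons/emoji a tweet contains.

    The markers are then stripped so a downstream classifier cannot
    trivially memorize them. Tweets with no marker, or with conflicting
    markers, are discarded (label None).
    """
    tokens = tweet.split()
    pos = any(t in POSITIVE for t in tokens)
    neg = any(t in NEGATIVE for t in tokens)
    if pos == neg:  # neither marker, or both: unusable for training
        return None, tweet
    label = "positive" if pos else "negative"
    cleaned = " ".join(t for t in tokens if t not in POSITIVE | NEGATIVE)
    return label, cleaned
```

Applied at scale over an archive of tweets, this kind of rule produces the distantly supervised labels without any manual annotation, which is what makes a 184-million-tweet dataset feasible.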


Emotion Recognition from Microblog Managing Emoticon with Text and Classifying using 1D CNN

Habib, Md. Ahsan, Akhand, M. A. H., Kamal, Md. Abdus Samad

arXiv.org Artificial Intelligence

Microblog, an online-based broadcast medium, is a widely used forum for people to share their thoughts and opinions. Recently, Emotion Recognition (ER) from microblogs has become an inspiring research topic in diverse areas. In the machine learning domain, automatic emotion recognition from microblogs is a challenging task, especially for better outcomes considering diverse content. Emoticons have become very common in microblog text as they reinforce the meaning of the content. This study proposes an emotion recognition scheme considering both the texts and emoticons from microblog data. Emoticons are considered unique expressions of the users' emotions and can be replaced by appropriate emotional words. The succession of emoticons appearing in the microblog data is preserved, and a 1D Convolutional Neural Network (CNN) is employed for emotion classification. The experimental results show that the proposed emotion recognition scheme outperforms other existing methods when tested on Twitter data.
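The substitution of emoticons with emotional words, while preserving their position in the token sequence, can be sketched as below. The mapping table is illustrative; the paper does not publish this exact lexicon:

```python
# hypothetical emoticon-to-emotion-word mapping for preprocessing
EMOTICON_WORDS = {
    ":)": "happy",
    ":(": "sad",
    ":D": "joyful",
    ">:(": "angry",
}

def replace_emoticons(text):
    # swap each emoticon for an emotional word in place, so the token
    # order (and hence the succession of emoticons) is preserved for
    # the downstream 1D CNN
    return " ".join(EMOTICON_WORDS.get(tok, tok) for tok in text.split())
```

After this step the sequence contains only ordinary words, so a standard embedding layer followed by a 1D CNN can consume it without special emoticon handling.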


An LSTM model for Twitter Sentiment Analysis

Mollah, Md Parvez

arXiv.org Artificial Intelligence

Sentiment analysis on social media such as Twitter provides organizations and individuals an effective way to monitor public emotions towards them and their competitors. As a result, sentiment analysis has become an important and challenging task. In this work, we have collected seven publicly available and manually annotated Twitter sentiment datasets. We create a new training and testing dataset from the collected datasets. We develop an LSTM model to classify the sentiment of a tweet and evaluate the model on the new dataset.
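Pooling several labelled datasets into a single train/test split, as described above, might look like the sketch below. The split fraction and seed are illustrative choices, not the paper's settings:

```python
import random

def merge_and_split(datasets, test_fraction=0.2, seed=13):
    # pool several labelled tweet datasets, shuffle reproducibly, and
    # carve out a held-out test set; fraction and seed are illustrative
    pooled = [example for ds in datasets for example in ds]
    rng = random.Random(seed)
    rng.shuffle(pooled)
    cut = int(len(pooled) * (1 - test_fraction))
    return pooled[:cut], pooled[cut:]
```

Shuffling before the split matters here because the seven source datasets were collected and annotated separately; without it, the test set would be dominated by whichever dataset happened to come last.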