Biradar, Shankar
Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content
Kasu, Sai Kartheek Reddy, Biradar, Shankar, Saumya, Sunil
This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated from false narratives, incorporating fabricated claims and manipulated information, using the ChatGPT-4o model. Each instance is labeled with a Satire Level, ranging from 1 (subtle satire) to 3 (high-level satire), and classified into five distinct Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages, including English, Telugu, Hindi, Kannada, and Tamil, as well as their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We establish strong baselines on the proposed dataset, providing a foundation for future research to benchmark and advance deceptive humor detection models.
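To make the label structure concrete, the following is a minimal sketch (in Python) of how a single DHD instance could be represented; the field names and the description of satire level 2 are illustrative assumptions, not the dataset's actual schema.

```python
# Illustrative sketch of a DHD-style instance; field names are assumptions,
# not the released dataset's schema.
from dataclasses import dataclass

SATIRE_LEVELS = {1: "subtle", 2: "moderate", 3: "high-level"}  # level 2 label is assumed
HUMOR_CATEGORIES = ["Dark Humor", "Irony", "Social Commentary", "Wordplay", "Absurdity"]
LANGUAGES = ["en", "te", "hi", "kn", "ta", "te-en", "hi-en", "ka-en", "ta-en"]

@dataclass
class DHDInstance:
    comment: str          # humor-infused comment generated from a fabricated claim
    language: str         # one of LANGUAGES (full language or code-mixed variant)
    satire_level: int     # 1 (subtle satire) to 3 (high-level satire)
    humor_category: str   # one of the five Humor Categories

example = DHDInstance(
    comment="<generated humorous comment>",
    language="hi-en",
    satire_level=2,
    humor_category="Irony",
)
assert example.satire_level in SATIRE_LEVELS
assert example.humor_category in HUMOR_CATEGORIES
```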
Hope Speech Detection on Social Media Platforms
Aggarwal, Pranjal, Chandana, Pasupuleti, Nemade, Jagrut, Sharma, Shubham, Saumya, Sunil, Biradar, Shankar
Since personal computers became widely available in the consumer market, the amount of harmful content on the internet has expanded significantly. In simple terms, harmful content is anything online that causes a person distress or harm; it may include hate speech, violent content, threats, non-hope speech, and so on. Ideally, online content should be positive, uplifting, and supportive. Over the past few years, many studies have addressed this problem through hate speech detection, but very few have focused on identifying hope speech. This paper discusses various machine learning approaches to classifying a sentence as Hope Speech, Non-Hope Speech, or Neutral. The dataset used in the study contains English YouTube comments and was released as part of the shared task "EACL-2021: Hope Speech Detection for Equality, Diversity, and Inclusion". Initially, the dataset obtained from the shared task had three classes: Hope Speech, Non-Hope Speech, and Not in English; however, upon deeper inspection, we discovered that the dataset needed to be relabeled. A group of undergraduate students was hired to relabel the entire dataset. We experimented with conventional machine learning models (such as Naïve Bayes, logistic regression, and support vector machines) and pre-trained models (such as BERT) on the relabeled data. According to the experimental results, the relabeled data achieves better accuracy for hope speech identification than the original dataset.
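For context, below is a minimal sketch of one such conventional baseline, a TF-IDF plus logistic regression pipeline built with the standard scikit-learn API; the example comments and labels are placeholders, not data from the shared task, and the feature settings are illustrative choices rather than the paper's configuration.

```python
# Hedged sketch of a conventional three-class baseline (TF-IDF + logistic regression).
# The comments and labels below are toy placeholders, not the shared-task data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

comments = [
    "You are not alone, we stand with you",   # placeholder Hope Speech
    "Nobody cares about this issue",          # placeholder Non-Hope Speech
    "The video was uploaded yesterday",       # placeholder Neutral
]
labels = ["Hope Speech", "Non-Hope Speech", "Neutral"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(comments, labels)
print(clf.predict(["We believe things will get better"]))
```

A Naïve Bayes or SVM baseline would follow the same pipeline structure with the final estimator swapped out, while the BERT baseline instead fine-tunes a pre-trained transformer on the relabeled comments.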