Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models

Dogra, Atharvan, Ghosal, Soumya Suvra, Deshpande, Ameet, Kalyan, Ashwin, Manocha, Dinesh

arXiv.org Artificial Intelligence

Large language models are increasingly used for creative writing and engagement content, raising safety concerns about their outputs. Casting humor generation as a testbed, this work therefore evaluates how funniness optimization in modern LLM pipelines couples with harmful content by jointly measuring humor, stereotypicality, and toxicity, supplemented by an analysis of incongruity signals through information-theoretic metrics. Across six models, we observe that harmful outputs receive higher humor scores, which increase further under role-based prompting, indicating a bias amplification loop between generators and evaluators. Information-theoretic analyses show that harmful cues widen predictive uncertainty and, surprisingly, can even make harmful punchlines more expected for some models, suggesting structural embedding in learned humor distributions. External validation on an additional satire-generation task with human-perceived funniness judgments shows that LLM satire increases stereotypicality and, typically, toxicity, including for closed models. Quantitatively, stereotypical/toxic jokes gain 10-21% in mean humor score, and stereotypical jokes appear 11% to 28% more often among jokes marked funny by an LLM-based metric and up to 10% more often in generations perceived as funny by humans.
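The information-theoretic quantities the abstract refers to (predictive uncertainty before a punchline, and how unexpected the actual punchline token is) can be illustrated with a toy sketch. This is not the paper's implementation; the next-token distribution below is a made-up stand-in for real model logits.

```python
import math

def surprisal(p):
    """Surprisal in bits of an event with probability p."""
    return -math.log2(p)

def entropy(dist):
    """Shannon entropy in bits of a next-token distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical next-token distribution at the punchline position.
next_token = {"doctor": 0.60, "teacher": 0.34, "lawyer": 0.05, "banana": 0.01}

h = entropy(next_token)              # predictive uncertainty before the punchline
s = surprisal(next_token["banana"])  # how unexpected the delivered punchline is
print(round(h, 3), round(s, 3))      # prints: 1.254 6.644
```

A "more expected" harmful punchline, in these terms, is simply one whose surprisal under the model is lower than that of a benign alternative.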


Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics

Ryan, Yuriel, Tan, Rui Yang, Choo, Kenny Tsu Wei, Lee, Roy Ka-Wei

arXiv.org Artificial Intelligence

Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs' ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top models achieve only 61% accuracy in panel sequencing, far below human performance. This underscores critical limitations in current models' integration of visual and textual cues for coherent narrative and humor understanding. By providing a rigorous framework for evaluating multimodal contextual and narrative reasoning, PixelHumor aims to drive the development of LMMs that better engage in natural, socially aware interactions.


SNL legend explains how short attention spans are having a direct impact on comedy

FOX News

Legendary comedian and actor Kevin Nealon performed on "Saturday Night Live" for almost a decade, acting in some of the series' most iconic sketches. After 40 years in the business, he recently spoke with Fox News Digital about the current state of stand-up comedy and where he feels the industry is headed. Though the medium has evolved into something bigger than ever before, Nealon described the attention spans of modern comedy audiences as much shorter -- something that those involved in the business of humor have had to cater to. "When I started comedy, it was totally different. And it was a totally different time and generation. And it was not as much short attention span. Like, I look back at some of the sketches on 'SNL,' and they're a lot longer than they are now because of the short attention span, and a lot of people don't watch 'SNL' at that time. They watch it on YouTube, snippets of it," said the comedian, pointing to social media as something that's gotten hundreds of millions of people accustomed to consuming content in short clips and blurbs.


PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

Ouyang, Kun, Liu, Yuanxin, Li, Shicheng, Liu, Yi, Zhou, Hao, Meng, Fandong, Zhou, Jie, Sun, Xu

arXiv.org Artificial Intelligence

Multimodal punchlines, which involve humor or sarcasm conveyed in image-caption pairs, are a popular way of communicating on online multimedia platforms. With the rapid development of multimodal large language models (MLLMs), it is essential to assess their ability to comprehend these punchlines effectively. However, existing benchmarks on punchline comprehension suffer from three major limitations: 1) language shortcuts that allow models to rely solely on text, 2) lack of question diversity, and 3) a narrow focus on a specific domain of multimodal content (e.g., cartoons). To address these limitations, we introduce a multimodal Punchline comprehension Benchmark, named PunchBench, tailored for accurate and comprehensive evaluation of punchline comprehension. To enhance evaluation accuracy, we generate synonymous and antonymous captions by modifying the original captions, which mitigates the impact of shortcuts in the captions. To provide a comprehensive evaluation, PunchBench incorporates diverse question formats and image-caption pairs from various domains. On this basis, we conduct extensive evaluations and reveal a significant gap between state-of-the-art MLLMs and humans in punchline comprehension. To improve punchline comprehension, we propose the Simple-to-Complex Chain-of-Question (SC-CoQ) strategy, which enables models to incrementally address complicated questions by first mastering simple ones. SC-CoQ effectively enhances the performance of various MLLMs on PunchBench, surpassing in-context learning and chain-of-thought prompting.
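The simple-to-complex idea behind SC-CoQ can be sketched in a few lines: answer easy sub-questions first and thread each answer into the context for the next, harder question. This is a hedged illustration, not the benchmark's code; `ask_model` is a hypothetical stand-in for a real MLLM call.

```python
def ask_model(context, question):
    # Placeholder: a real implementation would query an MLLM with the
    # image, the accumulated context, and the question.
    return f"answer({question})"

def sc_coq(image_caption, questions_by_difficulty):
    """Ask questions in increasing difficulty, threading answers forward."""
    context = image_caption
    answers = []
    for question in sorted(questions_by_difficulty, key=questions_by_difficulty.get):
        answer = ask_model(context, question)
        answers.append(answer)
        # Accumulate the Q/A pair so later, harder questions can build on it.
        context += f"\nQ: {question}\nA: {answer}"
    return answers

qs = {
    "What does the image depict?": 0,          # simple perception
    "Is the caption literal or ironic?": 1,    # intermediate reasoning
    "Why is the image-caption pair funny?": 2, # complex punchline question
}
print(sc_coq("A cat wearing a tie; caption: 'Ready for Monday.'", qs))
```

The ordering matters: the final "why is it funny" question is only asked once the model has already committed to answers about content and tone.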


DuanzAI: Slang-Enhanced LLM with Prompt for Humor Understanding

Rohn, Yesian

arXiv.org Artificial Intelligence

Language's complexity is evident in the rich tapestry of slang expressions, often laden with humor and cultural nuance. This linguistic phenomenon has become increasingly prevalent, especially in digital communication. However, existing AI models, including ChatGPT-3.5, face challenges in comprehending these nuances, particularly in Chinese slang. In this study, we present DuanzAI, an innovative approach that enhances Large Language Models (LLMs) with deep Chinese slang comprehension. Leveraging curated datasets and advanced techniques, DuanzAI bridges the gap between human expression and AI comprehension, enabling contextually relevant responses. Our experiments contrast LLMs' performance with a custom Punchline Entity Recognition (PER) system that integrates phonetic matching and pinyin2hanzi techniques. Applying these insights, we developed ChatDAI, an advanced chatbot, and released our code at https://github.com/YesianRohn/DuanzAI.


Belief revision and incongruity: is it a joke?

Dupin de Saint-Cyr - Bannay, Florence, Prade, Henri

arXiv.org Artificial Intelligence

Even if much has been written about the ingredients that trigger laughter, researchers are still far from fully understanding their interplay in the cognitive process that leads a listener to guffaw at a pun or a joke. They are even farther from a detailed analysis and modeling of the mechanisms at work in this process. However, in recent articles Dupin de Saint-Cyr and Prade (2020, 2022) took a first step in this direction by laying bare that a belief revision mechanism is solicited in the reception of a narrative joke. Namely, the punchline that triggers a revision is both surprising and a perfect explanation of what was reported at the beginning of the joke. A similar idea was proposed more informally in Ritchie (2002). It is quite clear, however, that this alone is insufficient for characterizing a narrative joke.


Prompt to GPT-3: Step-by-Step Thinking Instructions for Humor Generation

Chen, Yuetian, Shi, Bowen, Si, Mei

arXiv.org Artificial Intelligence

Artificial intelligence has made significant progress in natural language processing, with models like GPT-3 demonstrating impressive capabilities. However, these models still have limitations when it comes to complex tasks that require an understanding of the user, such as mastering human comedy writing strategies. This paper explores humor generation using GPT-3 by modeling human comedy writing theory and leveraging step-by-step thinking instructions. In addition, we explore the role of cognitive distance in creating humor.


World's most advanced humanoid robot attempts to tell a joke - so do YOU understand it?

Daily Mail - Science & tech

From computer programmers to lawyers, several jobs are already at risk of being taken by artificial intelligence (AI). But if you're a comedian, you can rest easy for now, if the latest robotic demonstration is anything to go by. Ameca, the 'world's most advanced humanoid robot', attempts to tell a joke in a new video - and miserably fails. While Ameca's facial expressions are undeniably lifelike, her 'joke' lacks any kind of punchline. However, one viewer was still impressed, joking: 'The lack of punchline was actually funny.'


ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models

Jentzsch, Sophie, Kersting, Kristian

arXiv.org Artificial Intelligence

Humor is a central aspect of human communication that has not yet been solved for artificial agents. Large language models (LLMs) are increasingly able to capture implicit and contextual information, and OpenAI's ChatGPT in particular has recently gained immense public attention. The GPT-3-based model almost seems to communicate on a human level and can even tell jokes. But is ChatGPT really funny? We put ChatGPT's sense of humor to the test. In a series of exploratory experiments around jokes, i.e., generation, explanation, and detection, we seek to understand ChatGPT's capability to grasp and reproduce human humor. Since the model itself is not accessible, we applied prompt-based experiments. Our empirical evidence indicates that jokes are not hard-coded but mostly also not newly generated by the model: over 90% of 1,008 generated jokes were the same 25 jokes. The system accurately explains valid jokes but also comes up with fictional explanations for invalid jokes. Joke-typical characteristics can mislead ChatGPT in the classification of jokes. ChatGPT has not solved computational humor yet, but it could be a big leap toward "funny" machines.


GPT-4 is surprisingly good at explaining jokes

#artificialintelligence

Explaining a joke, as E.B. White once wrote, is like dissecting a frog: "the thing dies in the process and the innards are discouraging to any but the purely scientific mind." GPT-4 is undeterred: the large language model -- released on March 14 by OpenAI -- is surprisingly good at generating detailed explanations of why a joke is funny. And like its predecessor, ChatGPT, the AI can also generate jokes, though its go-to one-liners are simple and seem to have been scraped from the internet's corniest, punniest corners (Why don't scientists trust atoms? Because they make up everything!). GPT-4 seems better at explaining humor than its predecessor.