laughter
From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy
Romanowski, Adrianna, Valois, Pedro H. V., Fukui, Kazuhiro
Comedy serves as a profound reflection of the times we live in and is a staple element of human interactions. In light of the widespread adoption of Large Language Models (LLMs), the intersection of humor and AI has become no laughing matter. Advancements in the naturalness of human-computer interaction correlate with improvements in AI systems' ability to understand humor. In this study, we assess the ability of models to accurately identify humorous quotes from a stand-up comedy transcript. Stand-up comedy's unique comedic narratives make it an ideal dataset for improving the overall naturalness of comedic understanding. We propose a novel humor detection metric designed to evaluate LLMs across various prompts on their capability to extract humorous punchlines. The metric has a modular structure that offers three different scoring methods -- fuzzy string matching, sentence embedding, and subspace similarity -- to provide an overarching assessment of a model's performance. The models' results are compared against those of human evaluators on the same task. Our metric reveals that regardless of prompt engineering, leading models -- ChatGPT, Claude, and DeepSeek -- achieve scores of at most 51% in humor detection. Notably, this performance surpasses that of humans, who achieve a score of 41%. The analysis of human evaluators and LLMs reveals variability in agreement, highlighting the subjectivity inherent in humor and the complexities involved in extracting humorous quotes from live performance transcripts. Code available at https://github.com/swaggirl9000/humor.
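The abstract does not spell out the metric's internals, but its three scoring modes map onto standard tools. Below is a minimal sketch, not the authors' implementation (that lives in the linked repository): fuzzy matching via Python's difflib, cosine similarity over sentence embeddings, and subspace similarity via principal angles between the two embedding spans. The sentence-transformers backbone and all function names are assumptions.

```python
# Minimal sketch of a modular punchline-scoring metric -- an illustration,
# not the authors' code. Requires: pip install numpy sentence-transformers
from difflib import SequenceMatcher

import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding backbone

def fuzzy_score(predicted: list[str], reference: list[str]) -> float:
    # Best fuzzy string match per reference punchline, averaged over references.
    best = [max(SequenceMatcher(None, p, r).ratio() for p in predicted)
            for r in reference]
    return float(np.mean(best))

def embedding_score(predicted: list[str], reference: list[str]) -> float:
    # Best cosine similarity per reference punchline, averaged over references.
    P = _model.encode(predicted, normalize_embeddings=True)
    R = _model.encode(reference, normalize_embeddings=True)
    return float((R @ P.T).max(axis=1).mean())

def subspace_score(predicted: list[str], reference: list[str]) -> float:
    # Compare the subspaces spanned by the two embedding sets: the singular
    # values of Ur^T Up are the cosines of their principal angles.
    Up, _ = np.linalg.qr(_model.encode(predicted).T)  # (dim, n_pred)
    Ur, _ = np.linalg.qr(_model.encode(reference).T)  # (dim, n_ref)
    return float(np.mean(np.linalg.svd(Ur.T @ Up, compute_uv=False)))
```

A composite score could average the three, which is one way to read the metric's "modular structure": each scorer is swappable without changing the evaluation loop.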
Why Are Kids So Funny?
My daughter, Alice, is almost two, and quite funny. Although she can say short sentences--"I need cake!"--her humor isn't particularly verbal. Instead, she giggles while stumbling around in grownup shoes, or blows bubbles in her water when she should be drinking it. She likes to put on a hat, pull it down over her eyes, and then blunder around, arms outstretched, like a mummy. She's also discovered the humor of exaggeration: recently, when her brother resisted getting out of his pajamas in the morning, she sidled up, grabbed his shirt, hauled on it with both hands, and laughed while yelling, "Ooooouuuut!"
Non-Verbal Vocalisations and their Challenges: Emotion, Privacy, Sparseness, and Real Life
Batliner, Anton, Amiriparian, Shahin, Schuller, Björn W.
Non-Verbal Vocalisations (NVVs) are short 'non-word' utterances without proper linguistic (semantic) meaning but conveying connotations -- be it emotions/affects or other paralinguistic information. We start this contribution with a historic sketch: how they were addressed in psychology and linguistics in the last two centuries, how they were neglected later on, and how they came to the fore with the advent of emotion research. We then give an overview of types of NVVs (formal aspects) and functions of NVVs, exemplified with the typical NVV 'ah'. Interesting as they are, NVVs come, however, with a number of challenges that should be accounted for: privacy and general ethical considerations prevent them from being recorded in real-life (private) scenarios to a sufficient extent. Isolated, prompted (acted) exemplars do not necessarily model NVVs in context; yet, this is the preferred strategy so far when modelling NVVs, especially in AI. To overcome these problems, we argue in favour of corpus-based approaches. This guarantees more realistic modelling; however, we are still faced with privacy and sparse-data problems.
Millions of GeAR-s: Extending GraphRAG to Millions of Documents
Shen, Zhili, Diao, Chenxin, Merita, Pascual, Vougiouklis, Pavlos, Pan, Jeff Z.
Recent studies have explored graph-based approaches to retrieval-augmented generation, leveraging structured or semi-structured information -- such as entities and their relations extracted from documents -- to enhance retrieval. However, these methods are typically designed to address specific tasks, such as multi-hop question answering and query-focused summarisation, and therefore there is limited evidence of their general applicability across broader datasets. In this paper, we aim to adapt a state-of-the-art graph-based RAG solution, GeAR, and explore its performance and limitations on the SIGIR 2025 LiveRAG Challenge.
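GeAR's architecture is not detailed in this abstract; the sketch below only illustrates the general graph-based RAG pattern it builds on -- triples extracted from documents form a shared graph, and queries are expanded along multi-hop neighbours before documents are ranked. Everything here (function names, the networkx representation, the scoring rule) is an assumption for illustration.

```python
# Generic sketch of graph-augmented retrieval, not GeAR itself.
# Requires: pip install networkx
import networkx as nx

def build_graph(doc_triples: dict[str, list[tuple[str, str, str]]]) -> nx.Graph:
    """doc_triples maps a document id to its (head, relation, tail) triples."""
    g = nx.Graph()
    for doc_id, triples in doc_triples.items():
        for head, rel, tail in triples:
            g.add_edge(head, tail, relation=rel)
            for entity in (head, tail):
                # Remember which documents mention each entity.
                g.nodes[entity].setdefault("docs", set()).add(doc_id)
    return g

def retrieve(g: nx.Graph, query_entities: list[str], hops: int = 2) -> list[str]:
    # Expand the query entities along multi-hop neighbours, then rank documents
    # by how many expanded entities they mention.
    frontier = {e for e in query_entities if e in g}
    for _ in range(hops):
        frontier |= {n for e in frontier for n in g.neighbors(e)}
    counts: dict[str, int] = {}
    for entity in frontier:
        for doc_id in g.nodes[entity].get("docs", set()):
            counts[doc_id] = counts.get(doc_id, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)
```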
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech
Borisov, Maksim, Spirin, Egor, Diatlova, Daria
Current expressive speech synthesis models are constrained by the limited availability of open-source datasets containing diverse nonverbal vocalizations (NVs). In this work, we introduce NonverbalTTS (NVTTS), a 17-hour open-access dataset annotated with 10 types of NVs (e.g., laughter, coughs) and 8 emotional categories. The dataset is derived from popular sources, VoxCeleb and Expresso, using automated detection followed by human validation. We propose a comprehensive pipeline that integrates automatic speech recognition (ASR), NV tagging, emotion classification, and a fusion algorithm to merge transcriptions from multiple annotators. Fine-tuning open-source text-to-speech (TTS) models on the NVTTS dataset achieves parity with closed-source systems such as CosyVoice2, as measured by both human evaluation and automatic metrics, including speaker similarity and NV fidelity. By releasing NVTTS and its accompanying annotation guidelines, we address a key bottleneck in expressive TTS research. The dataset is available at https://huggingface.co/datasets/deepvk/NonverbalTTS.
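As a concrete illustration of one pipeline stage, the sketch below merges per-annotator tags by strict majority vote. The paper's actual fusion algorithm operates on full transcriptions and may be more elaborate; the tag names and the "uncertain" fallback are invented here.

```python
# Toy fusion step: majority vote over per-annotator NV tags, per utterance.
from collections import Counter

def fuse_annotations(per_annotator_tags: list[list[str]]) -> list[str]:
    """per_annotator_tags[i][j] = tag that annotator i gave to utterance j."""
    fused = []
    for utterance_tags in zip(*per_annotator_tags):
        tag, count = Counter(utterance_tags).most_common(1)[0]
        # Keep a tag only when a strict majority of annotators agree.
        fused.append(tag if count > len(utterance_tags) / 2 else "uncertain")
    return fused

# Three annotators labelling four utterances:
print(fuse_annotations([
    ["laughter", "none", "cough", "laughter"],
    ["laughter", "none", "none",  "laughter"],
    ["none",     "none", "cough", "laughter"],
]))  # -> ['laughter', 'none', 'cough', 'laughter']
```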
StandUp4AI: A New Multilingual Dataset for Humor Detection in Stand-up Comedy Videos
Barriere, Valentin, Gomez, Nahuel, Hemamou, Leo, Callejas, Sofia, Ravenet, Brian
Aiming to improve current computational models of humor detection, we propose a new multimodal dataset of stand-up comedies in seven languages: English, French, Spanish, Italian, Portuguese, Hungarian, and Czech. Our dataset of more than 330 hours is, at the time of writing, the biggest and most diverse available for this type of task. The whole dataset is automatically annotated with audience laughter, and the subpart reserved for model validation is manually annotated. Contrary to contemporary approaches, we do not frame the task of humor detection as binary sequence classification, but as word-level sequence labeling, in order to take into account the full context of the sequence and to capture the continuous joke-tagging mechanism typically occurring in natural conversations. Along with unimodal baseline results, we propose a method to enhance automatic laughter detection based on Automatic Speech Recognition errors. Our code and data are available online: https://tinyurl.com/EMNLPHumourStandUpPublic
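To make the word-level framing concrete: instead of asking "is this clip funny?", each word receives a label, here derived from whether audience laughter starts shortly after it. The field names and the one-second window below are assumptions for illustration, not the dataset's actual schema.

```python
# Toy word-level humor labelling from laughter timestamps.

def label_words(words: list[dict], laughter_spans: list[tuple[float, float]],
                window: float = 1.0) -> list[tuple[str, int]]:
    """words: [{'text': str, 'end': float}, ...] with word end-times in seconds.
    A word is tagged 1 when laughter starts within `window` seconds of it."""
    labels = []
    for w in words:
        is_joke = any(w["end"] <= start <= w["end"] + window
                      for start, _ in laughter_spans)
        labels.append((w["text"], int(is_joke)))
    return labels

# Laughter erupts at t=5.2s, right after the second word:
words = [{"text": "setup", "end": 2.0}, {"text": "punchline", "end": 5.0}]
print(label_words(words, [(5.2, 7.0)]))  # -> [('setup', 0), ('punchline', 1)]
```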
Why Do We Laugh? Annotation and Taxonomy Generation for Laughable Contexts in Spontaneous Text Conversation
Inoue, Koji, Elmers, Mikey, Lala, Divesh, Kawahara, Tatsuya
Laughter serves as a multifaceted communicative signal in human interaction, yet its identification within dialogue presents a significant challenge for conversational AI systems. This study addresses this challenge by annotating laughable contexts in Japanese spontaneous text conversation data and developing a taxonomy to classify the underlying reasons for such contexts. Initially, multiple annotators manually labeled laughable contexts using a binary decision (laughable or non-laughable). Subsequently, an LLM was used to generate explanations for the binary annotations of laughable contexts, which were then categorized into a taxonomy comprising ten categories, including "Empathy and Affinity" and "Humor and Surprise," highlighting the diverse range of laughter-inducing scenarios. The study also evaluated GPT-4's performance in recognizing the majority labels of laughable contexts, achieving an F1 score of 43.14%. These findings contribute to the advancement of conversational AI by establishing a foundation for more nuanced recognition and generation of laughter, ultimately fostering more natural and engaging human-AI interactions.
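The reported F1 score amounts to comparing a model's binary "laughable" predictions against the annotators' majority labels. A minimal sketch of that scoring step, with invented labels (the paper's exact evaluation protocol may differ):

```python
# Score binary "laughable" predictions against majority labels with F1.
# Requires: pip install scikit-learn
from sklearn.metrics import f1_score

majority_labels   = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = laughable context
model_predictions = [1, 0, 0, 1, 1, 0, 0, 0]  # e.g., parsed from LLM output

print(f"F1 = {f1_score(majority_labels, model_predictions):.4f}")
```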
Reverse Prompt Engineering
This paper explores a new black-box, zero-shot language model inversion problem and proposes an innovative framework for prompt reconstruction using only text outputs from a language model. Leveraging a large language model alongside an optimization algorithm, the proposed method effectively recovers prompts with minimal resources. Experimental results on several datasets derived from public sources indicate that the proposed approach achieves high-quality prompt recovery and generates prompts more similar to the originals than current state-of-the-art methods. Additionally, the use-case study demonstrates the method's strong potential for generating high-quality text data.
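The abstract does not name the optimization algorithm, so the sketch below only shows the general shape such a black-box inversion loop could take: propose candidate prompts with a helper LLM, query the target model, and keep the candidate whose output best matches the observed text. `llm` and `propose` are hypothetical stand-ins, and the Jaccard similarity is a deliberately crude proxy.

```python
# Hypothetical black-box prompt-recovery loop (illustrative only).

def similarity(a: str, b: str) -> float:
    # Crude token-overlap (Jaccard) proxy for output similarity.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def recover_prompt(observed_output: str, llm, propose, rounds: int = 5) -> str:
    """llm(prompt) -> target-model text; propose(output, seed) -> candidates."""
    best, best_score = "", -1.0
    for _ in range(rounds):
        # Seed the helper LLM with the current best guess to refine candidates.
        for candidate in propose(observed_output, seed=best):
            score = similarity(llm(candidate), observed_output)
            if score > best_score:
                best, best_score = candidate, score
    return best
```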
Analysis and Detection of Differences in Spoken User Behaviors between Autonomous and Wizard-of-Oz Systems
Elmers, Mikey, Inoue, Koji, Lala, Divesh, Ochi, Keiko, Kawahara, Tatsuya
This study examined users' behavioral differences in a large corpus of Japanese human-robot interactions, comparing interactions between a tele-operated robot and an autonomous dialogue system. We analyzed user spoken behaviors in both attentive listening and job interview dialogue scenarios. Results revealed significant differences in metrics such as speech length, speaking rate, fillers, backchannels, disfluencies, and laughter between operator-controlled and autonomous conditions. Furthermore, we developed predictive models to distinguish between operator and autonomous system conditions. Our models demonstrated higher accuracy and precision compared to the baseline model, with several models also achieving a higher F1 score than the baseline.
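As an illustration of the predictive-model step, the toy classifier below separates the two conditions from per-dialogue counts of the listed behaviours. The feature values are invented, and the paper's actual models and features may differ.

```python
# Toy operator-vs-autonomous classifier over behavioural features.
# Requires: pip install numpy scikit-learn
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: speech length (s), speaking rate (syll/s), fillers, backchannels,
# disfluencies, laughter count -- one row per dialogue (values invented).
X = np.array([
    [310.0, 4.1, 12, 30, 5, 3],  # tele-operated (Wizard-of-Oz)
    [295.0, 4.3, 10, 28, 4, 4],  # tele-operated
    [180.0, 3.2, 20, 12, 9, 1],  # autonomous system
    [175.0, 3.0, 22, 10, 8, 0],  # autonomous system
])
y = np.array([1, 1, 0, 0])  # 1 = operator-controlled, 0 = autonomous

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[300.0, 4.0, 11, 29, 5, 2]]))  # -> [1], likely operator
```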
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Wu, Haibin, Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Tompkins, Daniel, Tsai, Chung-Hsien, Li, Canrun, Xiao, Zhen, Zhao, Sheng, Li, Jinyu, Kanda, Naoyuki
People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, including NVs. This paper introduces EmoCtrl-TTS, an emotion-controllable zero-shot TTS that can generate highly emotional speech with NVs for any speaker. EmoCtrl-TTS leverages arousal and valence values, as well as laughter embeddings, to condition the flow-matching-based zero-shot TTS. To achieve high-quality emotional speech generation, EmoCtrl-TTS is trained using more than 27,000 hours of expressive data curated based on pseudo-labeling. Comprehensive evaluations demonstrate that EmoCtrl-TTS excels in mimicking the emotions of audio prompts in speech-to-speech translation scenarios. We also show that EmoCtrl-TTS can capture emotion changes, express strong emotions, and generate various NVs in zero-shot TTS. See https://aka.ms/emoctrl-tts for demo samples.
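One plausible reading of the conditioning setup, sketched below: per-frame arousal/valence values and laughter embeddings are stacked into a time-varying sequence that conditions the flow-matching decoder. Dimensions and layout here are assumptions, not EmoCtrl-TTS's actual interface.

```python
# Illustrative time-varying conditioning tensor (not EmoCtrl-TTS's real API).
import numpy as np

def build_conditioning(arousal: np.ndarray, valence: np.ndarray,
                       laughter_emb: np.ndarray) -> np.ndarray:
    """arousal/valence: (T,) per-frame values; laughter_emb: (T, D) vectors."""
    return np.concatenate([arousal[:, None], valence[:, None], laughter_emb],
                          axis=1)  # -> (T, D + 2)

T, D = 200, 32
cond = build_conditioning(np.random.rand(T), np.random.rand(T),
                          np.random.rand(T, D))
print(cond.shape)  # (200, 34)
```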