Spoken in Jest, Detected in Earnest: A Systematic Review of Sarcasm Recognition -- Multimodal Fusion, Challenges, and Future Prospects
Gao, Xiyuan, Nayak, Shekhar, Coler, Matt
arXiv.org Artificial Intelligence
Sarcasm, a common feature of human communication, poses challenges in both interpersonal and human-machine interaction. Linguistic research has highlighted the importance of prosodic cues, such as variations in pitch, speaking rate, and intonation, in conveying sarcastic intent. Although previous work has focused on text-based sarcasm detection, the role of speech data in recognizing sarcasm remains underexplored. Recent advances in speech technology underscore the growing importance of speech data for automatic sarcasm recognition, which can enhance social interactions for individuals with neurodegenerative conditions and improve machine understanding of complex human language use, leading to more nuanced interactions. This systematic review is the first to focus on speech-based sarcasm recognition, charting the evolution from unimodal to multimodal approaches. It covers datasets, feature extraction, and classification methods, and aims to bridge gaps across diverse research domains. The findings include limitations in datasets for sarcasm recognition in speech, the evolution of feature extraction techniques from traditional acoustic features to deep learning-based representations, and the progression of classification methods from unimodal approaches to multimodal fusion techniques. In so doing, we identify the need for greater emphasis on cross-cultural and multilingual sarcasm recognition, as well as the importance of addressing sarcasm as a multimodal phenomenon rather than a text-based challenge.
Sep-8-2025
- Country:
- Africa > Middle East
- Morocco (0.04)
- Asia
- China
- Shaanxi Province > Xi'an (0.04)
- Sichuan Province > Chengdu (0.04)
- India
- Jharkhand > Ranchi (0.04)
- Karnataka > Bengaluru (0.04)
- NCT > Delhi (0.04)
- Tamil Nadu > Chennai (0.04)
- West Bengal > Kharagpur (0.04)
- Singapore (0.04)
- South Korea > Incheon
- Incheon (0.04)
- Europe
- France
- Auvergne-Rhône-Alpes > Lyon
- Lyon (0.04)
- Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
- Marseille (0.04)
- Germany
- Bavaria > Upper Bavaria
- Munich (0.04)
- Berlin (0.04)
- Italy > Tuscany
- Florence (0.04)
- Netherlands
- Friesland > Leeuwarden (0.04)
- North Holland > Amsterdam (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- North America
- Canada
- Alberta > Census Division No. 19
- Saddle Hills County (0.04)
- Ontario
- National Capital Region > Ottawa (0.04)
- Toronto (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California
- Los Angeles County > Long Beach (0.04)
- San Francisco County > San Francisco (0.14)
- Illinois > Cook County
- Chicago (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Texas > Travis County
- Austin (0.04)
- Genre:
- Overview (1.00)
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.66)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Statistical Learning (1.00)
- Natural Language > Large Language Model (1.00)
- Representation & Reasoning > Information Fusion (0.88)
- Speech > Speech Recognition (1.00)
- Vision > Face Recognition (0.93)