Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
Yuriel Ryan, Rui Yang Tan, Kenny Tsu Wei Choo, Roy Ka-Wei Lee
arXiv.org Artificial Intelligence
Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs' ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top models achieve only 61% accuracy in panel sequencing, far below human performance. This underscores critical limitations in current models' integration of visual and textual cues for coherent narrative and humor understanding. By providing a rigorous framework for evaluating multimodal contextual and narrative reasoning, PixelHumor aims to drive the development of LMMs that better engage in natural, socially aware interactions.
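Panel-sequencing accuracy can be scored in more than one way. The following is a minimal sketch, assuming accuracy means an exact match between a model's predicted panel order and the gold order. PixelHumor's actual protocol may differ, and the helper `exact_match_accuracy` and the example orderings are hypothetical.

```python
from typing import Sequence


def exact_match_accuracy(
    predictions: Sequence[Sequence[int]], gold: Sequence[Sequence[int]]
) -> float:
    """Fraction of comics whose predicted panel order equals the gold order.

    Assumption: each ordering is a list of panel indices, and a comic counts
    as correct only if every panel is in the right position (no partial credit).
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(list(p) == list(g) for p, g in zip(predictions, gold))
    return hits / len(gold)


# Hypothetical data: three shuffled comics, two reordered correctly.
gold = [[0, 1, 2, 3], [0, 1, 2], [0, 1, 2, 3]]
pred = [[0, 1, 2, 3], [1, 0, 2], [0, 1, 2, 3]]
print(f"{exact_match_accuracy(pred, gold):.2f}")  # 0.67
```

Exact match gives no credit for an ordering that is nearly right. A rank-correlation measure such as Kendall's tau would be a softer alternative.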
Sep-18-2025
- Country:
  - Asia
    - China
      - Hong Kong (0.04)
      - Shandong Province > Qingdao (0.04)
    - Middle East > Yemen
      - Amran Governorate > Amran (0.04)
    - Singapore (0.04)
  - Europe
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Slovenia (0.04)
  - North America
    - Canada
      - British Columbia (0.04)
      - Ontario > Toronto (0.04)
    - United States > New York (0.04)
  - South America > Chile
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Health & Medicine > Therapeutic Area (0.46)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Cognitive Science (1.00)
      - Machine Learning
        - Neural Networks > Deep Learning (0.96)
        - Performance Analysis > Accuracy (0.68)
      - Natural Language > Large Language Model (1.00)
      - Representation & Reasoning (0.93)
      - Vision (0.93)
    - Communications > Social Media (0.67)