sequence recognition
Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
Ryan, Yuriel, Tan, Rui Yang, Choo, Kenny Tsu Wei, Lee, Roy Ka-Wei
Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs' ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top models achieve only 61% accuracy in panel sequencing, far below human performance. This underscores critical limitations in current models' integration of visual and textual cues for coherent narrative and humor understanding. By providing a rigorous framework for evaluating multimodal contextual and narrative reasoning, PixelHumor aims to drive the development of LMMs that better engage in natural, socially aware interactions.
Sequential Decision Making - an overview
Central to many formulations of sequence recognition are problems in sequential decision-making. Typically, a sequence of events is observed through a transformation that introduces uncertainty into the observations, and from these observations the recognition process produces a hypothesis of the underlying events. The underlying events are constrained to follow a loose order, for example one imposed by a grammar, so that decisions made early in the recognition process restrict the choices available later. This problem is well known and leads to the use of dynamic programming (DP) algorithms [Bel57], which defer irrevocable decisions until all available information has been processed. DP strategies are central to hidden Markov model (HMM) recognizers [LMS84,Lev85,Rab89,RBH86] and have also been widely used in systems based on neural networks (e.g., [SIY 89,Bur88,BW89,SL92,BM90,FLW90]) to transform static pattern classifiers into sequence recognizers.
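As a rough illustration of the DP idea in an HMM recognizer, the sketch below implements Viterbi decoding, one common DP algorithm for this setting (the excerpt does not say which variant the cited systems use). The function name `viterbi`, the two-state toy model, and all probabilities are hypothetical and chosen only for illustration. Note that the state path is chosen by backtracking only after the last observation has been scored, which is the "defer the decision" behaviour described above.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state path and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t-1][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Keep only the best way of reaching s; no global decision is made yet
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # All observations are processed: commit to a path by backtracking
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

# Hypothetical toy model (illustrative numbers, not taken from the cited work)
states = ["Healthy", "Fever"]
log_start = {"Healthy": math.log(0.6), "Fever": math.log(0.4)}
log_trans = to_log({
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
})
log_emit = to_log({
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
})

path, score = viterbi(["normal", "cold", "dizzy"], states,
                      log_start, log_trans, log_emit)
print(path, math.exp(score))
```

Working in log-probabilities avoids numerical underflow on long sequences. A greedy recognizer that picked the most likely state at each step would commit early; here the final "dizzy" observation is what settles the last state, and the earlier states are recovered from the stored back-pointers.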