 comic strip


Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences?

arXiv.org Artificial Intelligence

Large Multimodal Models (LMMs) have achieved remarkable success across various visual-language tasks. However, existing benchmarks predominantly focus on single-image understanding, leaving the analysis of image sequences largely unexplored. To address this limitation, we introduce StripCipher, a comprehensive benchmark designed to evaluate the capabilities of LMMs to comprehend and reason over sequential images. StripCipher comprises a human-annotated dataset and three challenging subtasks: visual narrative comprehension, contextual frame prediction, and temporal narrative reordering. Our evaluation of 16 state-of-the-art LMMs, including GPT-4o and Qwen2.5VL, reveals a significant performance gap compared to human capabilities, particularly in tasks that require reordering shuffled sequential images. For instance, GPT-4o achieves only 23.93% accuracy in the reordering subtask, which is 56.07 percentage points lower than human performance. Further quantitative analysis discusses several factors, such as the input format of images, that affect the performance of LMMs in sequential understanding, underscoring the fundamental challenges that remain in the development of LMMs.


- AI What a surprise to find a comic strip on...

#artificialintelligence

What a surprise to find a comic strip on Artificial Intelligence written by somebody called Montaigne! Marion Montaigne présente l'Intelligence artificielle: https://youtu.be/DtdoNksCtmE Indeed, Montaigne discusses in the Essays whether the savages found mainly in Brazil are human beings and whether they should be considered humans, the way the barbarians in Ancient Greece were finally recognised as such. As it is growing more and more difficult to distinguish an artificial intelligence from a human being, I think reconsidering this classic is relevant, as it once again challenges human identity. From the conquistadors until now, the concept of humanity has evolved. On the other hand, what is common to humanity at each step, from the cannibals until now, is some human beings' will to build weapons to kill others.


Data Science and the Quest for Truth

#artificialintelligence

I was interviewed by IBM to share my thoughts on a topic related to data science about which I'm passionate: How do we know what we know and, once we know it, how do we know it's the truth? IBM turned that interview into a comic strip (see below) and I summarize my points in this post. You would think that, because we have access to so much information in this digital age, getting to the truth would be relatively quick and easy. Yet, as I've discussed before, people hold beliefs that are not supported by the information available to them. Take, for example, the 27% of Americans who don't believe there is solid scientific evidence of climate change; the rise of the "anti-vaxxers" who think that they know more about science and public health than the overwhelming majority of doctors, immunologists and other health professionals; the "flat-earthers" who ignore the ample evidence that the earth is a sphere; the Google Trends data showing that searches for the term "flat earth" have more than tripled over the past two years; and more. I've given this topic a lot of thought, and I would like to talk about these problems and about the importance of everybody gaining some knowledge of statistics and critical thinking (scientific method, evidence-based decisions).


The power, promise and controversy of AI applications

#artificialintelligence

A recent comic strip called GIL depicted an 8-year-old boy calling out commands to what he thought was the Amazon Echo. After two simple directives and no response, he shouted, "Alexa! What is wrong with you?!" Gil's mother saw this and asked her son why he was yelling at her new coffee grinder. The fact that there's a front-page comic strip in the Sunday newspaper dedicated to an artificial intelligence system tells me the technology is officially mainstream. Analysts predict voice will be the next big user interface, and though I'm skeptical of predictions for reasons we laid out in our February issue of Business Information, this is one forecast I believe.


Algo-Garfield

#artificialintelligence

Garfield is a comic strip by Jim Davis, who seems like a pretty good guy. A Markov chain is a probabilistic model well suited to semi-coherent text synthesis. Garkov is an application of the Markov model to transcripts of old Garfield strips, plus some extra code to make it all look like a genuine comic strip.
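
To make the Markov chain idea concrete, here is a minimal sketch of word-level text synthesis in Python. It is not Garkov's actual code: the function names `build_chain` and `generate`, the `order` parameter, and the tiny `SAMPLE` string are all illustrative assumptions, and the sample text is a made-up placeholder rather than a real Garfield transcript.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Repeated followers stay in the list, so frequent transitions
        # are proportionally more likely to be sampled later.
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=12, seed=None):
    """Random-walk the chain to produce up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted() keeps seeded runs repeatable
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        nxt = rng.choice(followers)
        output.append(nxt)
        state = state[1:] + (nxt,)  # slide the window forward by one word
    return " ".join(output)


# Placeholder text (an assumption for illustration, not a real strip transcript).
SAMPLE = "i hate mondays . i love lasagna . i hate diets . i love naps ."

chain = build_chain(SAMPLE)
print(generate(chain, length=10, seed=0))
```

Every adjacent word pair in the output also occurs somewhere in the input, which is why the result reads as locally plausible but only semi-coherent overall. Raising `order` makes the text more faithful to the source at the cost of variety.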