SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning

arXiv.org Artificial Intelligence

The rich and multifaceted nature of human social interaction, encompassing multimodal cues, unobservable relations and mental states, and dynamical behavior, presents a formidable challenge for artificial intelligence. To advance research in this area, we introduce SIV-Bench, a novel video benchmark for rigorously evaluating the capabilities of Multimodal Large Language Models (MLLMs) across Social Scene Understanding (SSU), Social State Reasoning (SSR), and Social Dynamics Prediction (SDP). SIV-Bench features 2,792 video clips and 8,792 meticulously generated question-answer pairs derived from a human-LLM collaborative pipeline. The videos were originally collected from TikTok and YouTube, covering a wide range of video genres, presentation styles, and linguistic and cultural backgrounds. The benchmark also includes a dedicated setup for analyzing the impact of different textual cues: original on-screen text, added dialogue, or no text. Our comprehensive experiments on leading MLLMs reveal that while models adeptly handle SSU, they significantly struggle with SSR and SDP, where Relation Inference (RI) is an acute bottleneck, as further examined in our analysis. Our study also confirms the critical role of transcribed dialogue in aiding comprehension of complex social interactions. By systematically identifying current MLLMs' strengths and limitations, SIV-Bench offers crucial insights to steer the development of more socially intelligent AI. The dataset and code are available at https://kfq20.github.io/sivbench/.


Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation

arXiv.org Artificial Intelligence

Image captioning involves generating textual descriptions from input images, bridging the gap between computer vision and natural language processing. Recent advancements in transformer-based models have significantly improved caption generation by leveraging attention mechanisms for better scene understanding. While various surveys have explored deep learning-based approaches for image captioning, few have comprehensively analyzed attention-based transformer models across multiple languages. This survey reviews attention-based image captioning models, categorizing them into transformer-based, deep learning-based, and hybrid approaches. It explores benchmark datasets, discusses evaluation metrics such as BLEU, METEOR, CIDEr, and ROUGE, and highlights challenges in multilingual captioning. Additionally, this paper identifies key limitations in current models, including semantic inconsistencies, data scarcity in non-English languages, and limitations in reasoning ability. Finally, we outline future research directions, such as multimodal learning, real-time applications in AI-powered assistants, healthcare, and forensic analysis. This survey serves as a comprehensive reference for researchers aiming to advance the field of attention-based image captioning.


taz2024full: Analysing German Newspapers for Gender Bias and Discrimination across Decades

arXiv.org Artificial Intelligence

Open-access corpora are essential for advancing natural language processing (NLP) and computational social science (CSS). However, large-scale resources for German remain limited, restricting research on linguistic trends and societal issues such as gender bias. We present taz2024full, the largest publicly available corpus of German newspaper articles to date, comprising over 1.8 million texts from taz, spanning 1980 to 2024. As a demonstration of the corpus's utility for bias and discrimination research, we analyse gender representation across four decades of reporting. We find a consistent overrepresentation of men, but also a gradual shift toward more balanced coverage in recent years. Using a scalable, structured analysis pipeline, we provide a foundation for studying actor mentions, sentiment, and linguistic framing in German journalistic texts. The corpus supports a wide range of applications, from diachronic language analysis to critical media studies, and is freely available to foster inclusive and reproducible research in German-language NLP.


Speaking images. A novel framework for the automated self-description of artworks

arXiv.org Artificial Intelligence

Recent breakthroughs in generative AI have opened the door to new research perspectives in the domain of art and cultural heritage, where a large number of artifacts have been digitized. There is a need for innovation to ease access to digital collections and highlight their content. Such innovations develop into creative explorations of the digital image in relation to its malleability and contemporary interpretation, in confrontation with the original historical object. Based on the concept of the autonomous image, we propose a new framework towards the production of self-explaining cultural artifacts using open-source large language, face detection, text-to-speech and audio-to-animation models. The goal is to start from a digitized artwork and to automatically assemble a short video of it in which the main character animates to explain its content. The whole process questions cultural biases encapsulated in large language models, the potential of digital images and deepfakes of artworks for educational purposes, along with concerns of the field of art history regarding such creative diversions.


Automated Journalistic Questions: A New Method for Extracting 5W1H in French

arXiv.org Artificial Intelligence

The 5W1H questions -- who, what, when, where, why and how -- are commonly used in journalism to ensure that an article describes events clearly and systematically. Answering them is a crucial prerequisite for tasks such as summarization, clustering, and news aggregation. In this paper, we design the first automated extraction pipeline to get 5W1H information from French news articles. To evaluate the performance of our algorithm, we also create a corpus of 250 Quebec news articles with 5W1H answers marked by four human annotators. Our results demonstrate that our pipeline performs as well in this task as the large language model GPT-4o.


Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning

arXiv.org Artificial Intelligence

Recent advances in large-scale generative language models have shown that reasoning capabilities can significantly improve model performance across a variety of tasks. However, the impact of reasoning on a model's ability to mitigate stereotypical responses remains largely underexplored. In this work, we investigate the crucial relationship between a model's reasoning ability and fairness, and ask whether improved reasoning capabilities can mitigate harmful stereotypical responses, especially those arising due to shallow or flawed reasoning. We conduct a comprehensive evaluation of multiple open-source LLMs, and find that larger models with stronger reasoning abilities exhibit substantially lower stereotypical bias on existing fairness benchmarks. Building on this insight, we introduce ReGiFT -- Reasoning Guided Fine-Tuning, a novel approach that extracts structured reasoning traces from advanced reasoning models and infuses them into models that lack such capabilities. We use only general-purpose reasoning and do not require any fairness-specific supervision for bias mitigation. Notably, we see that models fine-tuned using ReGiFT not only improve fairness relative to their non-reasoning counterparts but also outperform advanced reasoning models on fairness benchmarks. We also analyze how variations in the correctness of the reasoning traces and their length influence model fairness and their overall performance. Our findings highlight that enhancing reasoning capabilities is an effective, fairness-agnostic strategy for mitigating stereotypical bias caused by reasoning flaws.


6 high-tech Father's Day gifts that show you really care

FOX News

Ideas to make Dad's day brighter and more memorable. Shopping for Father's Day can feel like a bit of a puzzle, right? Maybe yours is always glued to his laptop, loves keeping the pool spotless or enjoys pouring a fresh pint at home. If you want to skip the usual gifts and pick something that actually matches his personality, you're in the right place. From cool gadgets to smart tools and a few fun surprises, these ideas are all about making dad's day brighter and more memorable.


How to take photos on your phone via remote control

Popular Science

Our smartphones have transformed the way we take photos and videos and our relationship to these digital memories. Most of us will snap at least some pictures and clips every day with the gadget that's always close at hand. If you want to get more creative with photos on your phone, you can. Sometimes you're going to want to take a picture remotely, without your phone in your hand and your finger over the shutter button: maybe you're taking a wide shot of a large group, or you want to capture a lot of your surroundings.


New mobile robot helps seniors walk safely and prevent falls

FOX News

E-BAR operates as a set of robotic handlebars that follow users. The demographic landscape in the U.S. is shifting rapidly, with the median age now at 38.9, almost a decade older than it was in 1980. By 2050, the population of adults over 65 is projected to surge from 58 million to 82 million, intensifying the already urgent challenge of eldercare. With falls remaining the top cause of injury among older adults, the need for innovative, tech-driven solutions has never been clearer. MIT engineers are stepping up to this challenge with E-BAR, a mobile robot designed to physically support seniors and prevent falls as they move around their homes.


Replace your sunglasses with Ray-Ban Meta smart glasses during this rare Amazon deal

Popular Science

If you've been wanting a pair of Ray-Ban Meta smart glasses, this is the best price I've seen since last year's Black Friday. Amazon has pairs as low as $239 right now, both with and without tinted lenses. They offer the classic Wayfarer style, so they look good on just about everyone. The sale is limited to what's in stock right now, so grab the color and the size you want before they sell out. When talking about the Ray-Ban Meta glasses, most people focus on the built-in camera.