
Collaborating Authors

 Villegas, Danae Sánchez


ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models

arXiv.org Artificial Intelligence

Reasoning over sequences of images remains a challenge for multimodal large language models (MLLMs). While recent models incorporate multi-image data during pre-training, they still struggle to recognize sequential structures, often treating images independently. This work introduces ImageChain, a framework that enhances MLLMs with sequential reasoning capabilities over image data by modeling visual sequences as a multi-turn conversation. In ImageChain, images are interleaved with corresponding textual descriptions to form a controlled dialogue that explicitly captures temporal dependencies and narrative progression. Our method optimizes for the task of next-scene description, where the model generates a context-aware description of an upcoming scene based on preceding visual and textual cues. We demonstrate that our approach improves performance on the next-scene description task, raising SimRate, a metric that quantifies semantic similarity to human-annotated ground truths, from 3.7% to 19% on average. Moreover, ImageChain achieves robust zero-shot out-of-domain performance in applications ranging from comics to robotics. Extensive experiments validate that instruction tuning in a multimodal, multi-turn conversation design is key to bridging the gap between static image understanding and temporally aware reasoning.
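The multi-turn formulation can be illustrated with a short sketch. The Python snippet below shows one assumed way to cast a visual sequence as an interleaved image-description conversation ending in a next-scene query; the message schema, prompt wording, and function name are illustrative and do not reproduce ImageChain's actual template.

```python
# Minimal sketch (assumed format): turning a sequence of scenes into a
# multi-turn dialogue for next-scene description. Each context turn pairs
# one image with its description; the final turn asks for the upcoming scene.

from typing import Dict, List


def build_next_scene_dialogue(image_paths: List[str],
                              descriptions: List[str]) -> List[Dict]:
    """Interleave scenes 1..n-1 as user/assistant turns, then query scene n.

    `image_paths` and `descriptions` are assumed to be aligned; the last
    description is held out as the generation target.
    """
    assert len(image_paths) == len(descriptions)
    messages = []
    # Context turns: the user shows a scene, the assistant describes it.
    for path, desc in zip(image_paths[:-1], descriptions[:-1]):
        messages.append({
            "role": "user",
            "content": [{"type": "image", "path": path},
                        {"type": "text", "text": "Describe this scene."}],
        })
        messages.append({"role": "assistant", "content": desc})
    # Final turn: show the last image and ask for a context-aware description.
    messages.append({
        "role": "user",
        "content": [{"type": "image", "path": image_paths[-1]},
                    {"type": "text",
                     "text": "Given the story so far, describe the next scene."}],
    })
    return messages


if __name__ == "__main__":
    dialogue = build_next_scene_dialogue(
        ["scene_1.png", "scene_2.png", "scene_3.png"],
        ["A robot picks up a mug.",
         "It carries the mug toward a table.",
         "(held-out target description of scene 3)"],
    )
    for turn in dialogue:
        print(turn["role"], "->", turn["content"])
```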


Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks

arXiv.org Artificial Intelligence

Effectively leveraging multimodal information from social media posts is essential for various downstream tasks such as sentiment analysis, sarcasm detection, or hate speech classification. Jointly modeling text and images is challenging because cross-modal semantics may be hidden or the relation between image and text may be weak. However, prior work on multimodal classification of social media posts has not yet addressed these challenges. In this work, we present an extensive study of the effectiveness of using two auxiliary losses jointly with the main task when fine-tuning multimodal models. First, Image-Text Contrastive (ITC) is designed to minimize the distance between image-text representations within a post, thereby effectively bridging the gap for posts where the image plays an important role in conveying the post's meaning. Second, Image-Text Matching (ITM) enhances the model's ability to understand the semantic relationship between images and text, thus improving its capacity to handle ambiguous or loosely related modalities. We combine these objectives with five multimodal models across five diverse social media datasets, demonstrating consistent improvements of up to 2.6 F1 points. Our comprehensive analysis shows the specific scenarios where each auxiliary task is most effective.
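The auxiliary objectives can be sketched in a few lines of PyTorch. The snippet below is an illustrative combination of a main classification loss with an in-batch ITC term and a binary ITM term; the loss weights, temperature, and function names are assumptions rather than the paper's exact configuration.

```python
# Illustrative sketch: main task loss + ITC + ITM auxiliary losses.
# Encoder architecture, projection dimensions, and weights are assumed.

import torch
import torch.nn.functional as F


def itc_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """In-batch contrastive loss pulling together image/text pairs from the same post."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


def itm_loss(pair_logits: torch.Tensor, match_labels: torch.Tensor) -> torch.Tensor:
    """Binary matched / not-matched head over fused image-text representations."""
    return F.cross_entropy(pair_logits, match_labels)


def total_loss(cls_logits, cls_labels, img_emb, txt_emb,
               pair_logits, match_labels,
               w_itc: float = 0.1, w_itm: float = 0.1) -> torch.Tensor:
    """Main classification loss plus weighted auxiliary objectives (weights assumed)."""
    main = F.cross_entropy(cls_logits, cls_labels)
    return (main
            + w_itc * itc_loss(img_emb, txt_emb)
            + w_itm * itm_loss(pair_logits, match_labels))
```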


Sheffield's Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages

arXiv.org Artificial Intelligence

In this paper, we describe the University of Sheffield's submission to the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages, which covers translation from Spanish into eleven indigenous languages. Our approach consists of extending, training, and ensembling different variations of NLLB-200. We use data provided by the organizers as well as data from various other sources such as constitutions, handbooks, news articles, and backtranslations generated from monolingual data. On the dev set, our best submission outperforms the baseline by 11% average chrF across all languages, with substantial improvements particularly for Aymara, Guarani, and Quechua. On the test set, we achieve the highest average chrF of all the submissions, we rank first in four of the eleven languages, and at least one of our submissions ranks in the top 3 for all languages.
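For reference, the generic workflow behind such a submission (translating with a public NLLB-200 checkpoint and scoring with chrF via sacrebleu) can be sketched as follows. The checkpoint name, the Quechua language code quy_Latn, and the toy sentences are illustrative assumptions; this does not reproduce the team's extended, fine-tuned, or ensembled models.

```python
# Minimal sketch (assumed setup): Spanish-to-Quechua translation with an
# off-the-shelf NLLB-200 checkpoint, evaluated with chrF.

from sacrebleu.metrics import CHRF
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/nllb-200-distilled-600M"   # public baseline checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang="spa_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def translate(sentences, tgt_lang="quy_Latn"):
    """Translate a batch of Spanish sentences into the target language."""
    inputs = tokenizer(sentences, return_tensors="pt",
                       padding=True, truncation=True)
    generated = model.generate(
        **inputs,
        # Force the decoder to start with the target-language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=128,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    dev_src = ["El agua es vida."]                 # toy input, not shared-task data
    dev_ref = ["<gold Quechua reference>"]         # placeholder reference
    hyps = translate(dev_src)
    # chrF is the shared task's evaluation metric.
    print(CHRF().corpus_score(hyps, [dev_ref]).score)
```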