AITopics | augmented caption


augmented caption

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset

Alwajih, Fakhraddin, Magdy, Samar M., Mekki, Abdellah El, Nacar, Omer, Nafea, Youssef, Abdelfadil, Safaa Taher, Yahya, Abdulfattah Mohammed, Luqman, Hamzah, Almarwani, Nada, Aloufi, Samah, Qawasmen, Baraah, Atou, Houdaifa, Sibaee, Serry, Alsayadi, Hamzah A., Al-Dhabyani, Walid, Al-shaibani, Maged S., Aatar, Aya El, Qandos, Nour, Alhamouri, Rahaf, Ahmad, Samar, Al-Ghrawi, Mohammed Anwar, Yacoub, Aminetou, AbuHweidi, Ruwa, Lemin, Vatimetou Mohamed, Abdel-Salam, Reem, Bashiti, Ahlam, Alansari, Aisha, Ashraf, Ahmed, Alturayeif, Nora, Inciarte, Alcides Alcoba, Ammar, Adel, Elmadany, Abdelrahim A., Tourad, Mohamedou Cheikh, Berrada, Ismail, Jarrar, Mustafa, Shehata, Shady, Abdul-Mageed, Muhammad

arXiv.org Artificial Intelligence, Sep-30-2025

Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally-informed multimodal modeling research. All datasets and benchmarks are publicly available.

benchmark, large language model, machine learning

arXiv.org Artificial Intelligence

2505.21979

Country:

Asia > Middle East (1.00)
Africa > Middle East > Egypt (0.28)

Genre:

Overview (0.92)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.67)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Caption Enriched Samples for Improving Hateful Memes Detection

Blaier, Efrat, Malkiel, Itzik, Wolf, Lior

arXiv.org Artificial Intelligence, Sep-22-2021

The recently introduced Hateful Memes Challenge demonstrates how difficult it is to determine whether a meme is hateful. Specifically, both unimodal language models and multimodal vision-language models fall short of human-level performance. Motivated by the need to model the contrast between the image content and the overlaid text, we suggest applying an off-the-shelf image captioning tool to capture the former. We demonstrate that incorporating such automatic captions during fine-tuning improves the results of various unimodal and multimodal models. Moreover, in the unimodal case, continuing the pre-training of language models on pairs of augmented and original captions is highly beneficial to classification accuracy.
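To make the caption-enrichment idea concrete, here is a minimal sketch of how an automatic image caption might be joined with a meme's overlaid text to form the input of a text classifier. The abstract does not specify the captioning tool or the input format, so the `Meme` record, the `" [SEP] "` separator, and the `dummy_captioner` stub are all hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Meme:
    image_path: str    # path to the meme image
    overlay_text: str  # text written on top of the image
    label: int         # 1 = hateful, 0 = not hateful


def build_augmented_input(
    meme: Meme, caption_fn: Callable[[str], str], sep: str = " [SEP] "
) -> str:
    """Join the overlaid text with an automatic caption of the image.

    The separator-based concatenation is an assumed format for illustration.
    """
    caption = caption_fn(meme.image_path)
    return meme.overlay_text.strip() + sep + caption.strip()


def build_dataset(
    memes: List[Meme], caption_fn: Callable[[str], str]
) -> List[Tuple[str, int]]:
    """Produce (augmented text, label) pairs ready for fine-tuning a classifier."""
    return [(build_augmented_input(m, caption_fn), m.label) for m in memes]


def dummy_captioner(image_path: str) -> str:
    """Hypothetical stand-in for an off-the-shelf image captioning model."""
    return "a photo of a dog sitting on a couch"


if __name__ == "__main__":
    memes = [Meme("img/001.png", "look who's home ", 0)]
    for text, label in build_dataset(memes, dummy_captioner):
        print(label, text)
```

Keeping the captioner as a plain callable means any real captioning model can be swapped in without changing how the classifier inputs are assembled.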

augmented caption, caption, meme

arXiv.org Artificial Intelligence

2109.10649

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre:

Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
