AITopics | Parcalabescu, Letitia

Plotting

Parcalabescu, Letitia

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

Parcalabescu, Letitia, Frank, Anette

arXiv.org Artificial IntelligenceJun-10-2024

Vision and language model (VLM) decoders are currently the best-performing architectures on multimodal tasks. Next to predictions, they can also produce explanations, either in post-hoc or CoT settings. However, it is not clear how much they use the vision and text modalities when generating predictions or explanations. In this work, we investigate if VLMs rely on modalities differently when they produce explanations as opposed to providing answers. We also evaluate the self-consistency of VLM decoders in both post-hoc and CoT explanation settings, by extending existing unimodal tests and measures to VLM decoders. We find that VLMs are less self-consistent than LLMs. Text contributions in VL decoders are more important than image contributions in all examined tasks. Moreover, the contributions of images are significantly stronger for explanation generation compared to answer generation. This difference is even larger in CoT compared to post-hoc explanations. Lastly, we provide an up-to-date benchmarking of state-of-the-art VL decoders on the VALSE benchmark, which before only covered VL encoders. We find that VL decoders still struggle with most phenomena tested by VALSE.

explanation, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2404.18624

Country:

Europe (0.46)
North America > Canada (0.28)
Asia > Middle East > Qatar (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

On Measuring Faithfulness of Natural Language Explanations

Parcalabescu, Letitia, Frank, Anette

arXiv.org Artificial IntelligenceNov-13-2023

Large language models (LLMs) can explain their own predictions, through post-hoc or Chain-of-Thought (CoT) explanations. However the LLM could make up reasonably sounding explanations that are unfaithful to its underlying reasoning. Recent work has designed tests that aim to judge the faithfulness of either post-hoc or CoT explanations. In this paper we argue that existing faithfulness tests are not actually measuring faithfulness in terms of the models' inner workings, but only evaluate their self-consistency on the output level. The aims of our work are two-fold. i) We aim to clarify the status of existing faithfulness tests in terms of model explainability, characterising them as self-consistency tests instead. This assessment we underline by constructing a Comparative Consistency Bank for self-consistency tests that for the first time compares existing tests on a common suite of 11 open-source LLMs and 5 datasets -- including ii) our own proposed self-consistency measure CC-SHAP. CC-SHAP is a new fine-grained measure (not test) of LLM self-consistency that compares a model's input contributions to answer prediction and generated explanation. With CC-SHAP, we aim to take a step further towards measuring faithfulness with a more interpretable and fine-grained method. Code available at \url{https://github.com/Heidelberg-NLP/CC-SHAP}

artificial intelligence, large language model, natural language explanation, (1 more...)

arXiv.org Artificial Intelligence

2311.07466

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

Kesen, Ilker, Pedrotti, Andrea, Dogan, Mustafa, Cafagna, Michele, Acikgoz, Emre Can, Parcalabescu, Letitia, Calixto, Iacer, Frank, Anette, Gatt, Albert, Erdem, Aykut, Erdem, Erkut

arXiv.org Artificial IntelligenceNov-12-2023

With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm footing. Task-based evaluations, while valuable, fail to capture the complexities and specific temporal aspects of moving images that VidLMs need to process. Through carefully curated counterfactuals, ViLMA offers a controlled evaluation suite that sheds light on the true potential of these models, as well as their performance gaps compared to human-level understanding. ViLMA also includes proficiency tests, which assess basic capabilities deemed essential to solving the main counterfactual tests. We show that current VidLMs' grounding abilities are no better than those of vision-language models which use static images. This is especially striking once the performance on proficiency tests is factored in. Our benchmark serves as a catalyst for future research on VidLMs, helping to highlight areas that still need to be explored.

artificial intelligence, linguistic and temporal grounding, video-language model, (2 more...)

arXiv.org Artificial Intelligence

2311.07022

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)

Add feedback

MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks

Parcalabescu, Letitia, Frank, Anette

arXiv.org Artificial IntelligenceMay-23-2023

Vision and language models (VL) are known to exploit unrobust indicators in individual modalities (e.g., introduced by distributional biases) instead of focusing on relevant information in each modality. That a unimodal model achieves similar accuracy on a VL task to a multimodal one, indicates that so-called unimodal collapse occurred. However, accuracybased tests fail to detect e.g., when the model prediction is wrong, while the model used relevant information from a modality. Instead, we propose MM-SHAP, a performance-agnostic multimodality score based on Shapley values that reliably quantifies in which proportions a multimodal model uses individual modalities. We apply MM-SHAP in two ways: (1) to compare models for their average degree of multimodality, and (2) to measure for individual models the contribution of individual modalities for different tasks and datasets. Experiments with six VL models - LXMERT, CLIP and four ALBEF variants - on four VL tasks highlight that unimodal collapse can occur to different degrees and in different directions, contradicting the wide-spread assumption that unimodal collapse is one-sided. Based on our results, we recommend MM-SHAP for Figure 1: We display image-sentence alignment scores analysing multimodal tasks, to diagnose and (ISA) and the textual degree T-SHAP that measures guide progress towards multimodal integration.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.08158

Country:

Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning

Eichenberg, Constantin, Black, Sidney, Weinbach, Samuel, Parcalabescu, Letitia, Frank, Anette

arXiv.org Artificial IntelligenceOct-24-2022

Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA - a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pretraining is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. MAGMA outperforms Frozen on open-ended generative tasks, achieving state of the art results on the OKVQA benchmark and competitive results on a range of other popular VL benchmarks, while pretraining on 0.2% of the number of samples used to train SimVLM.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2112.05253

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.64)

Add feedback

Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

Erdem, Erkut (Hacettepe University, Ankara, Turkey) | Kuyu, Menekse (Hacettepe University, Ankara, Turkey) | Yagcioglu, Semih (Hacettepe University, Ankara, Turkey) | Frank, Anette (Heidelberg University, Heidelberg, Germany) | Parcalabescu, Letitia (Heidelberg University, Heidelberg, Germany) | Plank, Barbara (IT University of Copenhagen, Copenhagen, Denmark) | Babii, Andrii (Kharkiv National University of Radio Electronics, Ukraine) | Turuta, Oleksii (Kharkiv National University of Radio Electronics, Ukraine) | Erdem, Aykut | Calixto, Iacer (New York University, U.S.A. / University of Amsterdam, Netherlands) | Lloret, Elena (University of Alicante, Alicante, Spain) | Apostol, Elena-Simona (University Politehnica of Bucharest, Bucharest, Romania) | Truică, Ciprian-Octavian (University Politehnica of Bucharest, Bucharest, Romania) | Šandrih, Branislava (University of Belgrade, Belgrade, Serbia) | Martinčić-Ipšić, Sanda (University of Rijeka, Rijeka, Croatia) | Berend, Gábor (University of Szeged, Szeged, Hungary) | Gatt, Albert (University of Malta, Malta) | Korvel, Grăzina (Vilnius University, Vilnius, Lithuania)

Journal of Artificial Intelligence ResearchApr-6-2022

Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed an impressive progress on both of these problems, giving rise to a new family of approaches. Especially, the advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-networks based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions.

machine learning, natural language, text simplification and paraphrasing, (25 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.12918

AI Access Foundation

12918

Journal of Artificial Intelligence Research

Country:

Europe > Spain (0.67)
North America > United States > California (0.46)
North America > United States > Minnesota (0.28)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.48)
Government > Regional Government > Europe Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

Parcalabescu, Letitia, Cafagna, Michele, Muradjan, Lilitta, Frank, Anette, Calixto, Iacer, Gatt, Albert

arXiv.org Artificial IntelligenceMar-14-2022

We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We build VALSE using methods that support the construction of valid foils, and report results from evaluating five widely-used V&L models. Our experiments suggest that current models have considerable difficulty addressing most phenomena. Hence, we expect VALSE to serve as an important benchmark to measure future progress of pretrained V&L models from a linguistic perspective, complementing the canonical task-centred V&L evaluations.

large language model, machine learning, natural language, (24 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2022.acl-long.567

2112.07566

Country:

Europe (1.00)
North America > United States > Louisiana (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Experimental Study (0.34)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(2 more...)

Add feedback

What is Multimodality?

Parcalabescu, Letitia, Trost, Nils, Frank, Anette

arXiv.org Artificial IntelligenceMar-10-2021

The last years have shown rapid developments in the field of multimodal machine learning, combining e.g., vision, text or speech. In this position paper we explain how the field uses outdated definitions of multimodality that prove unfit for the machine learning era. We propose a new task-relative definition of (multi)modality in the context of multimodal machine learning that focuses on representations and information that are relevant for a given machine learning task. With our new definition of multimodality we aim to provide a missing foundation for multimodal research, an important component of language grounding and a crucial milestone towards NLU.

deep learning, multimodality, neural network, (18 more...)

arXiv.org Artificial Intelligence

2103.06304

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback