AITopics | Frank, Anette

Collaborating Authors

Frank, Anette

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

From Argumentation to Deliberation: Perspectivized Stance Vectors for Fine-grained (Dis)agreement Analysis

Plenz, Moritz, Heinisch, Philipp, Gehring, Janosch, Cimiano, Philipp, Frank, Anette

arXiv.org Artificial IntelligenceFeb-10-2025

Debating over conflicting issues is a necessary first step towards resolving conflicts. However, intrinsic perspectives of an arguer are difficult to overcome by persuasive argumentation skills. Proceeding from a debate to a deliberative process, where we can identify actionable options for resolving a conflict requires a deeper analysis of arguments and the perspectives they are grounded in - as it is only from there that one can derive mutually agreeable resolution steps. In this work we develop a framework for a deliberative analysis of arguments in a computational argumentation setup. We conduct a fine-grained analysis of perspectivized stances expressed in the arguments of different arguers or stakeholders on a given issue, aiming not only to identify their opposing views, but also shared perspectives arising from their attitudes, values or needs. We formalize this analysis in Perspectivized Stance Vectors that characterize the individual perspectivized stances of all arguers on a given issue. We construct these vectors by determining issue- and argument-specific concepts, and predict an arguer's stance relative to each of them. The vectors allow us to measure a modulated (dis)agreement between arguers, structured by perspectives, which allows us to identify actionable points for conflict resolution, as a first step towards deliberation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.09644

Country:

Europe (1.00)
Asia (0.68)
North America > United States (0.46)

Genre:

Research Report (1.00)
Overview (0.68)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Education (0.96)
Law > Criminal Law (0.70)
Government > Immigration & Customs (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)

Add feedback

The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning

Fu, Xiyan, Frank, Anette

arXiv.org Artificial IntelligenceOct-8-2024

While LLMs have emerged as performant architectures for reasoning tasks, their compositional generalization capabilities have been questioned. In this work, we introduce a Compositional Generalization Challenge for Graph-based Commonsense Reasoning (CGGC) that goes beyond previous evaluations that are based on sequences or tree structures - and instead involves a reasoning graph: It requires models to generate a natural sentence based on given concepts and a corresponding reasoning graph, where the presented graph involves a previously unseen combination of relation types. To master this challenge, models need to learn how to reason over relation tupels within the graph, and how to compose them when conceptualizing a verbalization. We evaluate seven well-known LLMs using in-context learning and find that performant LLMs still struggle in compositional generalization. We investigate potential causes of this gap by analyzing the structures of reasoning graphs, and find that different structures present varying levels of difficulty for compositional generalization. Arranging the order of demonstrations according to the structures' difficulty shows that organizing samples in an easy-to-hard schema enhances the compositional generalization ability of LLMs.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2410.06272

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

Parcalabescu, Letitia, Frank, Anette

arXiv.org Artificial IntelligenceJun-10-2024

Vision and language model (VLM) decoders are currently the best-performing architectures on multimodal tasks. Next to predictions, they can also produce explanations, either in post-hoc or CoT settings. However, it is not clear how much they use the vision and text modalities when generating predictions or explanations. In this work, we investigate if VLMs rely on modalities differently when they produce explanations as opposed to providing answers. We also evaluate the self-consistency of VLM decoders in both post-hoc and CoT explanation settings, by extending existing unimodal tests and measures to VLM decoders. We find that VLMs are less self-consistent than LLMs. Text contributions in VL decoders are more important than image contributions in all examined tasks. Moreover, the contributions of images are significantly stronger for explanation generation compared to answer generation. This difference is even larger in CoT compared to post-hoc explanations. Lastly, we provide an up-to-date benchmarking of state-of-the-art VL decoders on the VALSE benchmark, which before only covered VL encoders. We find that VL decoders still struggle with most phenomena tested by VALSE.

explanation, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2404.18624

Country:

Europe (0.46)
North America > Canada (0.28)
Asia > Middle East > Qatar (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting

Richter-Pechanski, Phillip, Wiesenbach, Philipp, Schwab, Dominic M., Kiriakou, Christina, Geis, Nicolas, Dieterich, Christoph, Frank, Anette

arXiv.org Artificial IntelligenceMar-20-2024

Automatic extraction of medical information from these data poses several challenges: high costs of required clinical expertise, restricted computational resources, strict privacy regulations, and limited interpretability of model predictions. Recent domain adaptation and prompting methods using lightweight masked language models showed promising results with minimal training data and allow for application of well-established interpretability methods. We are first to present a systematic evaluation of advanced domain adaptation and prompting methods in a low-resource medical domain task, performing multiclass section classification on German doctor's letters. We evaluate a variety of models, model sizes, (further-pre)training and task settings, and conduct extensive class-wise evaluations supported by Shapley values to validate the quality of small-scale training data, and to ensure interpretability of model predictions. We show that in few-shot learning scenarios, a lightweight, domain-adapted pretrained language model, prompted with just 20 shots per section class, outperforms a traditional classification model, by increasing accuracy from 48.6% to 79.1%.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2403.13369

Country:

Europe > Germany (0.14)
Asia > China (0.14)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Exploring Continual Learning of Compositional Generalization in NLI

Fu, Xiyan, Frank, Anette

arXiv.org Artificial IntelligenceMar-7-2024

Compositional Natural Language Inference has been explored to assess the true abilities of neural models to perform NLI. Yet, current evaluations assume models to have full access to all primitive inferences in advance, in contrast to humans that continuously acquire inference knowledge. In this paper, we introduce the Continual Compositional Generalization in Inference (C2Gen NLI) challenge, where a model continuously acquires knowledge of constituting primitive inference tasks as a basis for compositional inferences. We explore how continual learning affects compositional generalization in NLI, by designing a continual learning setup for compositional NLI inference tasks. Our experiments demonstrate that models fail to compositionally generalize in a continual scenario. To address this problem, we first benchmark various continual learning algorithms and verify their efficacy. We then further analyze C2Gen, focusing on how to order primitives and compositional inference types and examining correlations between subtasks. Our analyses show that by learning subtasks continuously while observing their dependencies and increasing degrees of difficulty, continual learning can enhance composition generalization ability.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2403.044

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Graph Language Models

Plenz, Moritz, Frank, Anette

arXiv.org Artificial IntelligenceJan-13-2024

While Language Models have become workhorses for NLP, their interplay with textual knowledge graphs (KGs) - structured memories of general or domain knowledge - is actively researched. Current embedding methodologies for such graphs typically either (i) linearize graphs for embedding them using sequential Language Models (LMs), which underutilize structural information, or (ii) use Graph Neural Networks (GNNs) to preserve graph structure, while GNNs cannot represent textual features as well as a pre-trained LM could. In this work we introduce a novel language model, the Graph Language Model (GLM), that integrates the strengths of both approaches, while mitigating their weaknesses. The GLM parameters are initialized from a pretrained LM, to facilitate nuanced understanding of individual concepts and triplets. Simultaneously, its architectural design incorporates graph biases, thereby promoting effective knowledge distribution within the graph. Empirical evaluations on relation classification tasks on ConceptNet subgraphs reveal that GLM embeddings surpass both LM- and GNN-based baselines in supervised and zero-shot settings.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.07105

Country:

Europe (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

On Measuring Faithfulness of Natural Language Explanations

Parcalabescu, Letitia, Frank, Anette

arXiv.org Artificial IntelligenceNov-13-2023

Large language models (LLMs) can explain their own predictions, through post-hoc or Chain-of-Thought (CoT) explanations. However the LLM could make up reasonably sounding explanations that are unfaithful to its underlying reasoning. Recent work has designed tests that aim to judge the faithfulness of either post-hoc or CoT explanations. In this paper we argue that existing faithfulness tests are not actually measuring faithfulness in terms of the models' inner workings, but only evaluate their self-consistency on the output level. The aims of our work are two-fold. i) We aim to clarify the status of existing faithfulness tests in terms of model explainability, characterising them as self-consistency tests instead. This assessment we underline by constructing a Comparative Consistency Bank for self-consistency tests that for the first time compares existing tests on a common suite of 11 open-source LLMs and 5 datasets -- including ii) our own proposed self-consistency measure CC-SHAP. CC-SHAP is a new fine-grained measure (not test) of LLM self-consistency that compares a model's input contributions to answer prediction and generated explanation. With CC-SHAP, we aim to take a step further towards measuring faithfulness with a more interpretable and fine-grained method. Code available at \url{https://github.com/Heidelberg-NLP/CC-SHAP}

artificial intelligence, large language model, natural language explanation, (1 more...)

arXiv.org Artificial Intelligence

2311.07466

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

Kesen, Ilker, Pedrotti, Andrea, Dogan, Mustafa, Cafagna, Michele, Acikgoz, Emre Can, Parcalabescu, Letitia, Calixto, Iacer, Frank, Anette, Gatt, Albert, Erdem, Aykut, Erdem, Erkut

arXiv.org Artificial IntelligenceNov-12-2023

With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm footing. Task-based evaluations, while valuable, fail to capture the complexities and specific temporal aspects of moving images that VidLMs need to process. Through carefully curated counterfactuals, ViLMA offers a controlled evaluation suite that sheds light on the true potential of these models, as well as their performance gaps compared to human-level understanding. ViLMA also includes proficiency tests, which assess basic capabilities deemed essential to solving the main counterfactual tests. We show that current VidLMs' grounding abilities are no better than those of vision-language models which use static images. This is especially striking once the performance on proficiency tests is factored in. Our benchmark serves as a catalyst for future research on VidLMs, helping to highlight areas that still need to be explored.

artificial intelligence, linguistic and temporal grounding, video-language model, (2 more...)

arXiv.org Artificial Intelligence

2311.07022

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)

Add feedback

Dynamic MOdularized Reasoning for Compositional Structured Explanation Generation

Fu, Xiyan, Frank, Anette

arXiv.org Artificial IntelligenceSep-14-2023

Despite the success of neural models in solving reasoning tasks, their compositional generalization capabilities remain unclear. In this work, we propose a new setting of the structured explanation generation task to facilitate compositional reasoning research. Previous works found that symbolic methods achieve superior compositionality by using pre-defined inference rules for iterative reasoning. But these approaches rely on brittle symbolic transfers and are restricted to well-defined tasks. Hence, we propose a dynamic modularized reasoning model, MORSE, to improve the compositional generalization of neural models. MORSE factorizes the inference process into a combination of modules, where each module represents a functional unit. Specifically, we adopt modularized self-attention to dynamically select and route inputs to dedicated heads, which specializes them to specific functions. We conduct experiments for increasing lengths and shapes of reasoning trees on two benchmarks to test MORSE's compositional generalization abilities, and find it outperforms competitive baselines. Model ablation and deeper analyses show the effectiveness of dynamic reasoning modules and their generalization abilities.

artificial intelligence, expert system, natural language, (19 more...)

arXiv.org Artificial Intelligence

2309.07624

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.93)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.70)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

AMR4NLI: Interpretable and robust NLI measures from semantic graphs

Opitz, Juri, Wein, Shira, Steen, Julius, Frank, Anette, Schneider, Nathan

arXiv.org Artificial IntelligenceSep-5-2023

The task of natural language inference (NLI) asks whether a given premise (expressed in NL) entails a given NL hypothesis. NLI benchmarks contain human ratings of entailment, but the meaning relationships driving these ratings are not formalized. Can the underlying sentence pair relationships be made more explicit in an interpretable yet robust fashion? We compare semantic structures to represent premise and hypothesis, including sets of contextualized embeddings and semantic graphs (Abstract Meaning Representations), and measure whether the hypothesis is a semantic substructure of the premise, utilizing interpretable metrics. Our evaluation on three English benchmarks finds value in both contextualized embeddings and semantic graphs; moreover, they provide complementary signals, and can be leveraged together in a hybrid model.

artificial intelligence, computational linguistic, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.00936

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback