Collaborating Authors

 Maistro, Maria


As easy as PIE: understanding when pruning causes language models to disagree

arXiv.org Artificial Intelligence

Language Model (LM) pruning compresses a model by removing weights, nodes, or other parts of its architecture. Typically, pruning is assessed by the efficiency it gains at some cost to effectiveness. However, when looking at how individual data points are affected by pruning, a particular subset of data points consistently bears most of the brunt (in terms of reduced accuracy), yet this effect goes unnoticed when only the mean accuracy over all data points is reported. These data points, called Pruning Identified Exemplars (PIEs), have been studied in image processing, but not in NLP. In a study covering various NLP datasets, pruning methods, and levels of compression, we find that PIEs impact inference quality considerably, regardless of class frequency, and that BERT is more prone to this than BiLSTM. We also find that PIEs contain a large share of the data points with the greatest influence on how well the model generalises to unseen data. This means that pruning, with a seemingly moderate loss in accuracy across all data points, in fact severely hurts the data points that matter most. We trace what makes PIEs both hard to predict and impactful for inference to their overall longer and more semantically complex text. These findings are novel and contribute to understanding how LMs are affected by pruning. The code is available at: https://github.com/pietrotrope/AsEasyAsPIE
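
To make the PIE notion concrete, here is a minimal sketch (not the authors' released code) of how such examples can be flagged, assuming the original definition of Pruning Identified Exemplars: data points on which the modal prediction of a set of unpruned models disagrees with that of the corresponding pruned models. The arrays `dense_preds` and `pruned_preds` are hypothetical.

```python
# Minimal sketch (not the authors' released code): flag PIEs as the data points
# on which the modal prediction of a set of unpruned models disagrees with the
# modal prediction of the corresponding pruned models, following the original
# definition of Pruning Identified Exemplars. `dense_preds` and `pruned_preds`
# are hypothetical arrays of shape (num_models, num_examples) holding
# non-negative integer class ids.
import numpy as np

def columnwise_mode(preds: np.ndarray) -> np.ndarray:
    """Most frequent class id per example (ties broken by the smallest id)."""
    return np.array([np.bincount(col).argmax() for col in preds.T])

def find_pies(dense_preds: np.ndarray, pruned_preds: np.ndarray) -> np.ndarray:
    """Boolean mask over examples marking Pruning Identified Exemplars."""
    return columnwise_mode(dense_preds) != columnwise_mode(pruned_preds)

# Dummy predictions: 3 dense and 3 pruned models over 5 examples.
dense = np.array([[0, 1, 1, 2, 0], [0, 1, 1, 2, 0], [0, 1, 0, 2, 0]])
pruned = np.array([[0, 1, 0, 2, 1], [0, 0, 0, 2, 1], [0, 1, 0, 2, 1]])
print(find_pies(dense, pruned))  # [False False  True False  True]
```

Reporting mean accuracy over all data points hides exactly the examples this mask selects, which is why the abstract argues the PIE subset deserves separate inspection.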


A Reality Check on Context Utilisation for Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) helps address the limitations of the parametric knowledge embedded within a language model (LM). However, investigations of how LMs utilise retrieved information of varying complexity in real-world scenarios have been limited to synthetic contexts. We introduce DRUID (Dataset of Retrieved Unreliable, Insufficient and Difficult-to-understand contexts), with real-world queries and contexts manually annotated for stance. The dataset is based on the prototypical task of automated claim verification, for which automated retrieval of real-world evidence is crucial. We compare DRUID to synthetic datasets (CounterFact, ConflictQA) and find that artificial datasets often fail to represent the complexity and diversity of real-world context settings. We show that synthetic datasets exaggerate context characteristics that are rare in real retrieved data, which leads to inflated context utilisation results, as measured by our novel ACU score. Moreover, while previous work has mainly focused on singleton context characteristics to explain context utilisation, correlations between singleton context properties and ACU on DRUID are surprisingly small compared to correlations with properties related to the context source. Overall, our work underscores the need for context utilisation studies aligned with real-world retrieval, to represent and improve performance in real-world RAG settings.
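
The abstract does not spell out how the ACU score is computed, so the snippet below is only an illustrative stand-in rather than the paper's metric: a toy context-utilisation rate that counts how often providing a retrieved context moves a claim-verification prediction into agreement with the context's annotated stance. All names and labels are hypothetical.

```python
# Illustrative only: NOT the paper's ACU score, whose exact definition is given
# in the paper. This toy metric measures how often providing a retrieved context
# makes the model's claim-verification prediction agree with the context's
# annotated stance. All inputs are hypothetical lists of equal length.
def context_utilisation_rate(pred_without_ctx, pred_with_ctx, ctx_stance):
    """Fraction of eligible examples (context-free prediction differs from the
    context stance) where the prediction with context follows the stance."""
    used = sum(
        1
        for p0, p1, s in zip(pred_without_ctx, pred_with_ctx, ctx_stance)
        if p1 == s and p0 != s
    )
    eligible = sum(1 for p0, s in zip(pred_without_ctx, ctx_stance) if p0 != s)
    return used / eligible if eligible else 0.0

# Toy usage with stance labels "supports" / "refutes".
print(context_utilisation_rate(
    ["supports", "refutes", "supports"],
    ["refutes", "refutes", "supports"],
    ["refutes", "supports", "supports"],
))  # 0.5 under this toy definition
```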


An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records

arXiv.org Artificial Intelligence

Electronic healthcare records are vital for patient safety as they document conditions, plans, and procedures in both free text and medical codes. Language models have significantly enhanced the processing of such records, streamlining workflows and reducing manual data entry, thereby saving healthcare providers significant resources. However, the black-box nature of these models often leaves healthcare professionals hesitant to trust them. State-of-the-art explainability methods increase model transparency but rely on human-annotated evidence spans, which are costly to obtain. In this study, we propose an approach that produces plausible and faithful explanations without needing such annotations. We demonstrate on the automated medical coding task that adversarial robustness training improves explanation plausibility, and we introduce AttInGrad, a new explanation method superior to previous ones. Combining both contributions in a fully unsupervised setup, we produce explanations of comparable or better quality than those of a supervised approach. We release our code and model weights.
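
The exact AttInGrad formulation is given in the paper; as a hedged illustration of the general idea its name suggests (combining attention weights with input-times-gradient attributions), the sketch below simply multiplies the two per-token scores and renormalises them. The attribution values are made up.

```python
# Hedged illustration, not the paper's exact AttInGrad formula: the name suggests
# combining attention weights with input-x-gradient attributions, so this sketch
# takes their elementwise product and renormalises it into a per-token
# explanation. The arrays below are hypothetical token-level scores for one
# document.
import numpy as np

def combine_attention_and_inputxgrad(attention: np.ndarray,
                                     input_x_grad: np.ndarray) -> np.ndarray:
    """Per-token explanation score from attention and input-x-gradient."""
    combined = attention * np.abs(input_x_grad)  # assumption: elementwise product
    total = combined.sum()
    return combined / total if total > 0 else combined

attention = np.array([0.05, 0.40, 0.10, 0.45])   # e.g. attention mass per token
input_x_grad = np.array([0.2, -1.5, 0.1, 0.9])   # e.g. gradient x embedding, summed over dims
print(combine_attention_and_inputxgrad(attention, input_x_grad))
```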


Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

arXiv.org Artificial Intelligence

Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation. In previous work, the macro F1 score has been calculated sub-optimally, and our correction doubles it. We contribute a revised model comparison using stratified sampling and identical experimental setups, including hyperparameter and decision-boundary tuning. We analyze prediction errors to validate and falsify assumptions of previous works. The analysis confirms that all models struggle with rare codes, while long documents have only a negligible impact. Finally, we present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models. We release our code, model parameters, and new MIMIC-III and MIMIC-IV training and evaluation pipelines to facilitate fair future comparisons.
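
The precise macro-F1 correction is detailed in the paper; as a hedged illustration of one common way the score is computed sub-optimally in multi-label coding, the sketch below contrasts averaging per-code F1 over the full label space (codes that never occur in the test split contribute zero) with averaging only over codes that actually appear in the test data. The label matrices are hypothetical.

```python
# Hedged sketch of a common macro-F1 pitfall in multi-label medical coding (the
# paper details the actual correction): averaging per-code F1 over every code in
# the label space, including codes absent from the test split, deflates the score
# relative to averaging only over codes that actually occur in the test data.
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical multi-label targets/predictions over 6 codes; codes 4 and 5
# never occur in this (tiny) test set.
y_true = np.array([[1, 0, 1, 0, 0, 0],
                   [0, 1, 0, 1, 0, 0],
                   [1, 1, 0, 0, 0, 0]])
y_pred = np.array([[1, 0, 1, 0, 0, 0],
                   [0, 1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0]])

macro_all_codes = f1_score(y_true, y_pred, average="macro", zero_division=0)
present = y_true.sum(axis=0) > 0  # codes observed at least once in the test set
macro_present_codes = f1_score(y_true[:, present], y_pred[:, present],
                               average="macro", zero_division=0)
print(macro_all_codes, macro_present_codes)  # the second is noticeably higher
```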