AITopics | Magnini, Bernardo

Magnini, Bernardo

All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark

Testa, Davide, Bonetta, Giovanni, Bernardi, Raffaella, Bondielli, Alessandro, Lenci, Alessandro, Miaschi, Alessio, Passaro, Lucia, Magnini, Bernardo

arXiv.org Artificial Intelligence, Feb-24-2025

We introduce MAIA (Multimodal AI Assessment), a native-Italian benchmark designed for fine-grained investigation of the reasoning abilities of visual language models on videos. MAIA differs from other available video benchmarks in its design, its reasoning categories, the metric it uses, and the language and culture of the videos. It evaluates Vision Language Models (VLMs) on two aligned tasks: a visual statement verification task and an open-ended visual question-answering task, both on the same set of video-related questions. It considers twelve reasoning categories that aim to disentangle language and vision relations by highlighting when one of the two alone encodes sufficient information to solve the tasks, when both are needed, and when the full richness of the short video is essential rather than just a part of it. Thanks to its carefully thought-out design, it evaluates VLMs' consistency and visually grounded natural language comprehension and generation simultaneously through an aggregated metric. Finally, the video collection has been carefully selected to reflect Italian culture, and the language data are produced by native speakers.

large language model, machine learning, natural language, (22 more...)

arXiv:2502.16989

Country:

Europe > Italy (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Evalita-LLM: Benchmarking Large Language Models on Italian

Magnini, Bernardo, Zanoli, Roberto, Resta, Michele, Cimmino, Martin, Albano, Paolo, Madeddu, Marco, Patti, Viviana

arXiv.org Artificial Intelligence, Feb-4-2025

We describe Evalita-LLM, a new benchmark designed to evaluate Large Language Models (LLMs) on Italian tasks. Its distinguishing features are the following: (i) all tasks are native Italian, avoiding translation issues and potential cultural biases; (ii) in addition to well-established multiple-choice tasks, the benchmark includes generative tasks, enabling more natural interaction with LLMs; (iii) all tasks are evaluated against multiple prompts, thereby mitigating model sensitivity to specific prompts and allowing a fairer and more objective evaluation. We propose an iterative methodology in which candidate tasks and candidate prompts are validated against a set of LLMs used for development. We report experimental results from the benchmark's development phase and provide performance statistics for several state-of-the-art LLMs.
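The multi-prompt evaluation described in point (iii) can be illustrated with a minimal sketch. It assumes exact-match accuracy and a simple mean/best/spread summary across prompts; the function names, the prompt identifiers, and the toy data are hypothetical, and the benchmark's actual metrics and aggregation may differ.

```python
from statistics import mean


def accuracy(predictions, gold):
    """Exact-match accuracy of one prompt's predictions against gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and aligned")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def evaluate_over_prompts(predictions_by_prompt, gold):
    """Score the same task under several prompts and summarise the results.

    The summary (mean, best, spread) is an assumed aggregation chosen for
    illustration; it is not taken from the paper.
    """
    per_prompt = {
        prompt_id: accuracy(preds, gold)
        for prompt_id, preds in predictions_by_prompt.items()
    }
    scores = list(per_prompt.values())
    return {
        "per_prompt": per_prompt,
        "mean": mean(scores),                 # average over prompts
        "best": max(scores),                  # best single prompt
        "spread": max(scores) - min(scores),  # sensitivity to prompt wording
    }


# Toy multiple-choice task with three hypothetical prompt formulations.
gold = ["A", "B", "A", "C"]
predictions_by_prompt = {
    "prompt_1": ["A", "B", "A", "C"],
    "prompt_2": ["A", "B", "B", "C"],
    "prompt_3": ["A", "C", "B", "C"],
}
summary = evaluate_over_prompts(predictions_by_prompt, gold)
print(summary)
```

Reporting the spread alongside the mean makes prompt sensitivity visible: a model whose score depends heavily on prompt wording shows a large gap between its best and worst prompt.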

large language model, machine learning, natural language, (19 more...)

arXiv:2502.02289

Country: Europe > Italy > Piedmont > Turin Province > Turin (0.14)

Genre:

Overview (0.92)
Research Report (0.82)

Industry:

Materials (0.67)
Leisure & Entertainment (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Evaluating Task-Oriented Dialogue Consistency through Constraint Satisfaction

Labruna, Tiziano, Magnini, Bernardo

arXiv.org Artificial Intelligence, Jul-16-2024

Task-oriented dialogues must maintain consistency both within the dialogue itself, ensuring logical coherence across turns, and with the conversational domain, accurately reflecting external knowledge. We propose to conceptualize dialogue consistency as a Constraint Satisfaction Problem (CSP), wherein variables represent segments of the dialogue referencing the conversational domain, and constraints among variables reflect dialogue properties, including linguistic, conversational, and domain-based aspects. To demonstrate the feasibility of the approach, we use a CSP solver to detect inconsistencies in dialogues re-lexicalized by an LLM. Our findings indicate that: (i) CSP is effective at detecting dialogue inconsistencies; and (ii) consistent dialogue re-lexicalization is challenging for state-of-the-art LLMs, which achieve only a 0.15 accuracy rate when compared to a CSP solver. Furthermore, an ablation study reveals that constraints derived from domain knowledge are the hardest to respect. We argue that CSP captures core properties of dialogue consistency that have been poorly considered by approaches based on component pipelines.
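The CSP framing in the abstract above can be illustrated with a minimal sketch. It assumes a toy restaurant knowledge base, three hypothetical dialogue variables, and a single domain constraint checked by brute-force enumeration; the paper's actual variables, constraints, and solver are not specified here and are likely richer.

```python
from itertools import product

# Hypothetical domain knowledge: each row is one restaurant.
KB = [
    {"name": "Roma", "food": "italian", "area": "centre"},
    {"name": "Sakura", "food": "japanese", "area": "north"},
]

# Variables are dialogue segments that mention a slot value:
# the food requested by the user, the restaurant name proposed by the
# system, and the area the system states for that restaurant.
DOMAINS = {
    "v1_food": {row["food"] for row in KB},
    "v2_name": {row["name"] for row in KB},
    "v3_area": {row["area"] for row in KB},
}


def in_kb(assignment):
    """Domain constraint: the three values must co-occur in one KB row."""
    return any(
        row["food"] == assignment["v1_food"]
        and row["name"] == assignment["v2_name"]
        and row["area"] == assignment["v3_area"]
        for row in KB
    )


# Linguistic or conversational constraints could be appended here.
CONSTRAINTS = [in_kb]


def is_consistent(assignment):
    """A dialogue is consistent if its assignment satisfies every constraint."""
    return all(constraint(assignment) for constraint in CONSTRAINTS)


def solutions():
    """Enumerate all consistent assignments by brute force."""
    names = sorted(DOMAINS)
    for values in product(*(sorted(DOMAINS[n]) for n in names)):
        assignment = dict(zip(names, values))
        if is_consistent(assignment):
            yield assignment


original = {"v1_food": "italian", "v2_name": "Roma", "v3_area": "centre"}
# A re-lexicalization that swaps the food and name but keeps the old area.
relexicalized = {"v1_food": "japanese", "v2_name": "Sakura", "v3_area": "centre"}

print(is_consistent(original))
print(is_consistent(relexicalized))
```

Brute-force enumeration is only practical for toy domains; it is used here to keep the example dependency-free, whereas a dedicated CSP solver would prune the search space.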

artificial intelligence, constraint-based reasoning, task-oriented dialogue consistency, (1 more...)

arXiv:2407.11857

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)

Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain

García-Ferrero, Iker, Agerri, Rodrigo, Salazar, Aitziber Atutxa, Cabrio, Elena, de la Iglesia, Iker, Lavelli, Alberto, Magnini, Bernardo, Molinet, Benjamin, Ramirez-Romero, Johana, Rigau, German, Villa-Gonzalez, Jose Maria, Villata, Serena, Zaninello, Andrea

arXiv.org Artificial Intelligence, Apr-11-2024

Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. As a result, a number of large language models (LLMs) have recently been adapted to the medical domain so that they can be used as a tool for mediating human-AI interaction. While these LLMs display competitive performance on automated medical text benchmarks, they have been pre-trained and evaluated with a focus on a single language (mostly English). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data that are often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English.

artificial intelligence, large language model, natural language, (17 more...)

arXiv:2404.07613

Country:

Europe > Spain > Andalusia (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Unraveling ChatGPT: A Critical Analysis of AI-Generated Goal-Oriented Dialogues and Annotations

Labruna, Tiziano, Brenna, Sofia, Zaninello, Andrea, Magnini, Bernardo

arXiv.org Artificial Intelligence, May-23-2023

Large pre-trained language models have exhibited unprecedented capabilities in producing high-quality text via prompting techniques. This opens new possibilities for data collection and annotation, particularly in situations where such data is scarce, complex to gather, expensive, or even sensitive. In this paper, we explore the potential of these models to generate and annotate goal-oriented dialogues, and conduct an in-depth analysis to evaluate their quality. Our experiments employ ChatGPT and encompass three categories of goal-oriented dialogues (task-oriented, collaborative, and explanatory), two generation modes (interactive and one-shot), and two languages (English and Italian). Based on extensive human-based evaluations, we demonstrate that the quality of the generated dialogues and annotations is on par with that of dialogues and annotations produced by humans.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv:2305.14556

Country:

Europe (0.46)
North America > United States (0.28)

Genre:

Questionnaire & Opinion Survey (0.95)
Research Report > New Finding (0.46)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)