AITopics | Poesio, Massimo

Collaborating Authors

Poesio, Massimo

Understanding The Effect Of Temperature On Alignment With Human Opinions

Pavlovic, Maja, Poesio, Massimo

arXiv.org Artificial Intelligence, Nov-15-2024

With the increasing capabilities of LLMs, recent studies focus on understanding whose opinions they represent and how to effectively extract aligned opinion distributions. We conducted an empirical analysis of three straightforward methods for obtaining distributions and evaluated the results across a variety of metrics. Our findings suggest that sampling and log-probability approaches with simple parameter adjustments can return better-aligned outputs in subjective tasks than direct prompting. Yet, assuming models reflect human opinions may be limiting, highlighting the need for further research on how human subjectivity affects model uncertainty.

large language model, machine learning, natural language, (18 more...)

2411.1008

Country:

Europe > United Kingdom (0.31)
North America > United States (0.28)
Europe > Middle East > Malta (0.15)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
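The abstract above mentions log-probability approaches with simple parameter adjustments. As a rough illustration only, and not the authors' code, the sketch below shows how a temperature parameter reshapes a distribution derived from log-probabilities. The function name and the three option probabilities are hypothetical values chosen for the example.

```python
import math

def softmax_with_temperature(logprobs, temperature):
    """Divide log-probabilities by the temperature, then renormalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [lp / temperature for lp in logprobs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical log-probabilities for three answer options (illustrative only).
option_logprobs = [math.log(0.7), math.log(0.2), math.log(0.1)]

sharp = softmax_with_temperature(option_logprobs, 0.5)  # more peaked
flat = softmax_with_temperature(option_logprobs, 2.0)   # more spread out
```

A temperature below 1 concentrates mass on the most likely option, while a temperature above 1 spreads it across options, which is one way a model's output distribution could be moved closer to a more varied human opinion distribution.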

A LLM Benchmark based on the Minecraft Builder Dialog Agent Task

Madge, Chris, Poesio, Massimo

arXiv.org Artificial Intelligence, Jul-17-2024

In this work we propose adapting the Minecraft builder task into an LLM benchmark suitable for evaluating LLM ability in spatially orientated tasks and for informing builder agent design. Previous works have proposed corpora with structures of varying complexity and human-written instructions. We instead attempt to provide a comprehensive synthetic benchmark for testing builder agents over a series of distinct tasks that comprise common building operations. We believe this approach allows us to probe specific strengths and weaknesses of different agents, and to test the ability of LLMs in the challenging area of spatial reasoning and vector-based math.

benchmark, large language model, natural language, (12 more...)

2407.12734

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation

Pavlovic, Maja, Poesio, Massimo

arXiv.org Artificial Intelligence, May-2-2024

Large Language Models (LLMs) have emerged as powerful support tools across various natural language tasks and a range of application domains. Recent studies focus on exploring their capabilities for data annotation. This paper provides a comparative overview of twelve studies investigating the potential of LLMs in labelling data. While the models demonstrate promising cost and time-saving benefits, there exist considerable limitations, such as representativeness, bias, sensitivity to prompt variations and English language preference. Leveraging insights from these studies, our empirical analysis further examines the alignment between human and GPT-generated opinion distributions across four subjective datasets. In contrast to the studies examining representation, our methodology directly obtains the opinion distribution from GPT. Our analysis thereby supports the minority of studies that are considering diverse perspectives when evaluating data annotation tasks and highlights the need for further research in this direction.

annotator, large language model, machine learning, (19 more...)

2405.01299

Country:

North America > United States (0.93)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.84)

Industry: Government > Regional Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Integrating knowledge bases to improve coreference and bridging resolution for the chemical domain

Lu, Pengcheng, Poesio, Massimo

arXiv.org Artificial Intelligence, Apr-16-2024

Resolving coreference and bridging relations in chemical patents is important for understanding the precise chemical process, and it depends critically on chemical domain knowledge. We propose an approach that incorporates external knowledge into a multi-task learning model for both coreference and bridging resolution in the chemical domain. The results show that integrating external knowledge can benefit both chemical coreference and bridging resolution.

artificial intelligence, machine learning, resolution, (20 more...)

2404.10696

Country:

South America (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.43)

Extending Activation Steering to Broad Skills and Multiple Behaviours

van der Weij, Teun, Poesio, Massimo, Schoots, Nandi

arXiv.org Artificial Intelligence, Mar-8-2024

Current large language models have dangerous capabilities, which are likely to become more problematic in the future. Activation steering techniques can be used to reduce risks from these capabilities. In this paper, we investigate the efficacy of activation steering for broad skills and multiple behaviours. First, by comparing the effects of reducing performance on general coding ability and Python-specific ability, we find that steering broader skills is competitive with steering narrower skills. Second, we steer models to become more or less myopic and wealth-seeking, among other behaviours. In our experiments, combining steering vectors for multiple different behaviours into one steering vector is largely unsuccessful. On the other hand, injecting individual steering vectors at different places in a model simultaneously is promising.

large language model, machine learning, natural language, (11 more...)

2403.05767

Genre: Research Report > New Finding (1.00)

Industry: Government (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)

Large Language Models as Minecraft Agents

Madge, Chris, Poesio, Massimo

arXiv.org Artificial Intelligence, Feb-13-2024

In this work we examine the use of Large Language Models (LLMs) in the challenging setting of acting as a Minecraft agent. We apply and evaluate LLMs in the builder and architect settings, introduce clarification questions, and examine the challenges and opportunities for improvement. In addition, we present a platform for online interaction with the agents and an evaluation against previous works.

large language model, machine learning, natural language, (18 more...)

2402.08392

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

SemEval-2023 Task 11: Learning With Disagreements (LeWiDi)

Leonardelli, Elisa, Uma, Alexandra, Abercrombie, Gavin, Almanea, Dina, Basile, Valerio, Fornaciari, Tommaso, Plank, Barbara, Rieser, Verena, Poesio, Massimo

arXiv.org Artificial Intelligence, Apr-28-2023

NLP datasets annotated with human judgments are rife with disagreements between the judges. This is especially true for tasks depending on subjective judgments such as sentiment analysis or offensive language detection. Particularly in these latter cases, the NLP community has come to realize that the approach of 'reconciling' these different subjective interpretations is inappropriate. Many NLP researchers have therefore concluded that rather than eliminating disagreements from annotated corpora, we should preserve them; indeed, some argue that corpora should aim to preserve all annotator judgments. But this approach to corpus creation for NLP has not yet been widely accepted. The objective of the LeWiDi series of shared tasks is to promote this approach to developing NLP models by providing a unified framework for training and evaluating with such datasets. We report on the second LeWiDi shared task, which differs from the first edition in three crucial respects: (i) it focuses entirely on NLP, rather than on both NLP and computer vision tasks as in the first edition; (ii) it focuses on subjective tasks, instead of covering different types of disagreements, as training with aggregated labels for subjective NLP tasks is a particularly obvious misrepresentation of the data; and (iii) for the evaluation, we concentrate on soft approaches to evaluation. This second edition of LeWiDi attracted a wide array of participants, resulting in 13 shared task submission papers.

artificial intelligence, dataset, natural language, (13 more...)

2304.14803

Country:

Europe (1.00)
North America > United States > Maryland (0.14)

Genre: Research Report (1.00)

Industry:

Government > Regional Government (0.68)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
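The LeWiDi abstract above refers to preserving all annotator judgments and to soft approaches to evaluation. The listing does not say which metric the task uses, so the sketch below is only one plausible reading: it builds a soft label from raw annotations and scores a predicted distribution with cross-entropy. The function names, the choice of cross-entropy, and the example annotations are all assumptions made for illustration.

```python
import math
from collections import Counter

def soft_label(annotations, classes):
    """Turn raw annotator votes into a probability distribution over classes."""
    counts = Counter(annotations)
    n = len(annotations)
    return [counts[c] / n for c in classes]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of a predicted distribution against a soft target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

# Hypothetical item: four annotators, three of whom marked it offensive (1).
classes = [0, 1]
target = soft_label([1, 1, 0, 1], classes)

close_prediction = [0.3, 0.7]          # roughly matches the disagreement
overconfident_prediction = [0.9, 0.1]  # confident in the minority label

close_score = cross_entropy(target, close_prediction)
overconfident_score = cross_entropy(target, overconfident_prediction)
```

Under a soft metric of this kind, a model is rewarded for reproducing the spread of annotator opinion rather than for matching a single aggregated label, so the prediction that mirrors the disagreement gets the lower (better) score.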

Data Augmentation Methods for Anaphoric Zero Pronouns

Aloraini, Abdulrahman, Poesio, Massimo

arXiv.org Artificial Intelligence, Sep-20-2021

In pro-drop languages such as Arabic, Chinese, Italian, Japanese, Spanish, and many others, unrealized (null) arguments in certain syntactic positions can refer to a previously introduced entity, and are thus called anaphoric zero pronouns. The existing resources for studying anaphoric zero pronoun interpretation are, however, still limited. In this paper, we use five data augmentation methods to generate and detect anaphoric zero pronouns automatically. We use the augmented data as additional training material for two anaphoric zero pronoun systems for Arabic. Our experimental results show that data augmentation improves the performance of the two systems, surpassing the state-of-the-art results.

artificial intelligence, natural language, resolution, (16 more...)

2109.09825

Country:

Europe > United Kingdom (0.14)
Europe > Finland (0.14)
Africa > Middle East > Egypt (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
