Collaborating Authors: Giunchiglia


Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning

Bontempelli, Andrea, Busso, Matteo, Malcotti, Leonardo Javier, Giunchiglia, Fausto

arXiv.org Artificial Intelligence

Any digital personal assistant, whether used to support task performance, answer questions, or manage work and daily life--including fitness schedules--requires high-quality annotations to function properly. However, user annotations, whether actively produced or inferred from context (e.g., data from smartphone sensors), are often subject to errors and noise. Previous research on Skeptical Learning (skel) addressed the issue of noisy labels by comparing offline active annotations with passive data, allowing for an evaluation of annotation accuracy. However, this evaluation did not include confirmation from end-users, the best judges of their own context. In this study, we evaluate skel's performance in real-world conditions with actual users who can refine the input labels based on their current perspectives and needs. The study involves university students using the iLog mobile application on their devices over a period of four weeks. The results highlight the challenges of finding the right balance between user effort and data quality, as well as the potential benefits of using skel, which include reduced annotation effort and improved quality of collected data.
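The skeptical check at the heart of this idea can be sketched as follows. This is a minimal illustration, not the authors' skel implementation: the function name, the confidence threshold, and the confirmation callback are all assumptions made for the example.

```python
# Sketch of a skeptical-learning decision step: the learner compares each
# active user label against its own prediction from passive (sensor) data,
# and queries the user for confirmation only when the two disagree and the
# model is confident. Names and the threshold are illustrative.

def skeptical_accept(user_label, predicted_label, confidence,
                     confirm, threshold=0.8):
    """Return the label to store, querying the user only on suspicious input."""
    if user_label == predicted_label or confidence < threshold:
        return user_label               # agreement, or weak model: trust the user
    if confirm(user_label, predicted_label):
        return user_label               # user re-asserts the original label
    return predicted_label              # user concedes; keep the prediction

# Example: the model is confident the user is "walking", not "driving",
# and the (simulated) user does not insist on the original label.
label = skeptical_accept("driving", "walking", 0.93,
                         confirm=lambda u, p: False)
print(label)  # -> walking
```

The trade-off the abstract describes lives in the `threshold` and `confirm` pieces: a lower threshold means more confirmation dialogs (more user effort, cleaner labels), a higher one means fewer interruptions but more noise retained.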


Toward a Better Localization of Princeton WordNet

Freihat, Abed Alhakim

arXiv.org Artificial Intelligence

As Princeton WordNet continues to gain significance as a semantic lexicon in Natural Language Processing, the need for its localization and for ensuring the quality of this process has become increasingly critical. Existing efforts remain limited in both scale and rigor, and there is a notable absence of studies addressing the accuracy of localization or its alignment with the cultural context of Arabic. This paper proposes a structured framework for the localization of Princeton WordNet, detailing the stages and procedures required to achieve high-quality results without compromising cultural authenticity. We further present our experience in applying this framework, reporting outcomes from the localization of 10,000 synsets.


Generation Alpha's coded language makes online bullying hard to detect

New Scientist

Generation Alpha's internet lingo is mutating faster than teachers, parents and AI models can keep up – potentially exposing youngsters to bullying and grooming that trusted adults and AI-based safety systems simply can't see. Manisha Mehta, a 14-year-old student at Warren E Hyde Middle School in Cupertino, California, and Fausto Giunchiglia at the University of Trento, Italy, collated 100 expressions and phrases popular with Generation Alpha – those born between 2010 and 2025 – from popular gaming, social media and video platforms. The pair then asked 24 volunteers aged between 11 and 14, all Mehta's classmates, to analyse the phrases alongside context-specific screenshots. The volunteers explained whether they understood the phrases, in what context they were being used and whether that use carried any potential safety concerns or harmful interpretations. Mehta and Giunchiglia then asked parents, professional moderators and four AI models – GPT-4, Claude, Gemini and Llama 3 – to do the same.


A Simple Graph Contrastive Learning Framework for Short Text Classification

Liu, Yonghao, Giunchiglia, Fausto, Huang, Lan, Li, Ximing, Feng, Xiaoyue, Guan, Renchu

arXiv.org Artificial Intelligence

Short text classification has gained significant attention in the information age due to its prevalence and real-world applications. Recent advancements in graph learning combined with contrastive learning have shown promising results in addressing the challenges of semantic sparsity and limited labeled data in short text classification. However, existing models have certain limitations. They rely on explicit data augmentation techniques to generate contrastive views, resulting in semantic corruption and noise. Additionally, these models only focus on learning the intrinsic consistency between the generated views, neglecting valuable discriminative information from other potential views. To address these issues, we propose a Simple graph contrastive learning framework for Short Text Classification (SimSTC). Our approach involves performing graph learning on multiple text-related component graphs to obtain multi-view text embeddings. Subsequently, we directly apply contrastive learning on these embeddings. Notably, our method eliminates the need for data augmentation operations to generate contrastive views while still leveraging the benefits of multi-view contrastive learning. Despite its simplicity, our model achieves outstanding performance, surpassing large language models on various datasets.
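The augmentation-free contrast the abstract describes can be sketched with a standard cross-view InfoNCE loss applied to embeddings from two component graphs. This is a simplified numpy illustration under assumed shapes and temperature, not the SimSTC code; in the paper the views come from learned graph encoders, and here they are simulated with random vectors.

```python
import numpy as np

# Multi-view contrastive learning on precomputed text embeddings: the i-th
# text's embedding in view A should match the i-th text's embedding in view B
# against all other texts in the batch. No data augmentation is involved --
# the views come from different component graphs.

def info_nce(view_a, view_b, temperature=0.5):
    """Cross-view InfoNCE loss with positives on the diagonal."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                  # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                     # view 1 of 8 texts
aligned = base + 0.01 * rng.normal(size=(8, 16))    # view 2 of the same texts
loss_matched = info_nce(base, aligned)
loss_mismatched = info_nce(base, base[::-1].copy()) # rows paired with wrong texts
# the loss is lower when the two views describe the same texts, row by row
```

Minimizing this loss pulls the per-text embeddings from different component graphs together, which is the "intrinsic consistency across views" the framework relies on instead of augmented copies.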


Temporal Numeric Planning with Patterns

Cardellini, Matteo, Giunchiglia, Enrico

arXiv.org Artificial Intelligence

Differently from the classical case, where plans are sequences of instantaneous actions and variables are Boolean, in these problems actions may have a duration, are executed concurrently over time, and can affect Boolean and numeric variables at both the start and end of their execution. These two extensions [...] The results highlight the strong performance of our planner, which achieved the highest coverage (i.e., number of solved problems) in 9 out of 10 domains, while the second-best planner had the highest coverage in 4 domains. Additionally, compared to the other symbolic planners, our system is able to find a valid plan with a lower bound on all the problems.
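The durative-action model this abstract contrasts with classical planning (separate at-start and at-end effects over Boolean and numeric variables) can be sketched as follows. The data structures, the example action, and the event-sorting executor are illustrative assumptions, not the authors' planner.

```python
from dataclasses import dataclass, field

# A durative action has a duration and two effect sets: one applied when the
# action starts and one when it ends. Effects may set Boolean variables or
# update numeric ones; overlapping actions interleave their start/end events.

@dataclass
class DurativeAction:
    name: str
    duration: float
    at_start: dict = field(default_factory=dict)   # var -> value or update fn
    at_end: dict = field(default_factory=dict)

def execute(state, timed_actions):
    """Apply start/end effects of (possibly concurrent) actions in time order."""
    events = []
    for t0, act in timed_actions:                  # (start_time, action) pairs
        events.append((t0, act.at_start))
        events.append((t0 + act.duration, act.at_end))
    for _, effects in sorted(events, key=lambda e: e[0]):
        for var, value in effects.items():
            state[var] = value(state[var]) if callable(value) else value
    return state

refuel = DurativeAction("refuel", 3.0,
                        at_start={"pump_busy": True},
                        at_end={"pump_busy": False,
                                "fuel": lambda f: f + 5})   # numeric effect
state = execute({"pump_busy": False, "fuel": 2}, [(0.0, refuel)])
print(state)  # -> {'pump_busy': False, 'fuel': 7}
```

A planner must additionally *search* for the timed action set and check invariants during execution intervals; the sketch only shows the state-transition semantics that makes temporal numeric planning harder than the classical case.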


KAE: A Property-based Method for Knowledge Graph Alignment and Extension

Shi, Daqian, Li, Xiaoyue, Giunchiglia, Fausto

arXiv.org Artificial Intelligence

A common solution to the semantic heterogeneity problem is to perform knowledge graph (KG) extension by exploiting the information encoded in one or more candidate KGs, where the alignment between the reference KG and the candidate KGs is considered the critical procedure. However, existing KG alignment methods mainly rely on entity type (etype) label matching as a prerequisite, which performs poorly in practice or is not applicable in some cases. In this paper, we design a machine learning-based framework for KG extension, including a novel property-based alignment approach that allows aligning etypes on the basis of the properties used to define them. The main intuition is that it is properties that intensionally define an etype, and that this definition is independent of the specific label used to name the etype and of the specific hierarchical schema of the KGs. The experimental results show, both quantitatively and qualitatively, the validity of the KG alignment approach and the superiority of the proposed KG extension framework compared with the state of the art.
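The core intuition, that property overlap rather than label similarity identifies matching etypes, can be shown with a toy alignment. The etypes, property sets, and the fixed Jaccard rule below are invented for illustration; the actual KAE framework learns the alignment rather than applying a hand-set similarity.

```python
# Toy property-based etype alignment: two entity types align when the sets of
# properties defining them overlap, regardless of the labels used to name them
# ("Person" vs "Human" below).

def property_similarity(props_a, props_b):
    """Jaccard overlap between the defining property sets of two etypes."""
    a, b = set(props_a), set(props_b)
    return len(a & b) / len(a | b) if a | b else 0.0

reference = {"Person": {"name", "birthDate", "nationality"}}
candidate = {"Human":   {"name", "birthDate", "spouse"},
             "Company": {"name", "founder", "revenue"}}

# Align each reference etype with the most property-similar candidate etype.
alignment = {
    etype: max(candidate, key=lambda c: property_similarity(props, candidate[c]))
    for etype, props in reference.items()
}
print(alignment)  # -> {'Person': 'Human'}
```

Label matching would miss the "Person"/"Human" pair entirely, while the shared `{name, birthDate}` properties recover it, which is exactly why a property-based signal survives label and schema differences across KGs.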


Resolving Word Vagueness with Scenario-guided Adapter for Natural Language Inference

Liu, Yonghao, Li, Mengyu, Liang, Di, Li, Ximing, Giunchiglia, Fausto, Huang, Lan, Feng, Xiaoyue, Guan, Renchu

arXiv.org Artificial Intelligence

Natural Language Inference (NLI) is a crucial task in natural language processing that involves determining the relationship between two sentences, typically referred to as the premise and the hypothesis. However, traditional NLI models solely rely on the semantic information inherent in independent sentences and lack relevant situational visual information, which can hinder a complete understanding of the intended meaning of the sentences due to the ambiguity and vagueness of language. To address this challenge, we propose an innovative ScenaFuse adapter that simultaneously integrates large-scale pre-trained linguistic knowledge and relevant visual information for NLI tasks. Specifically, we first design an image-sentence interaction module to incorporate visuals into the attention mechanism of the pre-trained model, allowing the two modalities to interact comprehensively. Furthermore, we introduce an image-sentence fusion module that can adaptively integrate visual information from images and semantic information from sentences. By incorporating relevant visual information and leveraging linguistic knowledge, our approach bridges the gap between language and vision, leading to improved understanding and inference capabilities in NLI tasks. Extensive benchmark experiments demonstrate that our proposed ScenaFuse, a scenario-guided approach, consistently boosts NLI performance.
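The image-sentence interaction step can be sketched as single-head cross-attention followed by a gated fusion. This numpy sketch uses assumed dimensions, a scalar gate, and random vectors in place of real token and region features; it illustrates the attend-then-fuse pattern, not the ScenaFuse modules themselves.

```python
import numpy as np

# Sentence token vectors attend over image region vectors; the attended
# visual context is then blended back into the token representations.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend_fuse(tokens, regions, gate=0.5):
    """tokens: (T, d) sentence vectors; regions: (R, d) image region vectors."""
    attn = softmax(tokens @ regions.T / np.sqrt(tokens.shape[1]))  # (T, R)
    visual_context = attn @ regions                                # (T, d)
    return (1 - gate) * tokens + gate * visual_context             # fused (T, d)

rng = np.random.default_rng(1)
tokens = rng.normal(size=(5, 8))    # 5 sentence tokens, dim 8
regions = rng.normal(size=(3, 8))   # 3 image regions, dim 8
fused = cross_attend_fuse(tokens, regions)
print(fused.shape)  # -> (5, 8)
```

In the paper the gate is learned and adaptive per token (the "image-sentence fusion module"); the scalar `gate` here stands in for that mechanism so the shapes and data flow stay visible.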


Layers of technology in pluriversal design. Decolonising language technology with the LiveLanguage initiative

Koch, Gertraud, Bella, Gábor, Helm, Paula, Giunchiglia, Fausto

arXiv.org Artificial Intelligence

Language technology has the potential to facilitate intercultural communication through meaningful translations. However, the current state of language technology is deeply entangled with colonial knowledge due to path dependencies and neo-colonial tendencies in the global governance of artificial intelligence (AI). Language technology is a complex and emerging field that presents challenges for co-design interventions, due to its entanglement in assemblages of global scale and diverse sites, and due to its knowledge intensity. This paper uses LiveLanguage, a lexical database and a set of services with particular emphasis on modelling language diversity and integrating small and minority languages, as an example to discuss and close the gap from pluriversal design theory to practice. By diversifying the concept of emerging technology, we can better approach language technology in global contexts. The paper presents a model comprising five layers of technological activity. Each layer consists of specific practices and stakeholders, and thus provides distinctive spaces for co-design interventions as a mode of inquiry for de-linking, re-thinking and re-building language technology towards pluriversality. In this way, the paper contributes to reflecting on the position of co-design in decolonising emergent technologies, and to integrating complex theoretical knowledge towards decoloniality into language technology design.


From Knowledge Representation to Knowledge Organization and Back

Giunchiglia, Fausto, Bagchi, Mayukh

arXiv.org Artificial Intelligence

Knowledge Representation (KR) and facet-analytical Knowledge Organization (KO) have been the two most prominent methodologies of data and knowledge modelling in the Artificial Intelligence community and the Information Science community, respectively. KR boasts a robust and scalable ecosystem of technologies to support knowledge modelling while often underemphasizing the quality of its models (and model-based data). KO, on the other hand, is less technology-driven but has developed a robust framework of guiding principles (canons) for ensuring modelling (and model-based data) quality. This paper elucidates both the KR and facet-analytical KO methodologies in detail and provides a functional mapping between them. Building on the mapping, the paper proposes an integrated KO-enriched KR methodology with all the standard components of a KR methodology plus the guiding canons of modelling quality provided by KO. The practical benefits of the methodological integration have been exemplified through a prominent case study of a KR-based image annotation exercise.


Towards a Gateway for Knowledge Graph Schemas Collection, Analysis, and Embedding

Fumagalli, Mattia, Boffo, Marco, Shi, Daqian, Bagchi, Mayukh, Giunchiglia, Fausto

arXiv.org Artificial Intelligence

One of the significant barriers to the training of statistical models on knowledge graphs is the difficulty that scientists have in finding the best input data to address their prediction goal. In addition, a key challenge is to determine how to manipulate these relational data, which often come in the form of triples (i.e., subject, predicate, object), to enable the learning process. Many high-quality catalogs of knowledge graphs are currently available. However, their primary goal is the re-usability of these resources, and their interconnection, in the context of the Semantic Web. This paper describes the LiveSchema initiative, namely a first version of a gateway whose main aim is to leverage the gold mine of data collected by many existing catalogs of relational data such as ontologies and knowledge graphs. At the current state, LiveSchema contains ~1,000 datasets from 4 main sources and offers some key facilities, which allow users to: i) evolve LiveSchema by aggregating other source catalogs and repositories as input sources; ii) query all the collected resources; iii) transform each given dataset into formal concept analysis matrices that enable analysis and visualization services; iv) generate models and tensors from each given dataset.
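The triples-to-matrix transformation mentioned in the facilities list can be sketched as building a binary formal context (objects x attributes) as used in formal concept analysis. The triples below and the choice of subjects as objects and predicates as attributes are illustrative assumptions, not LiveSchema's actual encoding.

```python
# Turn schema-level triples into a binary formal-context matrix: one row per
# subject (here, entity types), one column per predicate, cell = 1 if that
# subject uses that predicate in some triple.

triples = [
    ("Person", "hasName", "string"),
    ("Person", "bornIn", "Place"),
    ("Place", "hasName", "string"),
    ("Place", "locatedIn", "Place"),
]

objects = sorted({s for s, _, _ in triples})       # FCA objects (rows)
attributes = sorted({p for _, p, _ in triples})    # FCA attributes (columns)
incidence = [[int(any(s == o and p == a for s, p, _ in triples))
              for a in attributes] for o in objects]

print(objects)     # -> ['Person', 'Place']
print(attributes)  # -> ['bornIn', 'hasName', 'locatedIn']
print(incidence)   # -> [[1, 1, 0], [0, 1, 1]]
```

A matrix in this form feeds directly into FCA concept-lattice algorithms for the analysis and visualization services, and can likewise be treated as a feature tensor for the model-generation facility.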