Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning
Bontempelli, Andrea, Busso, Matteo, Malcotti, Leonardo Javier, Giunchiglia, Fausto
Any digital personal assistant, whether used to support task performance, answer questions, or manage work and daily life (including fitness schedules), requires high-quality annotations to function properly. However, user annotations, whether actively produced or inferred from context (e.g., data from smartphone sensors), are often subject to errors and noise. Previous research on Skeptical Learning (SKEL) addressed the issue of noisy labels by comparing offline active annotations with passively collected data, allowing annotation accuracy to be evaluated. However, that evaluation did not include confirmation from end users, the best judges of their own context. In this study, we evaluate SKEL's performance in real-world conditions with actual users who can refine the input labels based on their current perspectives and needs. The study involves university students using the iLog mobile application on their own devices over a period of four weeks. The results highlight the challenges of finding the right balance between user effort and data quality, as well as the potential benefits of using SKEL, including reduced annotation effort and improved quality of the collected data.
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
- (2 more...)
- Transportation > Passenger (0.68)
- Education > Educational Setting > Higher Education (0.34)
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Quality > Data Cleaning (0.40)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.34)
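The core idea of the abstract above, accepting a user's label unless a sufficiently confident model disagrees and then re-querying the user, can be sketched in a few lines. This is our illustration only: `MajorityModel`, `skeptical_annotate`, and the confidence threshold are hypothetical stand-ins, not the actual SKEL/iLog implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MajorityModel:
    """Toy stand-in for SKEL's learner: predicts the majority label seen so far."""
    counts: dict = field(default_factory=dict)

    def predict(self, _features):
        if not self.counts:
            return None, 0.0  # no evidence yet: never challenge the user
        label = max(self.counts, key=self.counts.get)
        confidence = self.counts[label] / sum(self.counts.values())
        return label, confidence

    def update(self, _features, label):
        self.counts[label] = self.counts.get(label, 0) + 1

def skeptical_annotate(user_label, features, model, confirm, threshold=0.8):
    """Accept the user's label unless a confident model disagrees; in that
    case call confirm(user_label, predicted) to let the user resolve it."""
    predicted, confidence = model.predict(features)
    if predicted is not None and predicted != user_label and confidence >= threshold:
        final = confirm(user_label, predicted)  # re-query the end user
    else:
        final = user_label
    model.update(features, final)  # learn from the resolved label
    return final
```

The `confirm` callback is where the "evaluation in the wild" happens: the user, not the model, has the final word on their own context.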
Toward a Better Localization of Princeton WordNet
As Princeton WordNet continues to gain significance as a semantic lexicon in Natural Language Processing, the need for its localization and for ensuring the quality of this process has become increasingly critical. Existing efforts remain limited in both scale and rigor, and there is a notable absence of studies addressing the accuracy of localization or its alignment with the cultural context of Arabic. This paper proposes a structured framework for the localization of Princeton WordNet, detailing the stages and procedures required to achieve high-quality results without compromising cultural authenticity. We further present our experience in applying this framework, reporting outcomes from the localization of 10,000 synsets.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
- (3 more...)
Generation Alpha's coded language makes online bullying hard to detect
Generation Alpha's internet lingo is mutating faster than teachers, parents and AI models can keep up – potentially exposing youngsters to bullying and grooming that trusted adults and AI-based safety systems simply can't see. Manisha Mehta, a 14-year-old student at Warren E Hyde Middle School in Cupertino, California, and Fausto Giunchiglia at the University of Trento, Italy, collated 100 expressions and phrases popular with Generation Alpha – those born between 2010 and 2025 – from popular gaming, social media and video platforms. The pair then asked 24 volunteers aged between 11 and 14, all Mehta's classmates, to analyse the phrases alongside context-specific screenshots. The volunteers explained whether they understood the phrases, in what context they were being used and whether that use carried any potential safety concerns or harmful interpretations. The pair also asked parents, professional moderators and four AI models – GPT-4, Claude, Gemini and Llama 3 – to do the same.
- North America > United States > California > Santa Clara County > Cupertino (0.26)
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.26)
- Europe > Greece > Attica > Athens (0.06)
A Simple Graph Contrastive Learning Framework for Short Text Classification
Liu, Yonghao, Giunchiglia, Fausto, Huang, Lan, Li, Ximing, Feng, Xiaoyue, Guan, Renchu
Short text classification has gained significant attention in the information age due to its prevalence and real-world applications. Recent advancements in graph learning combined with contrastive learning have shown promising results in addressing the challenges of semantic sparsity and limited labeled data in short text classification. However, existing models have certain limitations. They rely on explicit data augmentation techniques to generate contrastive views, resulting in semantic corruption and noise. Additionally, these models only focus on learning the intrinsic consistency between the generated views, neglecting valuable discriminative information from other potential views. To address these issues, we propose a Simple graph contrastive learning framework for Short Text Classification (SimSTC). Our approach involves performing graph learning on multiple text-related component graphs to obtain multi-view text embeddings. Subsequently, we directly apply contrastive learning on these embeddings. Notably, our method eliminates the need for data augmentation operations to generate contrastive views while still leveraging the benefits of multi-view contrastive learning. Despite its simplicity, our model achieves outstanding performance, surpassing large language models on various datasets.
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
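The key claim of SimSTC is that contrastive views need not be produced by augmentation: embeddings of the same text from different component graphs can serve as views directly. A minimal sketch of such a cross-view contrastive objective (an InfoNCE-style loss, our simplification rather than SimSTC's actual code) looks like this:

```python
import numpy as np

def cross_view_info_nce(view_a, view_b, temperature=0.5):
    """Cross-view InfoNCE over pre-computed multi-view text embeddings.

    view_a, view_b: (n, d) embeddings of the same n texts under two views
    (e.g., from word-graph and entity-graph encoders). Matching rows are
    positive pairs; every other row in the batch is a negative.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (n, n) scaled cosine similarities
    # Row-wise log-softmax; the diagonal holds the positive pairs.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

With more than two views, the same loss can simply be summed over all view pairs, which is what makes the augmentation-free multi-view setup attractive.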
Temporal Numeric Planning with Patterns
Cardellini, Matteo, Giunchiglia, Enrico
Differently from the classical case, where plans are sequences of instantaneous actions and variables are Boolean, in these problems actions may have a duration, are executed concurrently over time, and can affect Boolean and numeric variables at both the start and end of their execution. Results highlight the strong performances of our planner, which achieved the highest coverage (i.e., number of solved problems) in 9 out of 10 domains, while the second-best planner had the highest coverage in 4 domains. Additionally, compared to the other symbolic planners, our system is able to find a valid plan with a lower bound on all the problems.
- North America > United States > Oklahoma > Payne County > Cushing (0.05)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (8 more...)
KAE: A Property-based Method for Knowledge Graph Alignment and Extension
Shi, Daqian, Li, Xiaoyue, Giunchiglia, Fausto
A common solution to the semantic heterogeneity problem is to perform knowledge graph (KG) extension exploiting the information encoded in one or more candidate KGs, where the alignment between the reference KG and the candidate KGs is the critical procedure. However, existing KG alignment methods mainly rely on entity-type (etype) label matching as a prerequisite, which performs poorly in practice or is not applicable in some cases. In this paper, we design a machine-learning-based framework for KG extension, including a novel property-based alignment approach that aligns etypes on the basis of the properties used to define them. The main intuition is that it is the properties that intensionally define an etype, and this definition is independent of both the specific label used to name the etype and the specific hierarchical schema of the KGs. Compared with the state of the art, the experimental results show the validity of the KG alignment approach and the superiority of the proposed KG extension framework, both quantitatively and qualitatively.
- North America > United States > New Mexico > Doña Ana County > Las Cruces (0.04)
- Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
- Information Technology > Communications > Web > Semantic Web (0.68)
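The intuition that etypes should be matched by their defining properties rather than their labels can be illustrated with a toy matcher. The Jaccard measure and all names below are our simplification for illustration, not KAE's actual similarity function or API:

```python
def property_similarity(props_a, props_b):
    """Jaccard overlap between two property sets."""
    a, b = set(props_a), set(props_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def align_etypes(reference, candidate, threshold=0.5):
    """reference, candidate: dicts mapping etype label -> set of properties.
    Returns, for each reference etype, its best above-threshold match in the
    candidate KG, ignoring the labels entirely."""
    alignment = {}
    for r_label, r_props in reference.items():
        best_label, best_props = None, set()
        for c_label, c_props in candidate.items():
            if property_similarity(r_props, c_props) > property_similarity(r_props, best_props):
                best_label, best_props = c_label, c_props
        if best_label is not None and property_similarity(r_props, best_props) >= threshold:
            alignment[r_label] = best_label
    return alignment
```

Note that a reference `Person` can align with a candidate `Human` purely through shared properties such as `name` and `birthDate`, which is exactly the label-independence the abstract argues for.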
Resolving Word Vagueness with Scenario-guided Adapter for Natural Language Inference
Liu, Yonghao, Li, Mengyu, Liang, Di, Li, Ximing, Giunchiglia, Fausto, Huang, Lan, Feng, Xiaoyue, Guan, Renchu
Natural Language Inference (NLI) is a crucial task in natural language processing that involves determining the relationship between two sentences, typically referred to as the premise and the hypothesis. However, traditional NLI models solely rely on the semantic information inherent in independent sentences and lack relevant situational visual information, which can hinder a complete understanding of the intended meaning of the sentences due to the ambiguity and vagueness of language. To address this challenge, we propose an innovative ScenaFuse adapter that simultaneously integrates large-scale pre-trained linguistic knowledge and relevant visual information for NLI tasks. Specifically, we first design an image-sentence interaction module to incorporate visuals into the attention mechanism of the pre-trained model, allowing the two modalities to interact comprehensively. Furthermore, we introduce an image-sentence fusion module that can adaptively integrate visual information from images and semantic information from sentences. By incorporating relevant visual information and leveraging linguistic knowledge, our approach bridges the gap between language and vision, leading to improved understanding and inference capabilities in NLI tasks. Extensive benchmark experiments demonstrate that our proposed ScenaFuse, a scenario-guided approach, consistently boosts NLI performance.
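The adaptive integration described for the image-sentence fusion module can be pictured as a learned gate that decides, per dimension, how much visual evidence to blend into the sentence representation. The sigmoid gate, shapes, and parameter names below are our assumptions for illustration, not ScenaFuse's actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(sent_vec, img_vec, W_gate, b_gate):
    """Adaptive gated fusion of one sentence and one image embedding.

    sent_vec, img_vec: (d,) embeddings; W_gate: (d, 2d); b_gate: (d,).
    The gate g is computed from both modalities, and the output is
    g * sentence + (1 - g) * image, so the model can lean on vision
    exactly where the language is vague.
    """
    gate = sigmoid(W_gate @ np.concatenate([sent_vec, img_vec]) + b_gate)
    return gate * sent_vec + (1.0 - gate) * img_vec
```

At the extremes the gate recovers either modality unchanged; in between it interpolates per dimension, which is what "adaptively integrate" amounts to in this sketch.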
Layers of technology in pluriversal design. Decolonising language technology with the LiveLanguage initiative
Koch, Gertraud, Bella, Gábor, Helm, Paula, Giunchiglia, Fausto
Language technology has the potential to facilitate intercultural communication through meaningful translations. However, the current state of language technology is deeply entangled with colonial knowledge, due to path dependencies and neo-colonial tendencies in the global governance of artificial intelligence (AI). Language technology is a complex and emerging field that presents challenges for co-design interventions due to its embedding in assemblages of global scale and diverse sites, and due to its knowledge intensity. This paper uses LiveLanguage, a lexical database and a set of services with a particular emphasis on modelling language diversity and integrating small and minority languages, as an example with which to discuss and close the gap between pluriversal design theory and practice. By diversifying the concept of emerging technology, we can better approach language technology in global contexts. The paper presents a model comprising five layers of technological activity. Each layer consists of specific practices and stakeholders and thus provides a distinctive space for co-design interventions as a mode of inquiry for de-linking, re-thinking and re-building language technology towards pluriversality. In this way, the paper contributes to reflecting on the position of co-design in decolonising emergent technologies, and to integrating complex theoretical knowledge about decoloniality into language technology design.
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > United Kingdom > UK North Sea (0.05)
- (19 more...)
- Research Report (0.64)
- Instructional Material (0.46)
From Knowledge Representation to Knowledge Organization and Back
Giunchiglia, Fausto, Bagchi, Mayukh
Knowledge Representation (KR) and facet-analytical Knowledge Organization (KO) have been the two most prominent methodologies of data and knowledge modelling in the Artificial Intelligence community and the Information Science community, respectively. KR boasts a robust and scalable ecosystem of technologies to support knowledge modelling while often underemphasizing the quality of its models (and model-based data). KO, on the other hand, is less technology-driven but has developed a robust framework of guiding principles (canons) for ensuring modelling (and model-based data) quality. This paper elucidates both the KR and facet-analytical KO methodologies in detail and provides a functional mapping between them. Building on the mapping, the paper proposes an integrated KO-enriched KR methodology with all the standard components of a KR methodology plus the guiding canons of modelling quality provided by KO. The practical benefits of the methodological integration have been exemplified through a prominent case study of a KR-based image annotation exercise.
- North America > United States > New York (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Canada > Quebec > Estrie Region > Sherbrooke (0.04)
- (5 more...)
Towards a Gateway for Knowledge Graph Schemas Collection, Analysis, and Embedding
Fumagalli, Mattia, Boffo, Marco, Shi, Daqian, Bagchi, Mayukh, Giunchiglia, Fausto
One of the significant barriers to the training of statistical models on knowledge graphs is the difficulty that scientists have in finding the best input data to address their prediction goal. A further key challenge is to determine how to manipulate these relational data, which typically come in the form of triples (i.e., subject, predicate, object), to enable the learning process. Currently, many high-quality catalogs of knowledge graphs are available. However, their primary goal is the re-usability of these resources, and their interconnection, in the context of the Semantic Web. This paper describes the LiveSchema initiative, namely a first version of a gateway whose main scope is to leverage the gold mine of data collected by many existing catalogs of relational data such as ontologies and knowledge graphs. At the current state, LiveSchema contains ~1000 datasets from 4 main sources and offers some key facilities, which make it possible to: i) evolve LiveSchema by aggregating other source catalogs and repositories as input sources; ii) query all the collected resources; iii) transform each given dataset into formal concept analysis matrices that enable analysis and visualization services; iv) generate models and tensors from each given dataset.
- North America > United States (0.04)
- North America > Canada > Quebec > Estrie Region > Sherbrooke (0.04)
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.66)
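Facility (iii) above, turning a relational dataset into formal concept analysis (FCA) matrices, can be sketched with a toy transformation: subjects become FCA objects, predicates become FCA attributes, and the incidence matrix records which subject uses which predicate. This is our minimal reading of the step; the matrices LiveSchema actually produces may differ.

```python
def triples_to_fca_context(triples):
    """Build a binary FCA context from (subject, predicate, object) triples.

    Returns (objects, attributes, matrix): objects are the sorted subjects,
    attributes are the sorted predicates, and matrix[i][j] is 1 iff
    objects[i] occurs with attributes[j] in some triple.
    """
    subjects = sorted({s for s, _, _ in triples})
    predicates = sorted({p for _, p, _ in triples})
    incidence = {(s, p) for s, p, _ in triples}
    matrix = [[1 if (s, p) in incidence else 0 for p in predicates]
              for s in subjects]
    return subjects, predicates, matrix
```

Once in this binary form, standard FCA machinery (concept lattices, implications) and tensor factorizations can be applied uniformly across heterogeneous input catalogs.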