linked data
Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility
This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs). Leveraging linked data and RDF techniques, we integrate data from multiple sources into a unified model based on DCAT and the META-SHARE OWL ontology. Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal. Real user queries from the Corpora Mailing List (CML) were evaluated to assess Linghub's capability to satisfy actual user needs. Results indicate that while some limitations persist, many user requests can be successfully addressed. The study highlights significant metadata issues and advocates for adherence to open vocabularies and standards to enhance metadata harmonization. This initial research underscores the importance of API-based access to LRs, promoting machine usability and data subset extraction for specific purposes, paving the way for more efficient and standardized LR utilization.
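To make the idea of querying DCAT-based metadata concrete, the following is a minimal sketch of how a keyword and language search might be phrased as SPARQL. It only builds the query string; the function name, the template, and the choice of `dct:title` and `dct:language` as the searched properties are assumptions made for illustration and are not taken from Linghub itself.

```python
from string import Template

# Hypothetical query template. dcat: and dct: are the standard DCAT and
# Dublin Core Terms namespaces; the properties searched are an assumption.
QUERY_TEMPLATE = Template("""\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?resource ?title WHERE {
  ?resource a dcat:Dataset ;
            dct:title ?title ;
            dct:language ?lang .
  FILTER(CONTAINS(LCASE(STR(?title)), "$keyword"))
  FILTER(STR(?lang) = "$language")
}
LIMIT $limit
""")


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes so the value stays inside its literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_resource_query(keyword: str, language: str, limit: int = 10) -> str:
    """Return a SPARQL query for datasets whose title mentions `keyword`."""
    return QUERY_TEMPLATE.substitute(
        keyword=escape_literal(keyword.lower()),
        language=escape_literal(language),
        limit=int(limit),
    )


if __name__ == "__main__":
    print(build_resource_query("corpus", "en", limit=5))
```

How a given repository encodes the language (a plain code, a URI, or a typed literal) varies, which is one of the harmonization problems the abstract refers to; the `STR(?lang)` comparison here is a simplification.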
Integrating SPARQL and LLMs for Question Answering over Scholarly Data Sources
Fondi, Fomubad Borista, Fidel, Azanzi Jiomekong, Camara, Gaoussou
The Scholarly Hybrid Question Answering over Linked Data (QALD) Challenge at the International Semantic Web Conference (ISWC) 2024 focuses on Question Answering (QA) over diverse scholarly sources: DBLP, SemOpenAlex, and Wikipedia-based texts. This paper describes a methodology that combines SPARQL queries, divide and conquer algorithms, and a pre-trained extractive question answering model. It starts with SPARQL queries to gather data, then applies divide and conquer to manage various question types and sources, and uses the model to handle personal author questions. The approach, evaluated with Exact Match and F-score metrics, shows promise for improving QA accuracy and efficiency in scholarly contexts. Keywords: Scholarly Question Answering, Large Language Models, Divide and conquer.
A semantic web approach to uplift decentralized household energy data
Wu, Jiantao, Orlandi, Fabrizio, AlSkaif, Tarek, O'Sullivan, Declan, Dev, Soumyabrata
Among a variety of other considerations, energy efficiency is a major focus for the Union's ultimate decarbonization. This makes high energy efficiency a critical priority for all energy sectors, particularly the residential sector [2], which accounts for more than a quarter of the Union's total final energy consumption. Energy decentralization has emerged as one of the most popular contemporary research topics in this domain as a means of increasing energy efficiency [3]. With the growing usage of Information and Communication Technologies (ICT) in the Internet of Things (IoT) sector, data on household energy consumption and production (HECP) may now be generated in a decentralized manner, for example, from an electric vehicle, a heat pump, or home appliances. Due to the range and granularity of data-generating devices, a new generation of smart household energy systems is geared toward decentralization and has the potential to considerably assist in the transition to a sustainable energy future [4, 5]. On the other hand, evaluating household energy data is getting increasingly difficult as a result of various smart devices interacting and forming a complex energy flow data network [6, 7]. Decentralized energy systems are often paired with research into data-driven technologies (e.g. machine learning) for optimizing the systems based on the massive ocean of incoming data in order to manage the inherent risk associated with energy usage's intermittent and unpredictable nature and achieve energy sustainability, including cost reduction, emission reduction, and energy efficiency. However, most of those technologies are developed for project-specific decentralized data (i.e.
Finding Experts in Social Media Data using a Hybrid Approach
Several approaches to the problem of expert finding have emerged in computer science research. In this work, three of these approaches (content analysis, social graph analysis, and the use of Semantic Web technologies) are examined. An integrated set of system requirements is then developed that combines all three in one hybrid approach. To show the practicality of this hybrid approach, a usable prototype expert finding system called ExpertQuest is developed using a modern functional programming language (Clojure) to query social media data and Linked Data. This system is evaluated and discussed. Finally, a discussion and conclusions are presented which describe the benefits and shortcomings of the hybrid approach and the technologies used in this work.
Towards Natural Language Question Answering over Earth Observation Linked Data using Attention-based Neural Machine Translation
Potnis, Abhishek V., Shinde, Rajat C., Durbha, Surya S.
With an increase in Geospatial Linked Open Data being adopted and published over the web, there is a need to develop intuitive interfaces and systems for seamless and efficient exploratory analysis of such rich heterogeneous multi-modal datasets. This work is geared towards improving the exploration process of Earth Observation (EO) Linked Data by developing a natural language interface to facilitate querying. Questions asked over Earth Observation Linked Data have an inherent spatio-temporal dimension and can be represented using GeoSPARQL. This paper seeks to study and analyze the use of RNN-based neural machine translation with attention for transforming natural language questions into GeoSPARQL queries. Specifically, it aims to assess the feasibility of a neural approach for identifying and mapping spatial predicates in natural language to GeoSPARQL's topology vocabulary extension, including Egenhofer and RCC8 relations. The queries can then be executed over a triple store to yield answers for the natural language questions. A dataset consisting of mappings from natural language questions to GeoSPARQL queries over the Corine Land Cover (CLC) Linked Data has been created to train and validate the deep neural network. From our experiments, it is evident that neural machine translation with attention is a promising approach for the task of translating spatial predicates in natural language questions to GeoSPARQL queries.
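To illustrate what mapping a spatial predicate onto GeoSPARQL's topology vocabulary produces, here is a small rule-based sketch. It is a lookup table, not the neural translation model the abstract describes; the phrase list, the function name, and the variable names `?aWKT` and `?bWKT` are hypothetical. The `geof:` functions themselves come from the GeoSPARQL Simple Features, Egenhofer, and RCC8 relation families.

```python
from typing import Optional

# Hypothetical phrase table: natural-language spatial phrases mapped to
# GeoSPARQL topology functions from three relation families.
SPATIAL_PREDICATES = {
    "within": "geof:sfWithin",            # Simple Features
    "touches": "geof:sfTouches",          # Simple Features
    "overlaps": "geof:sfOverlaps",        # Simple Features
    "inside": "geof:ehInside",            # Egenhofer
    "disconnected from": "geof:rcc8dc",   # RCC8
}


def spatial_filter(question: str,
                   left: str = "?aWKT",
                   right: str = "?bWKT") -> Optional[str]:
    """Return a GeoSPARQL FILTER clause for the first spatial phrase found.

    Longer phrases are tried first so that multi-word phrases win over
    shorter ones. Returns None when no known phrase occurs.
    """
    text = question.lower()
    for phrase in sorted(SPATIAL_PREDICATES, key=len, reverse=True):
        if phrase in text:
            return f"FILTER({SPATIAL_PREDICATES[phrase]}({left}, {right}))"
    return None


if __name__ == "__main__":
    print(spatial_filter("Which forests are within the city?"))
```

A fixed table like this breaks down quickly on paraphrases, which is the motivation for learning the mapping from question and query pairs instead.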
Ontologies in CLARIAH: Towards Interoperability in History, Language and Media
Meroño-Peñuela, Albert, de Boer, Victor, van Erp, Marieke, Melder, Willem, Mourits, Rick, Rijpma, Auke, Schalk, Ruben, Zijdeman, Richard
One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions, either by increasing the scale of scholarly studies, linking existing databases, or improving the accessibility of data. Here, the FAIR principles provide a useful framework, as these state that data needs to be: Findable, as they are often scattered among various sources; Accessible, since some might be offline or behind paywalls; Interoperable, thus using standard knowledge representation formats and shared vocabularies; and Reusable, through adequate licensing and permissions. Integrating data from diverse humanities domains is not trivial: the research questions of scholars (such as "was economic wealth equally distributed in the 18th century?" or "what are narratives constructed around disruptive media events?") and their preparation phases (e.g. data collection, knowledge organisation, cleaning) need to be taken into account. In this chapter, we describe the ontologies and tools developed and integrated in the Dutch national project CLARIAH to address these issues across datasets from three fundamental domains or "pillars" of the humanities (linguistics, social and economic history, and media studies) that have paradigmatic data representations (textual corpora, structured data, and multimedia). We summarise the lessons learnt from using such ontologies and tools in these domains from a generalisation and reusability perspective.
Graphs in the 2020s: Databases, Platforms and The Evolution of Knowledge
Graphs, and knowledge graphs, are key concepts and technologies for the 2020s. What will they look like, and what will they enable going forward? We have been keeping track of the evolution of graphs since the early 2000s, and publishing the Year of the Graph newsletter since 2018. Graphs have numerous applications that span analytics, AI, and knowledge management. All of the above are built on a common substrate: data.
From Textual Information Sources to Linked Data in the Agatha Project
Quaresma, Paulo, Nogueira, Vitor Beires, Raiyani, Kashyap, Bayot, Roy, Gonçalves, Teresa
Automatic reasoning about textual information is a challenging task in modern Natural Language Processing (NLP) systems. In this work we describe our proposal for representing and reasoning about Portuguese documents by means of Linked Data resources such as ontologies and thesauri. Our approach resorts to a specialized pipeline of natural language processing (part-of-speech tagger, named entity recognition, semantic role labeling) to populate an ontology for the domain of criminal investigations. The provided architecture and ontology are language independent. Although some of the NLP modules are language dependent, they can be built using adequate AI methodologies.
Connected Data London 7wData
Whatever makes you tick, you will find it in Connected Data London 2019. Knowledge Graphs, Machine Learning and AI, Linked Data and Semantic Technology and Graph Databases are redefining how data works. Data is redefining how everything works. And Connected Data London is the go-to event for the latest developments in these key technologies. We are picking up from where we left off in 2018, connecting technologies, data, and people.
Virtual Representations for Iterative IoT Deployment
Bader, Sebastian R., Maleshkova, Maria
A central vision of the Internet of Things is the representation of the physical world in a consistent virtual environment. Especially in the context of smart factories, the connection of the different, heterogeneous production modules through a digital shop floor promises faster conversion rates, data-driven maintenance or automated machine configurations for use cases that were not known at design time. Nevertheless, these scenarios demand IoT representations of all participating machines and components, which requires high installation efforts and hardware adjustments. We propose an incremental process for bringing the shop floor closer to the IoT vision. Currently the majority of systems, components or parts are not yet connected to the internet and might not even provide the possibility to be technically equipped with sensors. However, those could be essential parts for a realistic digital shop floor representation. We, therefore, propose Virtual Representations, which are capable of independently calculating a physical object's condition by dynamically collecting and interpreting already available data through RESTful Web APIs. The internal logic of such Virtual Representations is further adjustable at runtime, since changes to the respective physical object, its environment or updates to the resource itself should not cause any downtime.
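The core idea (derive an unconnected object's condition from data that is already available, with logic that can be replaced at runtime) can be sketched in a few lines. This is an illustrative assumption about the shape of such a component, not the authors' implementation: the class and method names are hypothetical, and plain callables stand in for the HTTP requests that would fetch values from RESTful Web APIs.

```python
from typing import Callable, Dict

Readings = Dict[str, float]
Source = Callable[[], float]        # stand-in for a GET against a REST resource
Logic = Callable[[Readings], str]   # turns collected readings into a condition


class VirtualRepresentation:
    """Computes a physical object's condition from already available data."""

    def __init__(self, sources: Dict[str, Source], logic: Logic) -> None:
        self._sources = dict(sources)
        self._logic = logic

    def set_logic(self, logic: Logic) -> None:
        """Swap the interpretation logic without recreating the object."""
        self._logic = logic

    def condition(self) -> str:
        """Collect a fresh reading from every source and interpret it."""
        readings = {name: fetch() for name, fetch in self._sources.items()}
        return self._logic(readings)


if __name__ == "__main__":
    # Hypothetical data: a nearby sensor's temperature used to infer the
    # state of a motor that has no sensor of its own.
    motor = VirtualRepresentation(
        sources={"ambient_temperature_c": lambda: 82.0},
        logic=lambda r: "overheated" if r["ambient_temperature_c"] > 80 else "ok",
    )
    print(motor.condition())
    # The threshold changes after a hardware update; no restart is needed.
    motor.set_logic(
        lambda r: "overheated" if r["ambient_temperature_c"] > 90 else "ok"
    )
    print(motor.condition())
```

Keeping the logic as an injected callable is one simple way to allow runtime adjustment; a deployed system would also need to handle failed or slow data sources.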