Goto

Collaborating Authors

 Ontologies


Structural Quality Metrics to Evaluate Knowledge Graphs

arXiv.org Artificial Intelligence

This work presents six structural quality metrics that can measure the quality of knowledge graphs and analyzes five cross-domain knowledge graphs on the web (Wikidata, DBpedia, YAGO, Google Knowledge Graph, Freebase) as well as 'Raftel', Naver's integrated knowledge graph. The 'Good Knowledge Graph' should define detailed classes and properties in its ontology so that knowledge in the real world can be expressed abundantly. Also, instances and RDF triples should use the classes and properties actively. Therefore, we tried to examine the internal quality of knowledge graphs numerically by focusing on the structure of the ontology, which is the schema of knowledge graphs, and the degree of use thereof. As a result of the analysis, it was possible to find the characteristics of a knowledge graph that could not be known only by scale-related indicators such as the number of classes and properties.


Ontomathedu Ontology Enrichment Method

arXiv.org Artificial Intelligence

Nowadays, distance learning technologies have become very popular. The recent pandemic has had a particularly strong impact on the development of distance education technologies. Kazan Federal University has a distance learning system based on LMS Moodle. This article describes the structure of the OntoMathEdu ecosystem aimed at improving the process of teaching school mathematics courses, and also provides a method for improving the OntoMathEdu ontology structure based on identifying new connections between contextually related concepts.


First-Order Rewritability and Complexity of Two-Dimensional Temporal Ontology-Mediated Queries

Journal of Artificial Intelligence Research

Aiming at ontology-based data access to temporal data, we design two-dimensional temporal ontology and query languages by combining logics from the (extended) DL-Lite family with linear temporal logic LTL over discrete time (Z,<). Our main concern is first-order rewritability of ontology-mediated queries (OMQs) that consist of a 2D ontology and a positive temporal instance query. Our target languages for FO-rewritings are two-sorted FO(<)โ€”first-order logic with sorts for time instants ordered by the built-in precedence relation < and for the domain of individualsโ€”its extension FO(<,โ‰ก) with the standard congruence predicatesย tย โ‰ก 0 (modย n), for any fixedย nย > 1, and FO(RPR) that admits relational primitive recursion. In terms of circuit complexity, FO(<,โ‰ก)- and FO(RPR)-rewritability guarantee answering OMQs in uniform AC0ย and NC1, respectively. We proceed in three steps. First, we define a hierarchy of 2D DL-Lite/LTL ontology languages and investigate the FO-rewritability of OMQs with atomic queries by constructing projections onto 1D LTL OMQs and employing recent results on the FO-rewritability of propositional LTL OMQs. As the projections involve deciding consistency of ontologies and data, we also consider the consistency problem for our languages. While the undecidability of consistency for 2D ontology languages with expressive Boolean role inclusions might be expected, we also show that, rather surprisingly, the restriction to Krom and Horn role inclusions leads to decidability (and ExpSpace-completeness), even if one admits full Booleans on concepts. As a final step, we lift some of the rewritability results for atomic OMQs to OMQs with expressive positive temporal instance queries. The lifting results are based on an in-depth study of the canonical models and only concern Horn ontologies.



Ontology-aware Learning and Evaluation for Audio Tagging

arXiv.org Artificial Intelligence

This study defines a new evaluation metric for audio tagging tasks to overcome the limitation of the conventional mean average precision (mAP) metric, which treats different kinds of sound as independent classes without considering their relations. Also, due to the ambiguities in sound labeling, the labels in the training and evaluation set are not guaranteed to be accurate and exhaustive, which poses challenges for robust evaluation with mAP. The proposed metric, ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation. Specifically, we reweight the false positive events in the model prediction based on the ontology graph distance to the target classes. The OmAP measure also provides more insights into model performance by evaluations with different coarse-grained levels in the ontology graph. We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP. To further verify the importance of utilizing the ontology information, we also propose a novel loss function (OBCE) that reweights binary cross entropy (BCE) loss based on the ontology distance. Our experiment shows that OBCE can improve both mAP and OmAP metrics on the AudioSet tagging task.


OLGA : An Ontology and LSTM-based approach for generating Arithmetic Word Problems (AWPs) of transfer type

arXiv.org Artificial Intelligence

Machine generation of Arithmetic Word Problems (AWPs) is challenging as they express quantities and mathematical relationships and need to be consistent. ML-solvers require a large annotated training set of consistent problems with language variations. Exploiting domain-knowledge is needed for consistency checking whereas LSTM-based approaches are good for producing text with language variations. Combining these we propose a system, OLGA, to generate consistent word problems of TC (Transfer-Case) type, involving object transfers among agents. Though we provide a dataset of consistent 2-agent TC-problems for training, only about 36% of the outputs of an LSTM-based generator are found consistent. We use an extension of TC-Ontology, proposed by us previously, to determine the consistency of problems. Among the remaining 64%, about 40% have minor errors which we repair using the same ontology. To check consistency and for the repair process, we construct an instance-specific representation (ABox) of an auto-generated problem. We use a sentence classifier and BERT models for this task. The training set for these LMs is problem-texts where sentence-parts are annotated with ontology class-names. As three-agent problems are longer, the percentage of consistent problems generated by an LSTM-based approach drops further. Hence, we propose an ontology-based method that extends consistent 2-agent problems into consistent 3-agent problems. Overall, our approach generates a large number of consistent TC-type AWPs involving 2 or 3 agents. As ABox has all the information of a problem, any annotations can also be generated. Adopting the proposed approach to generate other types of AWPs is interesting future work.


Big Earth Data and Machine Learning for Sustainable and Resilient Agriculture

arXiv.org Artificial Intelligence

Big streams of Earth images from satellites or other platforms (e.g., drones and mobile phones) are becoming increasingly available at low or no cost and with enhanced spatial and temporal resolution. This thesis recognizes the unprecedented opportunities offered by the high quality and open access Earth observation data of our times and introduces novel machine learning and big data methods to properly exploit them towards developing applications for sustainable and resilient agriculture. The thesis addresses three distinct thematic areas, i.e., the monitoring of the Common Agricultural Policy (CAP), the monitoring of food security and applications for smart and resilient agriculture. The methodological innovations of the developments related to the three thematic areas address the following issues: i) the processing of big Earth Observation (EO) data, ii) the scarcity of annotated data for machine learning model training and iii) the gap between machine learning outputs and actionable advice. This thesis demonstrated how big data technologies such as data cubes, distributed learning, linked open data and semantic enrichment can be used to exploit the data deluge and extract knowledge to address real user needs. Furthermore, this thesis argues for the importance of semi-supervised and unsupervised machine learning models that circumvent the ever-present challenge of scarce annotations and thus allow for model generalization in space and time. Specifically, it is shown how merely few ground truth data are needed to generate high quality crop type maps and crop phenology estimations. Finally, this thesis argues there is considerable distance in value between model inferences and decision making in real-world scenarios and thereby showcases the power of causal and interpretable machine learning in bridging this gap.


OPTION: OPTImization Algorithm Benchmarking ONtology

arXiv.org Artificial Intelligence

Many optimization algorithm benchmarking platforms allow users to share their experimental data to promote reproducible and reusable research. However, different platforms use different data models and formats, which drastically complicates the identification of relevant datasets, their interpretation, and their interoperability. Therefore, a semantically rich, ontology-based, machine-readable data model that can be used by different platforms is highly desirable. In this paper, we report on the development of such an ontology, which we call OPTION (OPTImization algorithm benchmarking ONtology). Our ontology provides the vocabulary needed for semantic annotation of the core entities involved in the benchmarking process, such as algorithms, problems, and evaluation measures. It also provides means for automatic data integration, improved interoperability, and powerful querying capabilities, thereby increasing the value of the benchmarking data. We demonstrate the utility of OPTION, by annotating and querying a corpus of benchmark performance data from the BBOB collection of the COCO framework and from the Yet Another Black-Box Optimization Benchmark (YABBOB) family of the Nevergrad environment. In addition, we integrate features of the BBOB functional performance landscape into the OPTION knowledge base using publicly available datasets with exploratory landscape analysis. Finally, we integrate the OPTION knowledge base into the IOHprofiler environment and provide users with the ability to perform meta-analysis of performance data.


Natural Language Processing for Systems Engineering: Automatic Generation of Systems Modelling Language Diagrams

arXiv.org Artificial Intelligence

The design of complex engineering systems is an often long and articulated process that highly relies on engineers' expertise and professional judgment. As such, the typical pitfalls of activities involving the human factor often manifest themselves in terms of lack of completeness or exhaustiveness of the analysis, inconsistencies across design choices or documentation, as well as an implicit degree of subjectivity. An approach is proposed to assist systems engineers in the automatic generation of systems diagrams from unstructured natural language text. Natural Language Processing (NLP) techniques are used to extract entities and their relationships from textual resources (e.g., specifications, manuals, technical reports, maintenance reports) available within an organisation, and convert them into Systems Modelling Language (SysML) diagrams, with particular focus on structure and requirement diagrams. The intention is to provide the users with a more standardised, comprehensive and automated starting point onto which subsequently refine and adapt the diagrams according to their needs. The proposed approach is flexible and open-domain. It consists of six steps which leverage open-access tools, and it leads to an automatic generation of SysML diagrams without intermediate modelling requirement, but through the specification of a set of parameters by the user. The applicability and benefits of the proposed approach are shown through six case studies having different textual sources as inputs, and benchmarked against manually defined diagram elements.


Toward a Flexible Metadata Pipeline for Fish Specimen Images

arXiv.org Artificial Intelligence

Flexible metadata pipelines are crucial for supporting the FAIR data principles. Despite this need, researchers seldom report their approaches for identifying metadata standards and protocols that support optimal flexibility. This paper reports on an initiative targeting the development of a flexible metadata pipeline for a collection containing over 300,000 digital fish specimen images, harvested from multiple data repositories and fish collections. The images and their associated metadata are being used for AI-related scientific research involving automated species identification, segmentation and trait extraction. The paper provides contextual background, followed by the presentation of a four-phased approach involving: 1. Assessment of the Problem, 2. Investigation of Solutions, 3. Implementation, and 4. Refinement. The work is part of the NSF Harnessing the Data Revolution, Biology Guided Neural Networks (NSF/HDR-BGNN) project and the HDR Imageomics Institute. An RDF graph prototype pipeline is presented, followed by a discussion of research implications and conclusion summarizing the results.