 Ontologies


SANOM Results for OAEI 2019

arXiv.org Artificial Intelligence

Simulated annealing-based ontology matching (SANOM) participates for the second time in the Ontology Alignment Evaluation Initiative (OAEI) 2019. This paper describes the configuration of SANOM and its results on the anatomy and conference tracks. Compared with OAEI 2017, SANOM has improved significantly, and its results are competitive with state-of-the-art systems. In particular, SANOM has the highest recall among the participating systems in the conference track and is competitive with AML, the best-performing system, in terms of F-measure. SANOM is also competitive with LogMap, the best-performing system on the anatomy track among those that use no specific biomedical background knowledge. SANOM has been adapted to the HOBBIT platform and is now available to registered users.
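The abstract does not give SANOM's configuration, so the following is only a minimal sketch of the general idea of using simulated annealing to search for a one-to-one alignment between two label lists. The similarity measure (`difflib.SequenceMatcher`), the swap move, the cooling schedule, and all function names are assumptions made for illustration, not SANOM's actual design.

```python
import math
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; an assumed stand-in for a real matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def alignment_score(source, target, perm):
    """Total similarity when source[i] is mapped to target[perm[i]]."""
    return sum(similarity(s, target[j]) for s, j in zip(source, perm))


def anneal_alignment(source, target, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a high-scoring one-to-one alignment by simulated annealing.

    Assumes len(source) == len(target) >= 2 (hypothetical simplification).
    """
    rng = random.Random(seed)
    perm = list(range(len(target)))
    rng.shuffle(perm)
    current = alignment_score(source, target, perm)
    best_perm, best = perm[:], current
    temperature = t0
    for _ in range(steps):
        # Propose a neighbour by swapping two assignments.
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
        candidate = alignment_score(source, target, perm)
        delta = candidate - current
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
            if current > best:
                best_perm, best = perm[:], current
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo the swap
        temperature *= cooling
    return best_perm, best


source_labels = ["Author", "Paper", "Review", "Conference"]
target_labels = ["Reviewer report", "Conference event", "Writer", "Article paper"]
mapping, score = anneal_alignment(source_labels, target_labels)
for s, j in zip(source_labels, mapping):
    print(f"{s} -> {target_labels[j]}")
```

Keeping the best alignment seen so far, rather than the final state, means that late exploratory moves cannot discard a good solution found earlier.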


Service mining for Internet of Things

arXiv.org Artificial Intelligence

A service mining framework is proposed that enables discovering interesting relationships in Internet of Things services bottom-up. The service relationships are modeled based on spatial-temporal aspects, environment, people, and operation. An ontology-based service model is proposed to describe services. We present a set of metrics to evaluate the interestingness of discovered service relationships. Analytical and simulation results are presented to show the effectiveness of the proposed evaluation measures.


An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web

arXiv.org Artificial Intelligence

While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 publicly available biomedical linked data graphs into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.


The growth and form of knowledge networks by kinesthetic curiosity

arXiv.org Artificial Intelligence

Throughout life, we might seek a calling, companions, skills, entertainment, truth, self-knowledge, beauty, and edification. The practice of curiosity can be viewed as an extended and open-ended search for valuable information with hidden identity and location in a complex space of interconnected information. Despite its importance, curiosity has been challenging to computationally model because the practice of curiosity often flourishes without specific goals, external reward, or immediate feedback. Here, we show how network science, statistical physics, and philosophy can be integrated into an approach that coheres with and expands the psychological taxonomies of specific-diversive and perceptual-epistemic curiosity. Using this interdisciplinary approach, we distill functional modes of curious information seeking as searching movements in information space. The kinesthetic model of curiosity offers a vibrant counterpart to the deliberative predictions of model-based reinforcement learning. In doing so, this model unearths new computational opportunities for identifying what makes curiosity curious.


Relational Learning Analysis of Social Politics using Knowledge Graph Embedding

arXiv.org Artificial Intelligence

Knowledge Graphs (KGs) have gained considerable attention recently from both academia and industry. In fact, the adoption of graph technology and the abundance of graph datasets have led the research community to build sophisticated graph analytics tools. As a result, the application of KGs has extended to a plethora of real-life problems in diverse domains. Despite the abundance of generic KGs currently available, there is a vital need to construct domain-specific KGs. Further, quality and credibility should be incorporated into the process of constructing and augmenting KGs, particularly those derived from mixed-quality resources such as social media data. This paper presents a novel credibility domain-based KG Embedding framework. This framework involves capturing a fusion of data obtained from heterogeneous resources into a formal KG representation depicted by a domain ontology. The proposed approach makes use of various knowledge-based repositories to enrich the semantics of the textual contents, thereby facilitating the interoperability of information. The proposed framework also embodies a credibility module to ensure data quality and trustworthiness. The constructed KG is then embedded in a low-dimension semantically-continuous space using several embedding techniques. The utility of the constructed KG and its embeddings is demonstrated and substantiated on link prediction, clustering, and visualisation tasks.


NEMA: Automatic Integration of Large Network Management Databases

arXiv.org Artificial Intelligence

Network management, whether for malfunction analysis, failure prediction, or performance monitoring and improvement, generally involves large amounts of data from different sources. To effectively integrate and manage these sources, automatically finding semantic matches among their schemas or ontologies is crucial. Existing approaches to database matching mainly fall into two categories. One focuses on schema-level matching based on schema properties such as field names, data types, constraints and schema structures. Network management databases contain massive tables (e.g., network products, incidents, security alerts and logs) from different departments and groups with nonuniform field names and schema characteristics, so matching them by those schema properties is not reliable. The other category is based on instance-level matching using general string similarity techniques, which are not applicable to the matching of large network management databases. In this paper, we develop a matching technique for large NEtwork MAnagement databases (NEMA) deploying instance-level matching for effective data integration and connection. We design matching metrics and scores for both numerical and non-numerical fields and propose algorithms for matching these fields. The effectiveness and efficiency of NEMA are evaluated by conducting experiments based on ground truth field pairs in large network management databases. Our measurements on large databases with 1,458 fields, each of which contains over 10 million records, reveal that the accuracy of NEMA is up to 95%. It achieves 2%-10% higher accuracy and 5x-14x speedup over baseline methods.
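To make the idea of instance-level matching concrete, here is a small sketch that scores field pairs by their values rather than their names. The abstract does not specify NEMA's metrics, so the two scores below (Jaccard overlap for non-numerical fields, value-range overlap for numerical fields) and all function and field names are assumptions chosen for illustration only.

```python
def jaccard(values_a, values_b):
    """Overlap of distinct values; assumed score for non-numerical fields."""
    a, b = set(values_a), set(values_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def range_overlap(nums_a, nums_b):
    """Shared share of the combined value range; assumed score for numerical fields."""
    low = max(min(nums_a), min(nums_b))
    high = min(max(nums_a), max(nums_b))
    span = max(max(nums_a), max(nums_b)) - min(min(nums_a), min(nums_b))
    if span == 0:
        return 1.0  # both fields hold the same single value
    return max(0.0, high - low) / span


def best_match(field_values, candidates, score):
    """Return the name of the candidate field whose values score highest."""
    return max(candidates, key=lambda name: score(field_values, candidates[name]))


# Hypothetical fields from two tables with unrelated column names.
device_ids = ["rtr-01", "rtr-02", "sw-17"]
other_table = {
    "hostname": ["rtr-02", "sw-17", "fw-03"],
    "severity": ["low", "high", "critical"],
}
print(best_match(device_ids, other_table, jaccard))  # hostname
```

At the scale the abstract describes, a real system would presumably score samples or sketches of the values rather than full sets; this sketch ignores that concern.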


A Novel Approach for Generating SPARQL Queries from RDF Graphs

arXiv.org Artificial Intelligence

This work was done as part of a research master's thesis project. The goal is to generate SPARQL queries from user-supplied keywords to query RDF graphs. To do this, we first transformed the input ontology into an RDF graph that reflects the semantics represented in the ontology. We then stored this RDF graph in the Neo4j graph database to ensure efficient and persistent management of RDF data. At query time, we studied the different possible and desired interpretations of the request originally made by the user. We also proposed a transformation between the two query languages SPARQL and Cypher, the latter being specific to Neo4j. This allows us to implement the architecture of our system over a wide variety of RDF databases (BD-RDFs) that provide their own query languages, without changing any of the other components of the system. Finally, we tested and evaluated our tool using different test bases, and found it to be comprehensive, effective, and powerful enough.


KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis

arXiv.org Artificial Intelligence

Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper, we present KGTK, a data science-centric toolkit to represent, create, transform, enhance and analyze KGs. KGTK represents graphs in tables and leverages popular libraries developed for data science applications, enabling a wide audience of developers to easily construct knowledge graph pipelines for their applications. We illustrate KGTK with real-world scenarios in which we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet, in our own work.
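The "graphs in tables" idea can be illustrated without the toolkit itself. The sketch below stores edges as tab-separated rows and runs two simple operations on them using only the Python standard library. The column names `node1`, `label`, and `node2` reflect our understanding of KGTK's edge-file convention and should be treated as an assumption. The sample data and helper functions are hypothetical and are not part of KGTK's API.

```python
import csv
import io
from collections import Counter

# One row per edge: subject, relation, object (illustrative data).
EDGES_TSV = (
    "node1\tlabel\tnode2\n"
    "alice\tknows\tbob\n"
    "alice\tworks_at\tacme\n"
    "bob\tknows\tcarol\n"
)


def read_edges(text):
    """Parse a tab-separated edge table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def edges_with_label(edges, label):
    """Keep only the edges whose relation matches the given label."""
    return [edge for edge in edges if edge["label"] == label]


def out_degree(edges):
    """Count outgoing edges per subject node."""
    return Counter(edge["node1"] for edge in edges)


edges = read_edges(EDGES_TSV)
print(edges_with_label(edges, "knows"))
print(out_degree(edges))
```

Because each edge is an ordinary table row, filtering and aggregation reduce to the same operations that general-purpose data science libraries already provide.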


Performance Optimization of a Fuzzy Entropy based Feature Selection and Classification Framework

arXiv.org Machine Learning

In this paper, we implement and compare different methods for improving the key components of a fuzzy entropy based feature selection framework. These methods comprise combinations of three ideal vector calculations, three maximal similarity classifiers and three fuzzy entropy functions. Different feature removal orders based on the fuzzy entropy values were also compared. The proposed method was evaluated on three publicly available biomedical datasets. From the experiments, we identified the best-performing combination of ideal vector, similarity classifier and fuzzy entropy function for feature selection. The optimized framework was also compared with six other classical filter-based feature selection methods. The proposed method ranked among the top performers, together with the Correlation and ReliefF methods. More importantly, the proposed method achieved the most stable performance on all three datasets as features were gradually removed. This indicates better feature ranking performance than the other compared methods.
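The abstract does not name its three fuzzy entropy functions. As one well-known example, the De Luca and Termini fuzzy entropy of membership values mu_1..mu_n is H = -sum(mu_j log mu_j + (1 - mu_j) log(1 - mu_j)). The sketch below computes it and orders features for removal from highest to lowest entropy. Treating high-entropy features as the least informative is an assumption about the framework, and the function names are hypothetical.

```python
import math


def de_luca_termini_entropy(memberships):
    """Fuzzy entropy of membership values in [0, 1] (natural logarithm).

    Terms with a zero probability-like value contribute nothing,
    following the convention 0 * log(0) = 0.
    """
    total = 0.0
    for mu in memberships:
        for p in (mu, 1.0 - mu):
            if p > 0.0:
                total -= p * math.log(p)
    return total


def removal_order(entropies):
    """Feature indices sorted from highest to lowest entropy (assumed removal order)."""
    return sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)


# Hypothetical per-feature similarity values used as memberships.
feature_memberships = [
    [0.95, 0.90, 0.05],  # crisp values: low entropy
    [0.50, 0.55, 0.45],  # values near 0.5: high entropy
]
entropies = [de_luca_termini_entropy(m) for m in feature_memberships]
print(removal_order(entropies))  # [1, 0]
```

The entropy peaks when a membership value is 0.5 and vanishes at 0 or 1, which is why values near 0.5 mark a feature as ambiguous under this measure.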


On Dealing with Conflicting, Uncertain and Partially Ordered Ontologies

AAAI Conferences

We focus on handling conflicting and uncertain information in lightweight ontologies, where uncertainty is represented in a possibilistic logic setting. We use DL-Lite, a tractable fragment of Description Logic, to specify terminological knowledge (i.e., TBox). We assume the TBox to be stable and coherent, while its combination with a set of assertional facts (i.e., ABox) may be inconsistent. We address the problem of dealing with conflicts when the reliability relation between sources is only partially ordered. We propose to represent the uncertain ABox as a symbolic weighted base, where a strict partial preorder is applied on the weights. In this context, we provide a strategy for computing a single repair for the ABox, called the partial possibilistic repair. The idea is to consider all compatible bases of a partially preordered ABox (which intuitively encode total extensions of the partial preorder), compute their associated possibilistic repairs, before intersecting those repairs. We define the notion of π-accepted assertions and provide an equivalent characterization, therefore ensuring tractable computations of our method.