

 Ontologies


Ontology Reuse: the Real Test of Ontological Design

arXiv.org Artificial Intelligence

Reusing ontologies in practice is still very challenging, especially when multiple ontologies are (jointly) involved. Moreover, despite recent advances, realizing systematic ontology quality assurance remains a difficult problem. In this work, the quality of thirty biomedical ontologies and of the Computer Science Ontology is investigated from the perspective of a practical use case. Special scrutiny is given to cross-ontology references, which are vital for combining ontologies. Diverse methods to detect potential issues are proposed, including natural language processing and network analysis. In addition, several suggestions for improving ontologies and their quality assurance processes are presented. It is argued that while increasingly capable automatic tools for ontology quality assurance are crucial for ontology improvement, they will not solve the problem entirely. It is ontology reuse that is the ultimate method for continuously verifying and improving ontology quality, as well as for guiding its future development. In particular, many issues can be found and fixed primarily through practical and diverse ontology reuse scenarios.
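The abstract does not say how cross-ontology references are checked. As a hedged illustration only, the sketch below flags "dangling" references (a referenced ID that is not defined in the ontology its prefix names) and counts links between ontologies as a minimal network view. The prefixes, IDs, and helper names are hypothetical; real ontologies would be loaded from OBO or OWL files, and obsolete terms would need separate handling.

```python
from collections import defaultdict

# Hypothetical toy data: each ontology maps its own term IDs to labels.
ONTOLOGIES = {
    "A": {"A:001": "cell", "A:002": "neuron"},
    "B": {"B:010": "brain", "B:011": "cortex"},
}

# Cross-ontology references as (source term, referenced term) pairs.
REFERENCES = [
    ("A:002", "B:010"),  # target exists in B
    ("A:002", "B:099"),  # dangling: B:099 is not defined in B
    ("B:011", "A:001"),  # target exists in A
]

def find_dangling(ontologies, references):
    """Return references whose target ID is missing from the ontology named by its prefix."""
    dangling = []
    for source, target in references:
        prefix = target.split(":", 1)[0]
        if target not in ontologies.get(prefix, {}):
            dangling.append((source, target))
    return dangling

def count_links(references):
    """Count references per (source ontology, target ontology) pair."""
    counts = defaultdict(int)
    for source, target in references:
        counts[(source.split(":", 1)[0], target.split(":", 1)[0])] += 1
    return dict(counts)

print(find_dangling(ONTOLOGIES, REFERENCES))  # [('A:002', 'B:099')]
print(count_links(REFERENCES))
```

The link counts form a small directed graph between ontologies, which is the kind of structure a network analysis could then inspect for unexpected hubs or missing connections.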


Capability-based Frameworks for Industrial Robot Skills: a Survey

arXiv.org Artificial Intelligence

The research community is puzzled by terms such as skill, action, and atomic unit when describing robots' capabilities. However, to make it possible to integrate these capabilities into industrial scenarios, the descriptions must be standardized. This work uses a structured review approach to identify commonalities and differences among the robot skill frameworks proposed by the research community. Through this method, 210 papers were analyzed and three main results were obtained. First, the vast majority of authors agree on a taxonomy based on task, skill, and primitive. Second, the most investigated robot capability is pick-and-place. Third, industrially oriented applications focus more on simple robot capabilities with fixed parameters while ensuring safety. Therefore, this work emphasizes that future works should use a taxonomy based on task, skill, and primitive to align with existing literature. Moreover, further research is needed in the industrial domain on parametric robot capabilities that still ensure safety.
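As a hedged illustration of the task/skill/primitive taxonomy the survey reports, the sketch below models a pick-and-place task as nested Python dataclasses. The three class names follow the taxonomy's levels, but the concrete primitives (`move_to`, `close_gripper`, `open_gripper`) and their parameters are hypothetical and not drawn from any surveyed framework.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Primitive:
    """Lowest level: one parameterized robot command (names here are hypothetical)."""
    name: str
    params: dict = field(default_factory=dict)

@dataclass
class Skill:
    """Middle level: an ordered sequence of primitives."""
    name: str
    primitives: List[Primitive]

@dataclass
class Task:
    """Top level: an ordered sequence of skills that achieves a goal."""
    name: str
    skills: List[Skill]

    def flatten(self) -> List[str]:
        """Expand the task into the primitive names a controller would execute, in order."""
        return [p.name for s in self.skills for p in s.primitives]

pick = Skill("pick", [Primitive("move_to", {"pose": "above_part"}), Primitive("close_gripper")])
place = Skill("place", [Primitive("move_to", {"pose": "above_bin"}), Primitive("open_gripper")])
task = Task("pick_and_place", [pick, place])
print(task.flatten())  # ['move_to', 'close_gripper', 'move_to', 'open_gripper']
```

Keeping parameters on the primitives (here a fixed `pose`) mirrors the survey's observation that industrial applications tend to use fixed-parameter capabilities; a parametric variant would compute those values at run time.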


A Glossary of Knowledge Graph Terms - DataScienceCentral.com

#artificialintelligence

As in many fields, knowledge graphs come with a wide array of specialized terms. This guide provides a handy reference to these concepts. The Resource Description Framework (RDF) is a conceptual framework established in the early 2000s by the World Wide Web Consortium for describing sets of interrelated assertions. RDF breaks such assertions down into underlying graph structures in which a subject node is connected to an object node via a predicate edge. The graph is then constructed by connecting the object node of one assertion to the subject node of another, in a manner analogous to Tinkertoys (or molecular diagrams).
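To make the chaining concrete, here is a minimal sketch in plain Python (no RDF library) that stores assertions as (subject, predicate, object) triples and walks the graph by treating each object as the next subject. The resource names are made up for illustration.

```python
from collections import defaultdict

# Each assertion is a (subject, predicate, object) triple; names are illustrative.
triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "worksFor", "AcmeCorp"),
    ("AcmeCorp", "locatedIn", "Berlin"),
]

# Index outgoing predicate edges by subject node.
edges = defaultdict(list)
for s, p, o in triples:
    edges[s].append((p, o))

def walk(start):
    """Follow the first outgoing edge from each node, using each object as the next subject."""
    path, node, seen = [], start, {start}
    while edges.get(node):  # .get avoids creating empty entries in the defaultdict
        p, o = edges[node][0]
        path.append((node, p, o))
        if o in seen:  # stop on cycles
            break
        seen.add(o)
        node = o
    return path

for hop in walk("Alice"):
    print(hop)
```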


OntoProtein: Protein Pretraining With Gene Ontology Embedding

arXiv.org Artificial Intelligence

Self-supervised protein language models have proved effective in learning protein representations. With increasing computational power, current protein language models pre-trained on millions of diverse sequences can scale from millions to billions of parameters and achieve remarkable improvements. However, these prevailing approaches rarely incorporate knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that informative biological knowledge in KGs can enhance protein representations with external knowledge. In this work, we propose OntoProtein, the first general framework that incorporates the structure of GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, in which every node is described by gene annotation text or a protein sequence. We propose a novel contrastive learning approach with knowledge-aware negative sampling to jointly optimize the knowledge graph and protein embeddings during pre-training. Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models on the TAPE benchmark and performs better than baselines on protein-protein interaction and protein function prediction. Code and datasets are available at https://github.com/zjunlp/OntoProtein.
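The sketch below is not OntoProtein's actual objective. It is a generic illustration of the two ingredients the abstract names, assuming a TransE-style distance ||h + r - t|| between a protein, a relation, and a GO term, and a margin loss over negative (corrupted) triples. The embeddings, the relation name, and the GO IDs are invented, and uniform tail corruption stands in for the paper's knowledge-aware negative sampling.

```python
import math
import random

# Hypothetical 2-D embeddings; real models learn high-dimensional vectors.
emb = {
    "protein_P": [0.0, 0.0],
    "enables": [1.0, 0.0],  # relation vector (illustrative)
    "GO:A": [1.0, 0.0],     # hypothetical GO term annotated to protein_P
    "GO:B": [0.0, 3.0],
    "GO:C": [5.0, 5.0],
}

def distance(h, r, t):
    """TransE-style score ||h + r - t||_2; smaller means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_loss(head, rel, tail, negative_tails, margin=1.0):
    """Hinge loss that pushes each corrupted triple at least `margin` farther than the true one."""
    pos = distance(emb[head], emb[rel], emb[tail])
    return sum(max(0.0, margin + pos - distance(emb[head], emb[rel], emb[n]))
               for n in negative_tails)

def sample_negative_tails(true_tail, k, rng):
    """Uniform tail corruption over GO terms (simpler than knowledge-aware sampling)."""
    pool = [name for name in emb if name.startswith("GO:") and name != true_tail]
    return rng.sample(pool, k)

rng = random.Random(0)
negs = sample_negative_tails("GO:A", 2, rng)
print(negs, margin_loss("protein_P", "enables", "GO:A", negs))  # loss is 0.0 here
```

In a real model the loss would be minimized by gradient descent over the embeddings; the toy values are already separated by more than the margin, so the loss is zero.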


Pinaki Laskar on LinkedIn: #AI #technology #Data

#artificialintelligence

AI Researcher, Cognitive Technologist Inventor - AI Thinking, Think Chain Innovator - AIOT, XAI, Autonomous Cars, IIOT Founder Fisheyebox Spatial Computing Savant, Transformative Leader, Industry X.0 Practitioner

Unicode is an information #technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium. As of March 2020, Unicode 13.0 defines a total of 143,859 characters (143,696 graphic characters and 163 format characters) covering 154 modern and historic scripts, as well as multiple symbol sets and emoji. The character repertoire of the Unicode Standard is synchronized with ISO/IEC 10646, and the two are code-for-code identical. The Universal Coded Character Set (UCS), defined by the International Standard ISO/IEC 10646, is the basis of many character encodings and keeps growing as characters from previously unrepresented writing systems are added. Integrating AI into computers and system software means creating a Unicode abstraction level, the Universal Coded Data Set (UCDS), as AI Unidatacode or EIS UCDS.
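The post does not describe the proposed UCDS in enough detail to illustrate it. What can be shown is the Unicode idea it builds on: each character has one code point (the same value in ISO/IEC 10646), and encodings such as UTF-8 and UTF-16 serialize that code point differently. A minimal sketch using only Python's standard library; the `describe` helper is ours, written for illustration.

```python
import unicodedata

def describe(ch):
    """Report a character's code point, Unicode name, UTF-8 bytes, and UTF-16 code-unit count."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        "utf8": ch.encode("utf-8").hex(),
        "utf16_units": len(ch.encode("utf-16-le")) // 2,  # 2 bytes per UTF-16 code unit
    }

# Latin, accented Latin, Greek, CJK, and an emoji outside the Basic Multilingual Plane.
for ch in ["A", "\u00e9", "\u03a9", "\u4e2d", "\U0001F600"]:
    print(describe(ch))
```

The emoji needs four UTF-8 bytes and two UTF-16 code units (a surrogate pair), while its code point stays a single number; that separation between abstract identity and byte encoding is what the standard guarantees.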


Adding RDF Lists and Sequences To SPARQL - DataScienceCentral.com

#artificialintelligence

This particular article discusses a recommendation for a given standard, SPARQL 1.1. None of this has been implemented yet, so it represents the musings of a writer rather than established functionality. Lately, I've been spending some time in the GitHub archives of the SPARQL 1.2 Community site, a group of people who are looking at the next generation of the SPARQL language. One challenge that has come up frequently is the lack of good mechanisms in SPARQL for handling ordered lists. This has proven to be a limiting factor in many ways, especially given that most other languages have been able to handle lists and dictionaries for decades. As I was going through the archives, an answer occurred to me that comes down to the fact that RDF and SPARQL, while very closely related, are not in fact the same thing.
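In RDF, an ordered collection such as the Turtle list `("a" "b" "c")` is stored as a chain of blank nodes linked by `rdf:first` and `rdf:rest` and terminated by `rdf:nil`. SPARQL 1.1 can reach the members with the property path `rdf:rest*/rdf:first`, but it returns them without their positions, which is part of why ordered lists are awkward to query. The plain-Python sketch below (no RDF library; blank-node labels and `ex:` names are illustrative) recovers the order by walking the chain explicitly.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# Triples that a Turtle statement like  ex:s ex:p ("a" "b" "c") .  expands into.
triples = {
    ("ex:s", "ex:p", "_:l1"),
    ("_:l1", RDF_FIRST, '"a"'), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, '"b"'), ("_:l2", RDF_REST, "_:l3"),
    ("_:l3", RDF_FIRST, '"c"'), ("_:l3", RDF_REST, RDF_NIL),
}

def objects(graph, s, p):
    """All objects of triples with the given subject and predicate."""
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]

def read_list(graph, head):
    """Walk the rdf:first/rdf:rest chain from head to rdf:nil, keeping member order."""
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cyclic rdf:rest chain")
        seen.add(node)
        first, rest = objects(graph, node, RDF_FIRST), objects(graph, node, RDF_REST)
        if len(first) != 1 or len(rest) != 1:
            raise ValueError(f"malformed list node {node}")
        items.append(first[0])
        node = rest[0]
    return items

head = objects(triples, "ex:s", "ex:p")[0]
print(read_list(triples, head))  # ['"a"', '"b"', '"c"']
```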


How to Increase Computational Efficiency for PReLU in CUDA -- OneFlow Performance Optimization

#artificialintelligence

PReLU is an activation function that is frequently used in InsightFace. It has two operating modes: PReLU(1) and PReLU(channels). InsightFace adopts the second mode, in which PReLU is equivalent to a binary broadcast operation. In this article, we are going to talk about optimizing broadcast operations in CUDA.
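PReLU computes f(x) = x for x > 0 and f(x) = a·x otherwise. In PReLU(1) a single slope a is shared by every element; in PReLU(channels) each channel has its own slope, so the C-element slope vector must be broadcast across the batch and spatial dimensions. The pure-Python sketch below, assuming a contiguous NCHW layout, shows the index arithmetic such a broadcast needs. A CUDA kernel would typically map one element index to one thread; OneFlow's actual kernel and its optimizations are not reproduced here.

```python
def prelu_nchw(x, alpha, n, c, h, w):
    """PReLU over a flat NCHW buffer; alpha has length 1 (shared) or c (per channel)."""
    assert len(x) == n * c * h * w
    assert len(alpha) in (1, c)
    inner = h * w  # number of elements in one (sample, channel) plane
    out = [0.0] * len(x)
    for i, v in enumerate(x):  # on a GPU, each i would be handled by one thread
        ch = (i // inner) % c  # broadcast index: the channel element i belongs to
        a = alpha[0] if len(alpha) == 1 else alpha[ch]
        out[i] = v if v > 0 else a * v
    return out

x = [1.0, -2.0, -4.0, 3.0]  # shape (N=1, C=2, H=1, W=2)
print(prelu_nchw(x, [0.25, 0.5], 1, 2, 1, 2))  # [1.0, -0.5, -2.0, 3.0]
print(prelu_nchw(x, [0.25], 1, 2, 1, 2))       # [1.0, -0.5, -1.0, 3.0]
```

The division and modulo per element are the overhead that the channel broadcast adds compared with the shared-slope mode.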


RDF Processing in Python with RDFLib - Geeky Humans

#artificialintelligence

An RDF statement expresses a relationship between two resources. The subject and the object represent the two resources being related; the predicate represents the nature of their relationship. The relationship is phrased in a directional way (from subject to object) and is called an RDF property. RDF allows us to communicate much more than just words; it allows us to communicate data that can be understood by machines as well as people. In this tutorial, we'll process RDF in Python with RDFLib.
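RDFLib's `Graph.triples()` method takes a (subject, predicate, object) pattern in which `None` acts as a wildcard. The stdlib-only sketch below imitates that lookup on a plain list of triples so the idea can be run without installing RDFLib; the `match` helper and the `ex:`/`foaf:` resources are illustrative, not RDFLib code.

```python
# Toy graph: each statement is (subject, predicate, object), directed from subject to object.
graph = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]

def match(graph, pattern):
    """Yield triples matching pattern; None in any position matches anything."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple

# Everything stated about ex:alice (subject fixed, predicate and object free).
for s, p, o in match(graph, ("ex:alice", None, None)):
    print(s, p, o)

# All names in the graph (predicate fixed).
names = [o for _, _, o in match(graph, (None, "foaf:name", None))]
print(names)  # ['"Alice"', '"Bob"']
```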


A Study into patient similarity through representation learning from medical records

arXiv.org Artificial Intelligence

Patient similarity assessment, which identifies patients similar to a given patient, can help improve medical care. The assessment can be performed using Electronic Medical Records (EMRs). Patient similarity measurement requires converting heterogeneous EMRs into comparable formats to calculate their distance. While versatile document representation learning methods have been developed in recent years, it is still unclear how complex EMR data should be processed to create the most useful patient representations. This study presents a new data representation method for EMRs that takes the information in clinical narratives into account. To address the limitations of previous approaches in handling complex parts of EMR data, an unsupervised method is proposed for building a patient representation, which integrates unstructured data with structured data extracted from patients' EMRs. To model the extracted data, we employed a tree structure that captures the temporal relations of multiple medical events from the EMR. We processed clinical notes to extract symptoms, signs, and diseases using different tools such as medspaCy, MetaMap, and scispaCy, and mapped the entities to the Unified Medical Language System (UMLS). After creating the tree data structure, we used two novel relabeling methods for the non-leaf nodes of the tree to capture two temporal aspects of the extracted events. By traversing the tree, we generated a sequence that was used to create an embedding vector for each patient. A comprehensive evaluation of the proposed method on patient similarity and mortality prediction tasks demonstrated that our proposed model achieves lower mean squared error (MSE), higher precision, and higher normalized discounted cumulative gain (NDCG) relative to baselines.

Keywords: Patient similarity analytics, Patient representation learning, Natural language processing, Health informatics

1 Introduction

Patient similarity assessment identifies patients similar to a given patient.
It allows physicians to gain insights into the records of matching patients and provide better treatments. Calculating patient similarity requires measuring the distance between patients within a population (1). A distance can be calculated based on various structured and unstructured data types in an electronic medical record (EMR). EMRs can be processed in the same way as general documents modeled as sequences of words; the difference is that EMRs are sequences of patient events, such as diagnoses, procedures, and medications. The representation of an EMR is a low-dimensional, fixed-length embedding vector, so it can be used to measure similarity between patients, just as a document representation can be used to measure similarity between notes. Among previous works on patient representations based on EMRs, some have relied on structured data types (2-6), while others have used only unstructured data (7,8).
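Once every patient is mapped to a fixed-length vector, finding similar patients reduces to ranking vectors by a similarity measure. A minimal sketch, assuming cosine similarity (the excerpt does not say which measure the study uses) and made-up three-dimensional embeddings in place of the learned ones:

```python
import math

# Hypothetical patient embeddings; the study learns these from tree-traversal sequences.
embeddings = {
    "p1": [1.0, 0.0, 1.0],
    "p2": [1.0, 0.0, 0.9],
    "p3": [0.0, 1.0, 0.0],
    "p4": [0.5, 0.5, 0.5],
}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query_id, embeddings, k=2):
    """Rank all other patients by similarity to the query patient and keep the top k."""
    q = embeddings[query_id]
    scores = [(pid, cosine(q, vec)) for pid, vec in embeddings.items() if pid != query_id]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]

print(most_similar("p1", embeddings))
```

A ranked list like this is also what ranking metrics such as NDCG are computed over when the retrieved patients are compared with a reference ordering.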


HyperBox: A Supervised Approach for Hypernym Discovery using Box Embeddings

arXiv.org Artificial Intelligence

Hypernymy plays a fundamental role in many AI tasks, such as taxonomy learning and ontology learning. This has motivated the development of many methods for automatically extracting this relation, most of which rely on word distributions. We present HyperBox, a novel model that learns box embeddings for hypernym discovery. Given an input term, HyperBox retrieves suitable hypernyms from a target corpus. For this task, we use the dataset published for the SemEval 2018 Shared Task on Hypernym Discovery. We compare the performance of our model on two specific domains of knowledge: medical and music. Experimentally, we show that our model outperforms existing methods on the majority of the evaluation metrics. Moreover, our model generalizes well to unseen hypernymy pairs using only a small set of training data.
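The abstract does not detail HyperBox's scoring function. As a general illustration of why boxes suit hypernymy, the sketch below represents each term as an axis-aligned box (min corner, max corner) and scores "hypo is-a hyper" as the fraction of the hyponym box's volume that lies inside the hypernym box. The terms and coordinates are invented. Unlike a symmetric distance, this score is asymmetric, matching the direction of the is-a relation.

```python
from math import prod

# Hypothetical 2-D boxes: (min corner, max corner).
boxes = {
    "animal": ([0.0, 0.0], [4.0, 4.0]),
    "dog": ([1.0, 1.0], [2.0, 2.0]),
    "guitar": ([3.0, 3.0], [6.0, 5.0]),
}

def volume(box):
    """Box volume; an empty overlap (max < min in some dimension) gives 0."""
    lo, hi = box
    return prod(max(0.0, h - l) for l, h in zip(lo, hi))

def intersection(a, b):
    """Intersection of two axis-aligned boxes (possibly empty)."""
    lo = [max(x, y) for x, y in zip(a[0], b[0])]
    hi = [min(x, y) for x, y in zip(a[1], b[1])]
    return (lo, hi)

def containment(hypo, hyper):
    """Fraction of the hyponym box inside the hypernym box: 1.0 means full containment."""
    v = volume(boxes[hypo])
    return volume(intersection(boxes[hypo], boxes[hyper])) / v if v > 0 else 0.0

print(containment("dog", "animal"))     # 1.0
print(containment("animal", "dog"))     # 0.0625
print(containment("guitar", "animal"))  # about 0.167
```

Trained box models often replace the hard max(0, ·) with a smooth approximation so that gradients do not vanish when boxes are disjoint.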