Goto

Collaborating Authors

 Ontologies


How Elsevier Accelerated COVID-19 research using Dask on Saturn Cloud -- Elsevier Labs

#artificialintelligence

The version of CORD-19 that we used yielded 3,389,064 paragraphs and 16,952,279 sentences. Each sentence is sent to each model and yields zero or more entities. A notable point is that the process of generating entities from sentences is embarrassingly parallel, and therefore processing multiple sentences in parallel achieves savings in processing time. . To process the dataset, we used Dask, an open source library for parallel computing in Python. Dask provides multiple convenient abstractions that mimic familiar APIs such as Numpy and Pandas Dataframes, which can operate on datasets that do not fit in main memory.


Multifaceted Context Representation using Dual Attention for Ontology Alignment

arXiv.org Artificial Intelligence

Ontology Alignment is an important research problem that finds application in various fields such as data integration, data transfer, data preparation etc. State-of-the-art (SOTA) architectures in Ontology Alignment typically use naive domain-dependent approaches with handcrafted rules and manually assigned values, making them unscalable and inefficient. Deep Learning approaches for ontology alignment use domain-specific architectures that are not only in-extensible to other datasets and domains, but also typically perform worse than rule-based approaches due to various limitations including over-fitting of models, sparsity of datasets etc. In this work, we propose VeeAlign, a Deep Learning based model that uses a dual-attention mechanism to compute the contextualized representation of a concept in order to learn alignments. By doing so, not only does our approach exploit both syntactic and semantic structure of ontologies, it is also, by design, flexible and scalable to different domains with minimal effort. We validate our approach on various datasets from different domains and in multilingual settings, and show its superior performance over SOTA methods.


From Conjunctive Queries to Instance Queries in Ontology-Mediated Querying

arXiv.org Artificial Intelligence

We consider ontology-mediated queries (OMQs) based on expressive description logics of the ALC family and (unions) of conjunctive queries, studying the rewritability into OMQs based on instance queries (IQs). Our results include exact characterizations of when such a rewriting is possible and tight complexity bounds for deciding rewritability. We also give a tight complexity bound for the related problem of deciding whether a given MMSNP sentence is equivalent to a CSP.


Self-alignment Pre-training for Biomedical Entity Representations

arXiv.org Artificial Intelligence

Despite the widespread success of self-supervised learning via masked language models, learning representations directly from text to accurately capture complex and fine-grained semantic relationships in the biomedical domain remains as a challenge. Addressing this is of paramount importance for tasks such as entity linking where complex relational knowledge is pivotal. We propose SapBERT, a pre-training scheme based on BERT. It self-aligns the representation space of biomedical entities with a metric learning objective function leveraging UMLS, a collection of biomedical ontologies with >4M concepts. Our experimental results on six medical entity linking benchmarking datasets demonstrate that SapBERT outperforms many domain-specific BERT-based variants such as BioBERT, BlueBERT and PubMedBERT, achieving the state-of-the-art (SOTA) performances.


Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition

arXiv.org Artificial Intelligence

A booming amount of information is continuously added to the Internet as structured and unstructured data, feeding knowledge bases such as DBpedia and Wikidata with billions of statements describing millions of entities. The aim of Question Answering systems is to allow lay users to access such data using natural language without needing to write formal queries. However, users often submit questions that are complex and require a certain level of abstraction and reasoning to decompose them into basic graph patterns. In this short paper, we explore the use of architectures based on Neural Machine Translation called Neural SPARQL Machines to learn pattern compositions. We show that sequence-to-sequence models are a viable and promising option to transform long utterances into complex SPARQL queries.


Pinaki Laskar posted on LinkedIn

#artificialintelligence

AI collects, processes and generates coded "representations", #data, numbers, symbols or signals, as programmable #cognitive functions, sensation, perception, learning, reasoning, self-knowing, understanding, decision making and acting. Coding is inputting a collection of data as instructions into a programming language to convert into low level instructions that machines understand, as #machine language/code. Algorithms as automated data instructions can be simple or complex, depending on how many layers deep the initial algorithm goes. It is rather data models, schemata or global ontology with the mental models of reality, world views, common sense knowledge, science, domain ontologies, theories of mind, etc. what makes all the difference. Artificial Intelligence is an informational entity, a composite of information and information processing by advanced software that can self-learn and self-improve and self-code.


UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

arXiv.org Artificial Intelligence

Contextual word embedding models, such as BioBERT and Bio_ClinicalBERT, have achieved state-of-the-art results in biomedical natural language processing tasks by focusing their pre-training process on domain-specific corpora. However, such models do not take into consideration expert domain knowledge. In this work, we introduced UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process via a novel knowledge augmentation strategy. More specifically, the augmentation on UmlsBERT with the Unified Medical Language System (UMLS) Metathesaurus was performed in two ways: i) connecting words that have the same underlying `concept' in UMLS, and ii) leveraging semantic group knowledge in UMLS to create clinically meaningful input embeddings. By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models on common named-entity recognition (NER) and clinical natural language inference clinical NLP tasks.


Exploiting Knowledge Graphs for Facilitating Product/Service Discovery

arXiv.org Artificial Intelligence

Most of the existing techniques to product discovery rely on syntactic approaches, thus ignoring valuable and specific semantic information of the underlying standards during the process. The product data comes from different heterogeneous sources and formats giving rise to the problem of interoperability. Above all, due to the continuously increasing influx of data, the manual labeling is getting costlier. Integrating the descriptions of different products into a single representation requires organizing all the products across vendors in a single taxonomy. Practically relevant and quality product categorization standards are still limited in number; and that too in academic research projects where we can majorly see only prototypes as compared to industry. This work presents a cost-effective solution for e-commerce on the Data Web by employing an unsupervised approach for data classification and exploiting the knowledge graphs for matching. The proposed architecture describes available products in web ontology language OWL and stores them in a triple store. User input specifications for certain products are matched against the available product categories to generate a knowledge graph. This mullti-phased top-down approach to develop and improve existing, if any, tailored product recommendations will be able to connect users with the exact product/service of their choice.


Memory-Limited Model-Based Diagnosis

arXiv.org Artificial Intelligence

Various model-based diagnosis scenarios require the computation of most preferred fault explanations. Existing algorithms that are sound (i.e., output only actual fault explanations) and complete (i.e., can return all explanations), however, require exponential space to achieve this task. As a remedy, we propose two novel diagnostic search algorithms, called RBF-HS (Recursive Best-First Hitting Set Search) and HBF-HS (Hybrid Best-First Hitting Set Search), which build upon tried and tested techniques from the heuristic search domain. RBF-HS can enumerate an arbitrary predefined finite number of fault explanations in best-first order within linear space bounds, without sacrificing the desirable soundness or completeness properties. The idea of HBF-HS is to find a trade-off between runtime optimization and a restricted space consumption that does not exceed the available memory. In extensive experiments on real-world diagnosis cases we compared our approaches to Reiter's HS-Tree, a state-of-the-art method that gives the same theoretical guarantees and is as general(ly applicable) as the suggested algorithms. For the computation of minimum-cardinality fault explanations, we find that (1) RBF-HS reduces memory requirements substantially in most cases by up to several orders of magnitude, (2) in more than a third of the cases, both memory savings and runtime savings are achieved, and (3) given the runtime overhead is significant, using HBF-HS instead of RBF-HS reduces the runtime to values comparable with HS-Tree while keeping the used memory reasonably bounded. When computing most probable fault explanations, we observe that RBF-HS tends to trade memory savings more or less one-to-one for runtime overheads. Again, HBF-HS proves to be a reasonable remedy to cut down the runtime while complying with practicable memory bounds.


The State of the Art in Ontology Design: A Survey and Comparative Review

AI Magazine

In this article, we develop a framework for comparing ontologies and place a number of the more prominent ontologies into it. We have selected 10 specific projects for this study, including general ontologies, domain-specific ones, and one knowledge representation system. The comparison framework includes general characteristics, such as the purpose of an ontology, its coverage (general or domain specific), its size, and the formalism used. It also includes the design process used in creating an ontology and the methods used to evaluate it. Characteristics that describe the content of an ontology include taxonomic organization, types of concept covered, top-level divisions, internal structure of concepts, representation of part-whole relations, and the presence and nature of additional axioms.