Bodenreider, Olivier
Solving the Right Problem is Key for Translational NLP: A Case Study in UMLS Vocabulary Insertion
Gutierrez, Bernal Jimenez, Mao, Yuqing, Nguyen, Vinh, Fung, Kin Wah, Su, Yu, Bodenreider, Olivier
As the immense opportunities enabled by large language models become more apparent, NLP systems will be increasingly expected to excel in real-world settings. However, in many instances, powerful models alone will not yield translational NLP solutions, especially if the formulated problem is not well aligned with the real-world task. In this work, we study the case of UMLS vocabulary insertion (UVI), an important real-world task in which hundreds of thousands of new terms, referred to as atoms, are added to the UMLS, one of the most comprehensive open-source biomedical knowledge bases. Previous work aimed to develop an automated NLP system to make this time-consuming, costly, and error-prone task more efficient. Nevertheless, practical progress in this direction has been difficult to achieve due to a problem formulation and evaluation gap between research output and the real-world task. To address this gap, we introduce a new formulation for UMLS vocabulary insertion that mirrors the real-world task, datasets that faithfully represent it, and several strong baselines we developed by re-purposing existing solutions. Additionally, we propose an effective rule-enhanced biomedical language model that enables important new model behavior, outperforms all strong baselines, and provides measurable qualitative improvements to editors who carry out the UVI task. We hope this case study provides insight into the considerable importance of problem formulation for the success of translational NLP solutions.
On Reasoning with RDF Statements about Statements using Singleton Property Triples
Nguyen, Vinh, Bodenreider, Olivier, Thirunarayan, Krishnaprasad, Fu, Gang, Bolton, Evan, Rosinach, Núria Queralt, Furlong, Laura I., Dumontier, Michel, Sheth, Amit
The Singleton Property (SP) approach has been proposed for representing and querying metadata about RDF triples such as provenance, time, location, and evidence. In this approach, one singleton property is created to uniquely represent a relationship in a particular context, and in general, generates a large property hierarchy in the schema. It has become the subject of important questions from Semantic Web practitioners. Can an existing reasoner recognize the singleton property triples? And how? If the singleton property triples describe a data triple, then how can a reasoner infer this data triple from the singleton property triples? Or would the large property hierarchy affect the reasoners in some way? We address these questions in this paper and present our study about the reasoning aspects of the singleton properties. We propose a simple mechanism to enable existing reasoners to recognize the singleton property triples, as well as to infer the data triples described by the singleton property triples. We evaluate the effect of the singleton property triples in the reasoning processes by comparing the performance on RDF datasets with and without singleton properties. Our evaluation uses as benchmark the LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal information added through singleton properties.
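The inference question raised in the abstract above (how a data triple can be recovered from the singleton property triples that describe it) can be illustrated with a minimal Python sketch. It is not the paper's implementation: plain tuples stand in for RDF triples, the rule used (if `sp rdf:singletonPropertyOf p` and `s sp o`, then `s p o`) is an assumed reading of the approach, and the `ex:` names and the function `infer_data_triples` are hypothetical.

```python
# Illustrative sketch only: plain tuples stand in for RDF triples, and the
# inference rule below is an assumption, not the paper's exact mechanism.
SINGLETON_PROPERTY_OF = "rdf:singletonPropertyOf"


def infer_data_triples(triples):
    """Return the data triples (s, p, o) implied by singleton property triples."""
    # Map each singleton property to the generic property it specializes.
    generic_of = {s: o for (s, p, o) in triples if p == SINGLETON_PROPERTY_OF}
    inferred = set()
    for s, p, o in triples:
        # A triple whose predicate is a singleton property implies the same
        # statement expressed with the generic property.
        if p in generic_of:
            inferred.add((s, generic_of[p], o))
    return inferred


# Hypothetical example data: one relationship in one context, plus metadata.
triples = {
    ("ex:BobDylan", "ex:isMarriedTo#1", "ex:SaraLownds"),
    ("ex:isMarriedTo#1", SINGLETON_PROPERTY_OF, "ex:isMarriedTo"),
    ("ex:isMarriedTo#1", "ex:hasStart", "1965"),
}

print(infer_data_triples(triples))
```

The metadata triple about `ex:isMarriedTo#1` is left untouched: only triples that use a singleton property as their predicate produce an inferred data triple.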
Finding Semantic Inconsistencies in UMLS using Answer Set Programming
Erdogan, Halit (Sabanci University) | Bodenreider, Olivier (National Library of Medicine) | Erdem, Esra (Sabanci University)
... its ancestors. We introduced an inconsistency definition for Metathesaurus concepts based on their hierarchical relations and compute all such inconsistent concepts. After that we manually review some of the inconsistent concepts to determine the ones that have erroneous synonymy relations such as wrong synonymy.
The UMLS Metathesaurus was assembled by integrating some 150 source vocabularies; it contains more than 2 million concepts (i.e., clusters of synonymous terms coming from multiple source vocabularies identified by a Concept Unique Identifier). The UMLS Metathesaurus contains also more than 36 million relations between these concepts, ...