Ontologies
Ontological component-based description of robot capabilities
Dussard, Bastien, Sarthou, Guillaume, Clodic, Aurélie
A key aspect of a robot's knowledge base is self-awareness about what it is capable of doing. It allows to define which tasks it can be assigned to and which it cannot. We will refer to this knowledge as the Capability concept. As capabilities stems from the components the robot owns, they can be linked together. In this work, we hypothesize that this concept can be inferred from the components rather than merely linked to them. Therefore, we introduce an ontological means of inferring the agent's capabilities based on the components it owns as well as low-level capabilities. This inference allows the agent to acknowledge what it is able to do in a responsive way and it is generalizable to external entities the agent can carry for example. To initiate an action, the robot needs to link its capabilities with external entities. To do so, it needs to infer affordance relations from its capabilities as well as the external entity's dispositions. This work is part of a broader effort to integrate social affordances into a Human-Robot collaboration context and is an extension of an already existing ontology.
A Practical Entity Linking System for Tables in Scientific Literature
Mulwad, Varish, Finin, Tim, Kumar, Vijay S., Williams, Jenny Weisenberg, Dixit, Sharad, Joshi, Anupam
Entity linking is an important step towards constructing knowledge graphs that facilitate advanced question answering over scientific documents, including the retrieval of relevant information included in tables within these documents. This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base. It describes how we adapt this system for linking domain-specific entities, especially for those entities embedded within tables drawn from COVID-19-related scientific literature. We describe the setup of an efficient offline instance of the system that enables our entity-linking approach to be more feasible in practice. As part of a broader approach to infer the semantic meaning of scientific tables, we leverage the structural and semantic characteristics of the tables to improve overall entity linking performance.
Query Rewriting with Disjunctive Existential Rules and Mappings
Leclère, Michel, Mugnier, Marie-Laure, Pérution-Kihli, Guillaume
We consider the issue of answering unions of conjunctive queries (UCQs) with disjunctive existential rules and mappings. While this issue has already been well studied from a chase perspective, query rewriting within UCQs has hardly been addressed yet. We first propose a sound and complete query rewriting operator, which has the advantage of establishing a tight relationship between a chase step and a rewriting step. The associated breadth-first query rewriting algorithm outputs a minimal UCQ-rewriting when one exists. Second, we show that for any ``truly disjunctive'' nonrecursive rule, there exists a conjunctive query that has no UCQ-rewriting. It follows that the notion of finite unification sets (fus), which denotes sets of existential rules such that any UCQ admits a UCQ-rewriting, seems to have little relevance in this setting. Finally, turning our attention to mappings, we show that the problem of determining whether a UCQ admits a UCQ-rewriting through a disjunctive mapping is undecidable. We conclude with a number of open problems.
RadLing: Towards Efficient Radiology Report Understanding
Ghosh, Rikhiya, Karn, Sanjeev Kumar, Danu, Manuela Daniela, Micu, Larisa, Vunikili, Ramya, Farri, Oladimeji
Most natural language tasks in the radiology domain use language models pre-trained on biomedical corpus. There are few pretrained language models trained specifically for radiology, and fewer still that have been trained in a low data setting and gone on to produce comparable results in fine-tuning tasks. We present RadLing, a continuously pretrained language model using Electra-small (Clark et al., 2020) architecture, trained using over 500K radiology reports, that can compete with state-of-the-art results for fine tuning tasks in radiology domain. Our main contribution in this paper is knowledge-aware masking which is a taxonomic knowledge-assisted pretraining task that dynamically masks tokens to inject knowledge during pretraining. In addition, we also introduce an knowledge base-aided vocabulary extension to adapt the general tokenization vocabulary to radiology domain.
GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles
Parekh, Tanmay, Hsu, I-Hung, Huang, Kuan-Hao, Chang, Kai-Wei, Peng, Nanyun
Recent works in Event Argument Extraction (EAE) have focused on improving model generalizability to cater to new events and domains. However, standard benchmarking datasets like ACE and ERE cover less than 40 event types and 25 entity-centric argument roles. Limited diversity and coverage hinder these datasets from adequately evaluating the generalizability of EAE models. In this paper, we first contribute by creating a large and diverse EAE ontology. This ontology is created by transforming FrameNet, a comprehensive semantic role labeling (SRL) dataset for EAE, by exploiting the similarity between these two tasks. Then, exhaustive human expert annotations are collected to build the ontology, concluding with 115 events and 220 argument roles, with a significant portion of roles not being entities. We utilize this ontology to further introduce GENEVA, a diverse generalizability benchmarking dataset comprising four test suites, aimed at evaluating models' ability to handle limited data and unseen event type generalization. We benchmark six EAE models from various families. The results show that owing to non-entity argument roles, even the best-performing model can only achieve 39% F1 score, indicating how GENEVA provides new challenges for generalization in EAE. Overall, our large and diverse EAE ontology can aid in creating more comprehensive future resources, while GENEVA is a challenging benchmarking dataset encouraging further research for improving generalizability in EAE. The code and data can be found at https://github.com/PlusLabNLP/GENEVA.
Developing and Building Ontologies in Cyber Security
Farooq, Muhammad Shoaib, Waseem, Muhammad Talha
Cyber Security is one of the most arising disciplines in our modern society. We work on Cybersecurity domain and in this the topic we chose is Cyber Security Ontologies. In this we gather all latest and previous ontologies and compare them on the basis of different analyzing factors to get best of them. Reason to select this topic is to assemble different ontologies from different era of time. Because, researches that included in this SLR is mostly studied single ontology. If any researcher wants to study ontologies, he has to study every single ontology and select which one is best for his research. So, we assemble different types of ontology and compare them against each other to get best of them. A total 24 papers between years 2010-2020 are carefully selected through systematic process and classified accordingly. Lastly, this SLR have been presented to provide the researchers promising future directions in the domain of cybersecurity ontologies.
Knowledge Graph Embedding with Electronic Health Records Data via Latent Graphical Block Model
Lu, Junwei, Yin, Jin, Cai, Tianxi
Due to the increasing adoption of electronic health records (EHR), large scale EHRs have become another rich data source for translational clinical research. Despite its potential, deriving generalizable knowledge from EHR data remains challenging. First, EHR data are generated as part of clinical care with data elements too detailed and fragmented for research. Despite recent progress in mapping EHR data to common ontology with hierarchical structures, much development is still needed to enable automatic grouping of local EHR codes to meaningful clinical concepts at a large scale. Second, the total number of unique EHR features is large, imposing methodological challenges to derive reproducible knowledge graph, especially when interest lies in conditional dependency structure. Third, the detailed EHR data on a very large patient cohort imposes additional computational challenge to deriving a knowledge network. To overcome these challenges, we propose to infer the conditional dependency structure among EHR features via a latent graphical block model (LGBM). The LGBM has a two layer structure with the first providing semantic embedding vector (SEV) representation for the EHR features and the second overlaying a graphical block model on the latent SEVs. The block structures on the graphical model also allows us to cluster synonymous features in EHR. We propose to learn the LGBM efficiently, in both statistical and computational sense, based on the empirical point mutual information matrix. We establish the statistical rates of the proposed estimators and show the perfect recovery of the block structure. Numerical results from simulation studies and real EHR data analyses suggest that the proposed LGBM estimator performs well in finite sample.
A Tale of Two Laws of Semantic Change: Predicting Synonym Changes with Distributional Semantic Models
Liétard, Bastien, Keller, Mikaela, Denis, Pascal
Lexical Semantic Change is the study of how the meaning of words evolves through time. Another related question is whether and how lexical relations over pairs of words, such as synonymy, change over time. There are currently two competing, apparently opposite hypotheses in the historical linguistic literature regarding how synonymous words evolve: the Law of Differentiation (LD) argues that synonyms tend to take on different meanings over time, whereas the Law of Parallel Change (LPC) claims that synonyms tend to undergo the same semantic change and therefore remain synonyms. So far, there has been little research using distributional models to assess to what extent these laws apply on historical corpora. In this work, we take a first step toward detecting whether LD or LPC operates for given word pairs. After recasting the problem into a more tractable task, we combine two linguistic resources to propose the first complete evaluation framework on this problem and provide empirical evidence in favor of a dominance of LD. We then propose various computational approaches to the problem using Distributional Semantic Models and grounded in recent literature on Lexical Semantic Change detection. Our best approaches achieve a balanced accuracy above 0.6 on our dataset. We discuss challenges still faced by these approaches, such as polysemy or the potential confusion between synonymy and hypernymy.
Non-Sequential Graph Script Induction via Multimedia Grounding
Zhou, Yu, Li, Sha, Li, Manling, Lin, Xudong, Chang, Shih-Fu, Bansal, Mohit, Ji, Heng
Online resources such as WikiHow compile a wide range of scripts for performing everyday tasks, which can assist models in learning to reason about procedures. However, the scripts are always presented in a linear manner, which does not reflect the flexibility displayed by people executing tasks in real life. For example, in the CrossTask Dataset, 64.5% of consecutive step pairs are also observed in the reverse order, suggesting their ordering is not fixed. In addition, each step has an average of 2.56 frequent next steps, demonstrating "branching". In this paper, we propose the new challenging task of non-sequential graph script induction, aiming to capture optional and interchangeable steps in procedural planning. To automate the induction of such graph scripts for given tasks, we propose to take advantage of loosely aligned videos of people performing the tasks. In particular, we design a multimodal framework to ground procedural videos to WikiHow textual steps and thus transform each video into an observed step path on the latent ground truth graph script. This key transformation enables us to train a script knowledge model capable of both generating explicit graph scripts for learnt tasks and predicting future steps given a partial step sequence. Our best model outperforms the strongest pure text/vision baselines by 17.52% absolute gains on F1@3 for next step prediction and 13.8% absolute gains on Acc@1 for partial sequence completion. Human evaluation shows our model outperforming the WikiHow linear baseline by 48.76% absolute gains in capturing sequential and non-sequential step relationships.
A Knowledge Engineering Primer
Knowledge can take different forms. We distinguish between declarative knowledge (knowing something) or procedural knowledge (knowing how, know-how), sensorimotor knowledge (riding a bicycle), and affective knowledge (deep understanding). The classic definition of knowledge derived from philosophy defines knowledge as a justified true belief. It can be said to occur in situations where we consider something to be objectively "true" or "stated". Another definition refers to what is "explicit knowledge" that is something that is known and can be written down [30].