Goto

Collaborating Authors

 Ontologies


Adverse Childhood Experiences Identification from Clinical Notes with Ontologies and NLP

arXiv.org Artificial Intelligence

Adverse Childhood Experiences (ACEs) are defined as a collection of highly stressful, and potentially traumatic, events or circumstances that occur throughout childhood and/or adolescence. They have been shown to be associated with increased risks of mental health diseases or other abnormal behaviours in later lives. However, the identification of ACEs from free-text Electronic Health Records (EHRs) with Natural Language Processing (NLP) is challenging because (a) there is no NLP ready ACE ontologies; (b) there are limited cases available for machine learning, necessitating the data annotation from clinical experts. We are currently developing a tool that would use NLP techniques to assist us in surfacing ACEs from clinical notes. This will enable us further research in identifying evidence of the relationship between ACEs and the subsequent developments of mental illness (e.g., addictions) in large-scale and longitudinal free-text EHRs, which has previously not been possible.


Dialogue Term Extraction using Transfer Learning and Topological Data Analysis

arXiv.org Artificial Intelligence

Goal oriented dialogue systems were originally designed as a natural language interface to a fixed data-set of entities that users might inquire about, further described by domain, slots, and values. As we move towards adaptable dialogue systems where knowledge about domains, slots, and values may change, there is an increasing need to automatically extract these terms from raw dialogues or related non-dialogue data on a large scale. In this paper, we take an important step in this direction by exploring different features that can enable systems to discover realizations of domains, slots, and values in dialogues in a purely data-driven fashion. The features that we examine stem from word embeddings, language modelling features, as well as topological features of the word embedding space. To examine the utility of each feature set, we train a seed model based on the widely used MultiWOZ data-set. Then, we apply this model to a different corpus, the Schema-Guided Dialogue data-set. Our method outperforms the previously proposed approach that relies solely on word embeddings. We also demonstrate that each of the features is responsible for discovering different kinds of content. We believe our results warrant further research towards ontology induction, and continued harnessing of topological data analysis for dialogue and natural language processing research.


Knowledge Graph Curation: A Practical Framework

arXiv.org Artificial Intelligence

Knowledge Graphs (KGs) have shown to be very important for applications such as personal assistants, question-answering systems, and search engines. Therefore, it is crucial to ensure their high quality. However, KGs inevitably contain errors, duplicates, and missing values, which may hinder their adoption and utility in business applications, as they are not curated, e.g., low-quality KGs produce low-quality applications that are built on top of them. In this vision paper, we propose a practical knowledge graph curation framework for improving the quality of KGs. First, we define a set of quality metrics for assessing the status of KGs, Second, we describe the verification and validation of KGs as cleaning tasks, Third, we present duplicate detection and knowledge fusion strategies for enriching KGs. Furthermore, we give insights and directions toward a better architecture for curating KGs.


TAR: Neural Logical Reasoning across TBox and ABox

arXiv.org Artificial Intelligence

Many ontologies, i.e., Description Logic (DL) knowledge bases, have been developed to provide rich knowledge about various domains. An ontology consists of an ABox, i.e., assertion axioms between two entities or between a concept and an entity, and a TBox, i.e., terminology axioms between two concepts. Neural logical reasoning (NLR) is a fundamental task to explore such knowledge bases, which aims at answering multi-hop queries with logical operations based on distributed representations of queries and answers. While previous NLR methods can give specific entity-level answers, i.e., ABox answers, they are not able to provide descriptive concept-level answers, i.e., TBox answers, where each concept is a description of a set of entities. In other words, previous NLR methods only reason over the ABox of an ontology while ignoring the TBox. In particular, providing TBox answers enables inferring the explanations of each query with descriptive concepts, which make answers comprehensible to users and are of great usefulness in the field of applied ontology. In this work, we formulate the problem of neural logical reasoning across TBox and ABox (TA-NLR), solving which needs to address challenges in incorporating, representing, and operating on concepts. We propose an original solution named TAR for TA-NLR. Firstly, we incorporate description logic based ontological axioms to provide the source of concepts. Then, we represent concepts and queries as fuzzy sets, i.e., sets whose elements have degrees of membership, to bridge concepts and queries with entities. Moreover, we design operators involving concepts on top of fuzzy set representation of concepts and queries for optimization and inference. Extensive experimental results on two real-world datasets demonstrate the effectiveness of TAR for TA-NLR.


Manual-Guided Dialogue for Flexible Conversational Agents

arXiv.org Artificial Intelligence

How to build and use dialogue data efficiently, and how to deploy models in different domains at scale can be two critical issues in building a task-oriented dialogue system. In this paper, we propose a novel manual-guided dialogue scheme to alleviate these problems, where the agent learns the tasks from both dialogue and manuals. The manual is an unstructured textual document that guides the agent in interacting with users and the database during the conversation. Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains. We then contribute a fully-annotated multi-domain dataset MagDial to support our scheme. It introduces three dialogue modeling subtasks: instruction matching, argument filling, and response generation. Modeling these subtasks is consistent with the human agent's behavior patterns. Experiments demonstrate that the manual-guided dialogue scheme improves data efficiency and domain scalability in building dialogue systems. The dataset and benchmark will be publicly available for promoting future research.


Function Classes for Identifiable Nonlinear Independent Component Analysis

arXiv.org Artificial Intelligence

Unsupervised learning of latent variable models (LVMs) is widely used to represent data in machine learning. When such models reflect the ground truth factors and the mechanisms mapping them to observations, there is reason to expect that they allow generalization in downstream tasks. It is however well known that such identifiability guaranties are typically not achievable without putting constraints on the model class. This is notably the case for nonlinear Independent Component Analysis, in which the LVM maps statistically independent variables to observations via a deterministic nonlinear function. Several families of spurious solutions fitting perfectly the data, but that do not correspond to the ground truth factors can be constructed in generic settings. However, recent work suggests that constraining the function class of such models may promote identifiability. Specifically, function classes with constraints on their partial derivatives, gathered in the Jacobian matrix, have been proposed, such as orthogonal coordinate transformations (OCT), which impose orthogonality of the Jacobian columns. In the present work, we prove that a subclass of these transformations, conformal maps, is identifiable and provide novel theoretical results suggesting that OCTs have properties that prevent families of spurious solutions to spoil identifiability in a generic setting.


Autonomous Intelligent Software Development

arXiv.org Artificial Intelligence

We present an overview of the design and first proof-of-concept implementation for AIDA, an autonomous intelligent developer agent that develops software from scratch. AIDA takes a software requirements specification and uses reasoning over a semantic knowledge graph to interpret the requirements, then designs and writes software to satisfy them. AIDA uses both declarative and procedural knowledge in the core domains of data, algorithms, and code, plus some general knowledge. The reasoning codebase uses this knowledge to identify needed components, then designs and builds the necessary information structures around them that become the software. These structures, the motivating requirements, and the resulting source code itself are all new knowledge that are added to the knowledge graph, becoming available for future reasoning. In this way, AIDA also learns as she writes code and becomes more efficient when writing subsequent code.


Designing and Building Enterprise Knowledge Graphs (Synthesis Lectures on Data, Semantics, and Knowledge, 20): 9781636391748: Computer Science Books @ Amazon.com

#artificialintelligence

Ora Lassila is a Principal Graph Technologist in the Amazon Neptune graph database team. Earlier, he was a Managing Director at State Street, heading their efforts to adopt ontologies and graph databases. Before that, he worked as a technology architect at Pegasystems, as an architect and technology strategist at Nokia Location & Commerce (aka HERE), and prior to that he was a Research Fellow at the Nokia Research Center Cambridge. He was an elected member of the Advisory Board of the World Wide Web Consortium (W3C) in 1998-2013, and represented Nokia in the W3C Advisory Committee in 1998-2002. In 1996-1997 he was a Visiting Scientist at MIT Laboratory for Computer Science, working with W3C and launching the Resource Description Framework (RDF) standard; he served as a co-editor of the RDF Model and Syntax specification.


A Novel Ontology-guided Attribute Partitioning Ensemble Learning Model for Early Prediction of Cognitive Deficits using Quantitative Structural MRI in Very Preterm Infants

arXiv.org Artificial Intelligence

Structural magnetic resonance imaging studies have shown that brain anatomical abnormalities are associated with cognitive deficits in preterm infants. Brain maturation and geometric features can be used with machine learning models for predicting later neurodevelopmental deficits. However, traditional machine learning models would suffer from a large feature-to-instance ratio (i.e., a large number of features but a small number of instances/samples). Ensemble learning is a paradigm that strategically generates and integrates a library of machine learning classifiers and has been successfully used on a wide variety of predictive modeling problems to boost model performance. Attribute (i.e., feature) bagging method is the most commonly used feature partitioning scheme, which randomly and repeatedly draws feature subsets from the entire feature set. Although attribute bagging method can effectively reduce feature dimensionality to handle the large feature-to-instance ratio, it lacks consideration of domain knowledge and latent relationship among features. In this study, we proposed a novel Ontology-guided Attribute Partitioning (OAP) method to better draw feature subsets by considering the domain-specific relationship among features. With the better partitioned feature subsets, we developed an ensemble learning framework, which is referred to as OAP-Ensemble Learning (OAP-EL). We applied the OAP-EL to predict cognitive deficits at 2 years of age using quantitative brain maturation and geometric features obtained at term equivalent age in very preterm infants. We demonstrated that the proposed OAP-EL approach significantly outperformed the peer ensemble learning and traditional machine learning approaches.


Enriching Wikidata with Linked Open Data

arXiv.org Artificial Intelligence

Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users' needs is still missing in Wikidata, while current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with high quality. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well-supported by existing Wikidata mechanisms. We make our code and data available to support future work.