Collaborating Authors


PNEL: Pointer Network based End-To-End Entity Linking over Knowledge Graphs Artificial Intelligence

Question Answering systems are generally modelled as a pipeline consisting of a sequence of steps. In such a pipeline, Entity Linking (EL) is often the first step. Several EL models first perform span detection and then entity disambiguation. In such models, errors from the span detection phase cascade to later steps and reduce overall accuracy. Moreover, the lack of gold entity spans in training data is a limiting factor for span detector training. This has motivated a move towards end-to-end EL models, which involve no separate span detection step. In this work we present a novel approach to end-to-end EL by applying the popular Pointer Network model, which achieves competitive performance. We demonstrate this in our evaluation over three datasets on the Wikidata Knowledge Graph.

VisualSem: a high-quality knowledge graph for vision and language Artificial Intelligence

We argue that the next frontier in natural language understanding (NLU) and generation (NLG) will include models that can efficiently access external structured knowledge repositories. To support the development of such models, we release the VisualSem knowledge graph (KG), which includes nodes with multilingual glosses, multiple illustrative images, and visually relevant relations. We also release a neural multi-modal retrieval model that can use images or sentences as inputs and retrieve entities in the KG. This multi-modal retrieval model can be integrated into any (neural network) model pipeline, and we encourage the research community to use VisualSem for data augmentation and/or as a source of grounding, among other possible uses. VisualSem as well as the multi-modal retrieval model are publicly available and can be downloaded in:

Efficient Knowledge Graph Validation via Cross-Graph Representation Learning Artificial Intelligence

Recent advances in information extraction have motivated the automatic construction of huge Knowledge Graphs (KGs) by mining large-scale text corpora. However, automatic extraction unavoidably introduces noisy facts into KGs. To validate the correctness of facts (i.e., triplets) inside a KG, one possible approach is to map the triplets into vector representations that capture the semantic meanings of the facts. Although many representation learning approaches have been developed for knowledge graphs, these methods are not effective for validation. They usually assume that facts are correct, and thus may overfit noisy facts and fail to detect them. Towards effective KG validation, we propose to leverage an external human-curated KG as an auxiliary information source to help detect the errors in a target KG. The external KG is built upon human-curated knowledge repositories and tends to have high precision. On the other hand, although the target KG built by information extraction from texts has low precision, it can cover new or domain-specific facts that are not in any human-curated repositories. To tackle this challenging task, we propose a cross-graph representation learning framework, i.e., CrossVal, which can leverage an external KG to validate the facts in the target KG efficiently. This is achieved by embedding triplets based on their semantic meanings, drawing cross-KG negative samples, and estimating a confidence score for each triplet based on its degree of correctness. We evaluate the proposed framework on datasets across different domains. Experimental results show that the proposed framework achieves the best performance compared with the state-of-the-art methods on large-scale KGs.
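
The three ingredients named in the abstract above (triplet embeddings, cross-KG negative samples, and a per-triplet confidence score) can be illustrated with a small sketch. This is not the CrossVal model: the abstract does not specify its scoring function, so the sketch swaps in a generic TransE-style translation distance and a logistic squashing as assumptions. The toy vectors, the `margin` parameter, and the function names `transe_distance`, `confidence`, and `corrupt_tail` are all hypothetical.

```python
import math
import random

def transe_distance(h, r, t):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triplet.
    (TransE-style scoring is an assumption, not the method from the abstract.)"""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def confidence(h, r, t, margin=1.0):
    """Map the distance to a score in (0, 1) with a logistic function.
    `margin` is a hypothetical parameter: distances below it score above 0.5."""
    return 1.0 / (1.0 + math.exp(transe_distance(h, r, t) - margin))

def corrupt_tail(triplet, external_entities, rng):
    """Build a negative sample by swapping the tail for an entity
    drawn from the external (human-curated) KG."""
    head, relation, tail = triplet
    candidates = [e for e in external_entities if e != tail]
    return (head, relation, rng.choice(candidates))

# Toy, hand-picked embeddings (illustrative only, not learned).
entity_vecs = {"Paris": [1.0, 0.0], "France": [1.0, 1.0], "Japan": [-1.0, 1.0]}
relation_vecs = {"capital_of": [0.0, 1.0]}

def score(triplet):
    head, relation, tail = triplet
    return confidence(entity_vecs[head], relation_vecs[relation], entity_vecs[tail])

true_triplet = ("Paris", "capital_of", "France")
negative_triplet = corrupt_tail(true_triplet, ["France", "Japan"], random.Random(0))

print(true_triplet, round(score(true_triplet), 3))
print(negative_triplet, round(score(negative_triplet), 3))
```

In a trained model the embeddings would be learned so that observed triplets score higher than their corrupted counterparts; here the vectors are chosen by hand so that the true triplet receives the higher confidence.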

Word meaning in minds and machines Artificial Intelligence

Machines show an increasingly broad set of linguistic competencies, thanks to recent progress in Natural Language Processing (NLP). Many algorithms stem from past computational work in psychology, raising the question of whether they understand words as people do. In this paper, we compare how humans and machines represent the meaning of words. We argue that contemporary NLP systems are promising models of human word similarity, but they fall short in many other respects. Current models are too strongly linked to the text-based patterns in large corpora, and too weakly linked to the desires, goals, and beliefs that people use words in order to express. Word meanings must also be grounded in vision and action, and capable of flexible combinations, in ways that current systems are not. We pose concrete challenges for developing machines with a more human-like, conceptual basis for word meaning. We also discuss implications for cognitive science and NLP.

COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation Artificial Intelligence

To combat COVID-19, both clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in the literature to understand the disease mechanism and the related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures and knowledge subgraphs as evidence. All of the data, KGs, reports, resources and shared services are publicly available.

Coronavirus Knowledge Graph: A Case Study Artificial Intelligence

The emergence of the novel COVID-19 pandemic has had a significant impact on global healthcare and the economy over the past few months. The virus's rapid spread has led to a proliferation of biomedical research addressing the pandemic and its related topics. Knowledge Graphs are one of the essential Knowledge Discovery tools that could help the biomedical research community understand and eventually find a cure for COVID-19. The CORD-19 dataset is a collection of publicly available full-text research articles that have been recently published on COVID-19 and coronavirus topics. Here, we use several Machine Learning, Deep Learning, and Knowledge Graph construction and mining techniques to formalize and extract insights from the PubMed dataset and the CORD-19 dataset to identify COVID-19 related experts and bio-entities. In addition, we suggest possible techniques to predict related diseases, drug candidates, genes, gene mutations, and related compounds as part of a systematic effort to apply Knowledge Discovery methods to help biomedical researchers tackle the pandemic.

A frame semantics based approach to comparative study of digitized corpus Artificial Intelligence

In this paper, we present a corpus-linguistics-based approach to analyzing digitized classical multilingual novels and narrative texts from a semantic point of view. Digitized novels such as "The Hobbit" (Tolkien J. R. R., 1937) and "The Hound of the Baskervilles" (Doyle A. C., 1901-1902), which have been translated into dozens of languages, provide rich material for analyzing language differences from several perspectives and within a number of disciplines such as linguistics, philosophy and cognitive science. Taking the conceptualization of motion events as a case study, this paper focuses on the morphological, syntactic, and semantic annotation of an English-Arabic aligned corpus created from digitized novels, in order to re-examine the linguistic encodings of motion events in English and Arabic in terms of Frame Semantics. The present study argues that differences in the conceptualization of motion events across languages can be described with frame structure and frame-to-frame relations.

Cross-lingual Entity Alignment for Knowledge Graphs with Incidental Supervision from Free Text Artificial Intelligence

Much research effort has been put into multilingual knowledge graph (KG) embedding methods to address the entity alignment task, which seeks to match entities in different language-specific KGs that refer to the same real-world object. Such methods are often hindered by the insufficiency of seed alignment provided between KGs. Therefore, we propose a new model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme, and seeks to improve entity alignment with incidental supervision signals from text. JEANS first deploys an entity grounding process to combine each KG with the monolingual text corpus. Then, two learning processes are conducted: (i) an embedding learning process to encode the KG and text of each language in one embedding space, and (ii) a self-learning based alignment learning process to iteratively induce the correspondence of entities and that of lexemes between embeddings. Experiments on benchmark datasets show that JEANS leads to promising improvement on entity alignment with incidental supervision, and significantly outperforms state-of-the-art methods that solely rely on internal information of KGs.

KGvec2go -- Knowledge Graph Embeddings as a Service Artificial Intelligence

Currently, we serve pre-trained embeddings for four knowledge graphs. We introduce the service and its usage, and we show further that the trained models have semantic value by evaluating them on multiple semantic benchmarks. The evaluation also reveals that the combination of multiple models can lead to a better outcome than the best individual model.

Knowledge graph based methods for record linkage Artificial Intelligence

The use of individual-level data is now common in Historical Demography, a consequence of the predominant life-course approach to understanding demographic behaviour, family transitions, mobility, etc. Advances in record linkage are key in these disciplines, since they increase the volume and complexity of the data that can be analyzed. However, current methods are constrained to linking data that come from the same kind of source. Knowledge graphs are flexible semantic representations that can encode data variability and semantic relations in a structured manner. In this paper we propose using knowledge graphs to tackle the record linkage task. The proposed method, named WERL, takes advantage of the main knowledge graph properties and learns embedding vectors to encode census information. These embeddings are weighted to maximize record linkage performance. We have evaluated this method on benchmark data sets and compared it to related methods, with encouraging and satisfactory results.
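
The idea of weighting embeddings for linkage can be illustrated with a small sketch. It is not the WERL method, whose details the abstract does not give. It assumes each record is a set of per-field embedding vectors, that similarity is a weighted average of per-field cosine similarities, and that a link is accepted above a threshold. The field names, weights, threshold, and the functions `cosine`, `weighted_similarity`, and `link` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def weighted_similarity(rec_a, rec_b, weights):
    """Weighted average of per-field cosine similarities.
    `weights` maps field name -> importance (hypothetical values)."""
    total = sum(weights.values())
    return sum(w * cosine(rec_a[f], rec_b[f]) for f, w in weights.items()) / total

def link(record, candidates, weights, threshold=0.8):
    """Return the id of the most similar candidate whose similarity
    reaches `threshold`, or None if no candidate does."""
    best_id, best_score = None, threshold
    for cand_id, cand in candidates.items():
        s = weighted_similarity(record, cand, weights)
        if s >= best_score:
            best_id, best_score = cand_id, s
    return best_id

# Toy per-field embeddings for one census record and two candidates.
record = {"name": [1.0, 0.0], "surname": [0.0, 1.0], "occupation": [1.0, 1.0]}
candidates = {
    "A": {"name": [0.9, 0.1], "surname": [0.1, 0.9], "occupation": [0.0, 1.0]},
    "B": {"name": [0.0, 1.0], "surname": [1.0, 0.0], "occupation": [1.0, 1.0]},
}
weights = {"name": 0.4, "surname": 0.5, "occupation": 0.1}

print(link(record, candidates, weights))
```

With these toy weights, candidate A is linked despite its differing occupation, because the name and surname fields dominate the score. In a learned setting the weights would be fitted to maximize linkage performance on labeled pairs.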