"An ontology defines the terms used to describe and represent an area of knowledge. … Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them."
– from OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation (10 February 2004). Jeff Heflin, editor.
This observation--that understanding Proust's text requires knowledge of various kinds--is not a new one. We came across it before, in the context of the Cyc project. Remember that Cyc was supposed to be given knowledge corresponding to the whole of consensus reality, and the Cyc hypothesis was that this would yield human-level general intelligence. Researchers in knowledge-based AI would be keen for me to point out to you that, decades ago, they anticipated exactly this issue. But it is not obvious that just continuing to refine deep learning techniques will address this problem.
Such a query can currently be answered at a high cost in human effort, e.g., by inspecting a JSON list of Assemblée elected officials (available from NosDeputes.fr) and manually connecting the names with those found in a national registry of companies. This considerable effort may still miss connections that could be found if one added information about politicians' and business people's spouses, information sometimes available in public knowledge bases such as DBPedia, or journalists' notes. No single query language can be used on such heterogeneous data; instead, we study methods to query the corpus by specifying some keywords and asking for all the connections that exist, in one or across several data sources, between these keywords. This problem has been studied under the name of keyword search over structured data, in particular for relational databases [49, 27], XML documents [24, 33], and RDF graphs [30, 16]. However, most of these works assumed a single source of data, in which connections among nodes are clearly identified. When authors considered several data sources, they still assumed that each query answer comes from a single data source. In contrast, the ConnectionLens system answers keyword search queries over arbitrary combinations of datasets and heterogeneous data models, independently produced by actors unaware of each other's existence.
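The idea of asking for connections between keywords across several sources can be illustrated with a minimal sketch. It is not the ConnectionLens algorithm: it assumes every source has already been turned into (node, relation, node) triples, that nodes with identical labels in different sources denote the same entity, and that a shortest path found by breadth-first search is an acceptable answer. All names and data below are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(sources):
    """Merge triples from several sources into one undirected graph.

    Nodes with the same label in different sources are treated as one
    node (a simplifying assumption; real entity matching is harder).
    """
    graph = defaultdict(set)
    for source_name, triples in sources.items():
        for subj, rel, obj in triples:
            graph[subj].add((obj, rel, source_name))
            graph[obj].add((subj, rel, source_name))
    return graph


def find_connection(graph, start_keyword, end_keyword):
    """Return a shortest path between two keywords, or None if there is none.

    Each step is (node, relation, next_node, source). Edges are followed
    in either direction, so a step may traverse a triple backwards.
    """
    if start_keyword not in graph or end_keyword not in graph:
        return None
    queue = deque([start_keyword])
    parent = {start_keyword: None}
    while queue:
        node = queue.popleft()
        if node == end_keyword:
            break
        for neighbor, rel, source in sorted(graph[node]):
            if neighbor not in parent:
                parent[neighbor] = (node, rel, source)
                queue.append(neighbor)
    if end_keyword not in parent:
        return None
    path = []
    node = end_keyword
    while parent[node] is not None:
        prev, rel, source = parent[node]
        path.append((prev, rel, node, source))
        node = prev
    path.reverse()
    return path


# Hypothetical data: three independently produced sources.
sources = {
    "officials.json": [("A. Dupont", "memberOf", "Assemblée")],
    "knowledge_base": [("A. Dupont", "spouse", "B. Martin")],
    "company_registry": [("B. Martin", "directorOf", "Acme SARL")],
}
graph = build_graph(sources)
connection = find_connection(graph, "A. Dupont", "Acme SARL")
for step in connection:
    print(step)
```

In this toy case the connection only exists because two sources are combined, which is the situation the single-source approaches cited above do not cover.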
Adverse Drug Reactions (ADRs) are characterized within randomized clinical trials and postmarketing pharmacovigilance, but their molecular mechanism remains unknown in most cases. Aside from clinical trials, many elements of knowledge about drug ingredients are available in open-access knowledge graphs. In addition, drug classifications that label drugs as either causative or not for several ADRs have been established. We propose to mine knowledge graphs to identify biomolecular features that may enable the automatic reproduction of expert classifications that distinguish drugs that are causative or not for a given type of ADR. From an explainable AI perspective, we explore simple classification techniques such as Decision Trees and Classification Rules because they provide human-readable models, which explain the classification itself, but may also provide elements of explanation for the molecular mechanisms behind ADRs. In summary, we mine a knowledge graph for features; we train classifiers to distinguish drugs associated or not with ADRs; we isolate features that are both efficient in reproducing expert classifications and interpretable by experts (i.e., Gene Ontology terms, drug targets, or pathway names); and we manually evaluate how they may be explanatory. Extracted features reproduce with good fidelity the classifications of drugs as causative or not for DILI and SCAR. Experts fully agreed that 73% and 38% of the most discriminative features are possibly explanatory for DILI and SCAR, respectively; and partially agreed (2/3) for 90% and 77% of them. Knowledge graphs provide diverse features that enable simple and explainable models to distinguish between drugs that are causative or not for ADRs. In addition to explaining classifications, the most discriminative features appear to be good candidates for investigating ADR mechanisms further.
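To make the notion of a human-readable rule concrete, here is a minimal sketch of a one-feature classification rule. It is a stand-in, not the Decision Tree or Classification Rule learners used in the study: it assumes each drug is described by a set of features mined from a knowledge graph and simply picks the feature whose presence best reproduces the expert labels. The drug names, feature names, and labels are hypothetical.

```python
def best_single_feature_rule(drugs, labels):
    """Return (feature, accuracy) for the best rule 'has feature => causative'.

    drugs maps a drug name to its set of features; labels maps a drug
    name to True (causative) or False (not causative).
    """
    features = sorted({f for feats in drugs.values() for f in feats})
    best_feature, best_accuracy = None, -1.0
    for feature in features:
        # Count drugs for which the rule's prediction matches the label.
        correct = sum(
            (feature in drugs[name]) == labels[name] for name in drugs
        )
        accuracy = correct / len(drugs)
        if accuracy > best_accuracy:
            best_feature, best_accuracy = feature, accuracy
    return best_feature, best_accuracy


# Hypothetical features (a drug target, a pathway, a Gene Ontology term).
drugs = {
    "drug_a": {"target:T1", "pathway:P1"},
    "drug_b": {"target:T1", "go:G1"},
    "drug_c": {"pathway:P1"},
    "drug_d": {"go:G1"},
}
labels = {"drug_a": True, "drug_b": True, "drug_c": False, "drug_d": False}

feature, accuracy = best_single_feature_rule(drugs, labels)
print(f"IF drug has {feature} THEN causative (accuracy {accuracy:.2f})")
```

A rule of this shape can be read directly by an expert, who can then judge whether the selected feature is plausibly related to the mechanism of the reaction.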
Knowledge graphs in manufacturing and production aim to make production lines more efficient and flexible with higher quality output. This makes knowledge graphs attractive for companies to reach Industry 4.0 goals. However, existing research in the field is quite preliminary, and more research effort on analyzing how knowledge graphs can be applied in the field of manufacturing and production is needed. Therefore, we have conducted a systematic literature review as an attempt to characterize the state-of-the-art in this field, i.e., by identifying existing research and by identifying gaps and opportunities for further research. To do that, we have focused on finding the primary studies in the existing literature, which were classified and analyzed according to four criteria: bibliometric key facts, research type facets, knowledge graph characteristics, and application scenarios. Besides, an evaluation of the primary studies has also been carried out to gain deeper insights in terms of methodology, empirical evidence, and relevance. As a result, we can offer a complete picture of the domain, which includes such interesting aspects as the fact that knowledge fusion is currently the main use case for knowledge graphs, that empirical research and industrial application are still missing to a large extent, that graph embeddings are not fully exploited, and that technical literature is fast-growing but seems to be still far from its peak.
Ontology, as a discipline of philosophy, explains the nature of existence and has its roots in Aristotle's and Plato's studies on "metaphysics" (Welty and Guarino, 2001). However, the word ontology originated from two Greek words, ontos (being) and logos (word), and was first coined during the sixteenth century by German philosophers (Welty and Guarino, 2001). From then until the mid-twentieth century, ontology evolved mainly as a branch of philosophy. However, with the advent of Artificial Intelligence in the 1950s, researchers perceived the necessity of ontology to describe a new world of intelligent systems (Welty and Guarino, 2001). Moreover, with the development of the World Wide Web in the 1990s, ontology development became common among different domain specialists as a way to define and share the concepts and entities in their fields on the Internet (Noy et al., 2001). During the last three decades, ontology development studies have evolved and shifted from theoretical issues of ontology to practical implications of the use of ontology in real-world, large-scale applications (Noy et al., 2001). Nowadays, ontology development focuses mainly on defining machine-interpretable concepts and their relationships in a domain. However, ontology development also pursues other goals, such as providing a common conceptualization of the domain on which different experts agree (Métral and Cutting-Decelle, 2011) and enabling them to reuse the domain knowledge (Noy et al., 2001). It also enables researchers to easily analyze the domain knowledge and explicitly express the domain assumptions.
Having a comprehensive, high-quality dataset of road sign annotation is critical to the success of AI-based Road Sign Recognition (RSR) systems. In practice, annotators often face difficulties in learning road sign systems of different countries; hence, the tasks are often time-consuming and produce poor results. We propose a novel approach using knowledge graphs and a machine learning algorithm - variational prototyping-encoder (VPE) - to assist human annotators in classifying road signs effectively. Annotators can query the Road Sign Knowledge Graph using visual attributes and receive closest matching candidates suggested by the VPE model. The VPE model uses the candidates from the knowledge graph and a real sign image patch as inputs. We show that our knowledge graph approach can reduce sign search space by 98.9%. Furthermore, with VPE, our system can propose the correct single candidate for 75% of signs in the tested datasets, eliminating the human search effort entirely in those cases.
While the success of pre-trained language models has largely eliminated the need for high-quality static word vectors in many NLP applications, static word vectors continue to play an important role in tasks where word meaning needs to be modelled in the absence of linguistic context. In this paper, we explore how the contextualised embeddings predicted by BERT can be used to produce high-quality word vectors for such domains, in particular related to knowledge base completion, where our focus is on capturing the semantic properties of nouns. We find that a simple strategy of averaging the contextualised embeddings of masked word mentions leads to vectors that outperform the static word vectors learned by BERT, as well as those from standard word embedding models, in property induction tasks. We notice in particular that masking target words is critical to achieve this strong performance, as the resulting vectors focus less on idiosyncratic properties and more on general semantic properties. Inspired by this view, we propose a filtering strategy which is aimed at removing the most idiosyncratic mention vectors, allowing us to obtain further performance gains in property induction.
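The averaging strategy can be sketched in a few lines. The sketch assumes the contextualised vectors of the masked mentions have already been extracted from the language model and are given as plain lists of floats. The filtering step is only a stand-in: it drops the mentions least similar to the centroid, which is one possible reading of "most idiosyncratic" and is not claimed to be the criterion used in the paper. The function names, the `keep_fraction` parameter, and the toy vectors are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filtered_word_vector(mention_vectors, keep_fraction=0.75):
    """Average the mention vectors after dropping the least typical ones.

    Assumption: a mention is 'idiosyncratic' if it has low cosine
    similarity to the centroid of all mentions of the word.
    """
    centroid = mean_vector(mention_vectors)
    ranked = sorted(
        mention_vectors, key=lambda v: cosine(v, centroid), reverse=True
    )
    keep = max(1, int(len(ranked) * keep_fraction))
    return mean_vector(ranked[:keep])


# Toy 2-dimensional mention vectors; the last one is an outlier.
mentions = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
plain_average = mean_vector(mentions)
word_vector = filtered_word_vector(mentions)
print(plain_average, word_vector)
```

With `keep_fraction=1.0` the function reduces to the plain average of all mention vectors.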
Badenes-Olmedo, Carlos, Chaves-Fraga, David, Poveda-Villalón, María, Iglesias-Molina, Ana, Calleja, Pablo, Bernardos, Socorro, Martín-Chozas, Patricia, Fernández-Izquierdo, Alba, Amador-Domínguez, Elvira, Espinoza-Arias, Paola, Pozo, Luis, Ruckhaus, Edna, González-Guardia, Esteban, Cedazo, Raquel, López-Centeno, Beatriz, Corcho, Oscar
In the absence of sufficient medication for COVID patients due to increased demand, disused drugs have been employed or the doses of those available have been modified by hospital pharmacists. Evidence for the use of alternative drugs that could assist in such decisions can be found in the existing scientific literature. However, exploiting a large corpus of documents efficiently is not easy, since drugs may not appear explicitly related in the texts and may be mentioned under different brand names. Drugs4Covid combines word embedding techniques and semantic web technologies to enable a drug-oriented exploration of large medical literature. Drugs and diseases are identified according to the ATC classification and MeSH categories, respectively. More than 60K articles and 2M paragraphs have been processed from the CORD-19 corpus with information on COVID-19, SARS, and other related coronaviruses. An open catalogue of drugs has been created and results are publicly available through a drug browser, a keyword-guided text explorer, and a knowledge graph.
Virtual Knowledge Graphs (VKG) constitute one of the most promising paradigms for integrating and accessing legacy data sources. A critical bottleneck in the integration process involves the definition, validation, and maintenance of mappings that link data sources to a domain ontology. To support the management of mappings throughout their entire lifecycle, we propose a comprehensive catalog of sophisticated mapping patterns that emerge when linking databases to ontologies. To do so, we build on well-established methodologies and patterns studied in data management, data analysis, and conceptual modeling. These are extended and refined through the analysis of concrete VKG benchmarks and real-world use cases, and considering the inherent impedance mismatch between data sources and ontologies. We validate our catalog on the considered VKG scenarios, showing that it covers the vast majority of patterns present therein.
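As an illustration of what a mapping between a database and an ontology does, the following sketch applies one simple pattern: each row of a table with a primary key becomes an instance of a class, ordinary columns become data properties, and a foreign-key column becomes an object property. It is not taken from the proposed catalog, and it materializes triples for readability, whereas a VKG system would keep them virtual and rewrite queries instead. The table, the IRI templates, and the mapping dictionary layout are all hypothetical.

```python
def apply_entity_mapping(rows, mapping):
    """Turn table rows into (subject, predicate, object) triples.

    mapping holds a subject IRI template, a class, and dictionaries that
    map column names to data properties or to (object property, target
    IRI template) pairs. NULL (None) values produce no triple.
    """
    triples = []
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        triples.append((subject, "rdf:type", mapping["class"]))
        for column, prop in mapping["data_properties"].items():
            if row.get(column) is not None:
                triples.append((subject, prop, row[column]))
        for column, (prop, template) in mapping["object_properties"].items():
            if row.get(column) is not None:
                triples.append((subject, prop, template.format(**row)))
    return triples


# Hypothetical table: employee(id PRIMARY KEY, name, dept_id FOREIGN KEY).
employee_rows = [
    {"id": 7, "name": "Ada", "dept_id": 3},
    {"id": 8, "name": "Bob", "dept_id": None},
]
mapping = {
    "subject_template": ":employee/{id}",
    "class": ":Employee",
    "data_properties": {"name": ":name"},
    "object_properties": {"dept_id": (":worksIn", ":department/{dept_id}")},
}

triples = apply_entity_mapping(employee_rows, mapping)
for triple in triples:
    print(triple)
```

The NULL check hints at the impedance mismatch mentioned above: a missing value in a relational row has no direct counterpart in the graph, so the mapping has to decide what to do with it.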
The World Intellectual Property Organization's (WIPO) first report of a series called WIPO Technology Trends, an extensive study of patent applications and other scientific documents, offers clues to the next big thing in AI. Rather than treating 'AI' as a single homogeneous discipline (see our guide to AI terminology), the WIPO report divides it into AI techniques, AI functional applications and AI application fields, offering a finer-grained analysis. AI techniques are advanced forms of statistical and mathematical models used in AI, including machine learning, logic programming, ontology engineering, probabilistic reasoning and fuzzy logic. Machine learning is included in more than one third of all identified inventions and represents 89 per cent of AI filings, the report finds. Between 2013 and 2016, filings related to deep learning rocketed by about 175 per cent.