AITopics

2305.1407

Country:

South America > Colombia > Meta Department > Villavicencio (0.05)
Africa > Kenya > Mandera County > Mandera (0.04)
South America > Brazil > Rio Grande do Sul (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.46)
Transportation > Air (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Fanourakis, Nikolaos, Efthymiou, Vasilis, Kotzinos, Dimitris, Christophides, Vassilis

Knowledge Graph Embedding Methods for Entity Alignment: An Experimental Review

arXiv.org Artificial IntelligenceJun-7-2023

In recent years, we have witnessed the proliferation of knowledge graphs (KG) in various domains, aiming to support applications like question answering, recommendations, etc. A frequent task when integrating knowledge from different KGs is to find which subgraphs refer to the same real-world entity. Recently, embedding methods have been used for entity alignment tasks, that learn a vector-space representation of entities which preserves their similarity in the original KGs. A wide variety of supervised, unsupervised, and semi-supervised methods have been proposed that exploit both factual (attribute based) and structural information (relation based) of entities in the KGs. Still, a quantitative assessment of their strengths and weaknesses in real-world KGs according to different performance metrics and KG characteristics is missing from the literature. In this work, we conduct the first meta-level analysis of popular embedding methods for entity alignment, based on a statistically sound methodology. Our analysis reveals statistically significant correlations of different embedding methods with various meta-features extracted by KGs and rank them in a statistically significant way according to their effectiveness across all real-world KGs of our testbed. Finally, we study interesting trade-offs in terms of methods' effectiveness and efficiency.

artificial intelligence, machine learning, natural language, (18 more...)

2203.0928

Country:

Europe > Germany (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
North America > United States > California > Marin County > San Rafael (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.45)
Research Report > Experimental Study (0.45)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.61)

arXiv.org Artificial IntelligenceJun-6-2023

BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs

Daza, Daniel, Alivanistos, Dimitrios, Mitra, Payal, Pijnenburg, Thom, Cochez, Michael, Groth, Paul

Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences, or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. We propose a modular framework for learning embeddings in KGs with entity attributes, that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction, and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. In the standard link prediction evaluation, the proposed method results in competitive, yet lower performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. We find settings involving low degree entities, which make up for a substantial amount of the set of entities in the KG, where our method outperforms the baselines. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. Our implementation is available at https://github.com/elsevier-AI-Lab/BioBLP .

artificial intelligence, machine learning, natural language, (20 more...)

2306.03606

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(18 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceJun-6-2023

Logic Diffusion for Knowledge Graph Reasoning

Xie, Xiaoying, Gong, Biao, Lv, Yiliang, Han, Zhen, Zhao, Guoshuai, Qian, Xueming

Most recent works focus on answering first order logical queries to explore the knowledge graph reasoning via multi-hop logic predictions. However, existing reasoning models are limited by the circumscribed logical paradigms of training samples, which leads to a weak generalization of unseen logic. To address these issues, we propose a plug-in module called Logic Diffusion (LoD) to discover unseen queries from surroundings and achieves dynamical equilibrium between different kinds of patterns. The basic idea of LoD is relation diffusion and sampling sub-logic by random walking as well as a special training mechanism called gradient adaption. Besides, LoD is accompanied by a novel loss function to further achieve the robust logical diffusion when facing noisy data in training or testing sets. Extensive experiments on four public datasets demonstrate the superiority of mainstream knowledge graph reasoning models with LoD over state-of-the-art. Moreover, our ablation study proves the general effectiveness of LoD on the noise-rich knowledge graph.

artificial intelligence, graph, knowledge graph, (13 more...)

2306.03515

Country: Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.94)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)

A Unified Knowledge Graph Augmentation Service for Boosting Domain-specific NLP Tasks

Ding, Ruiqing, Han, Xiao, Wang, Leye

By focusing the pre-training process on domain-specific corpora, some domain-specific pre-trained language models (PLMs) have achieved state-of-the-art results. However, it is under-investigated to design a unified paradigm to inject domain knowledge in the PLM fine-tuning stage. We propose KnowledgeDA, a unified domain language model development service to enhance the task-specific training procedure with domain knowledge graphs. Given domain-specific task texts input, KnowledgeDA can automatically generate a domain-specific language model following three steps: (i) localize domain knowledge entities in texts via an embedding-similarity approach; (ii) generate augmented samples by retrieving replaceable domain entity pairs from two views of both knowledge graph and training data; (iii) select high-quality augmented samples for fine-tuning via confidence-based assessment. We implement a prototype of KnowledgeDA to learn language models for two domains, healthcare and software development. Experiments on domain-specific text classification and QA tasks verify the effectiveness and generalizability of KnowledgeDA.

knowledgeda, large language model, machine learning, (21 more...)

2212.05251

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.82)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Joint Pre-training and Local Re-training: Transferable Representation Learning on Multi-source Knowledge Graphs

Sun, Zequn, Huang, Jiacheng, Lin, Jinghao, Xu, Xiaozhou, Chen, Qijin, Hu, Wei

In this paper, we present the ``joint pre-training and local re-training'' framework for learning and applying multi-source knowledge graph (KG) embeddings. We are motivated by the fact that different KGs contain complementary information to improve KG embeddings and downstream tasks. We pre-train a large teacher KG embedding model over linked multi-source KGs and distill knowledge to train a student model for a task-specific KG. To enable knowledge transfer across different KGs, we use entity alignment to build a linked subgraph for connecting the pre-trained KGs and the target KG. The linked subgraph is re-trained for three-level knowledge distillation from the teacher to the student, i.e., feature knowledge distillation, network knowledge distillation, and prediction knowledge distillation, to generate more expressive embeddings. The teacher model can be reused for different target KGs and tasks without having to train from scratch. We conduct extensive experiments to demonstrate the effectiveness and efficiency of our framework.

artificial intelligence, machine learning, natural language, (17 more...)

2306.02679

Country:

North America > United States > California > Los Angeles County > Long Beach (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (0.67)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.63)

What Makes Entities Similar? A Similarity Flooding Perspective for Multi-sourced Knowledge Graph Embeddings

Sun, Zequn, Huang, Jiacheng, Xu, Xiaozhou, Chen, Qijin, Ren, Weijun, Hu, Wei

Joint representation learning over multi-sourced knowledge graphs (KGs) yields transferable and expressive embeddings that improve downstream tasks. Entity alignment (EA) is a critical step in this process. Despite recent considerable research progress in embedding-based EA, how it works remains to be explored. In this paper, we provide a similarity flooding perspective to explain existing translation-based and aggregation-based EA models. We prove that the embedding learning process of these models actually seeks a fixpoint of pairwise similarities between entities. We also provide experimental evidence to support our theoretical analysis. We propose two simple but effective methods inspired by the fixpoint computation in similarity flooding, and demonstrate their effectiveness on benchmark datasets. Our work bridges the gap between recent embedding-based models and the conventional similarity flooding algorithm. It would improve our understanding of and increase our faith in embedding-based EA.

alignment, artificial intelligence, machine learning, (14 more...)

2306.02622

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.63)

Literature-based Discovery for Landscape Planning

Marasco, David, Tyagin, Ilya, Sybrandt, Justin, Spencer, James H., Safro, Ilya

This project demonstrates how medical corpus hypothesis generation, a knowledge discovery field of AI, can be used to derive new research angles for landscape and urban planners. The hypothesis generation approach herein consists of a combination of deep learning with topic modeling, a probabilistic approach to natural language analysis that scans aggregated research databases for words that can be grouped together based on their subject matter commonalities; the word groups accordingly form topics that can provide implicit connections between two general research terms. The hypothesis generation system AGATHA was used to identify likely conceptual relationships between emerging infectious diseases (EIDs) and deforestation, with the objective of providing landscape planners guidelines for productive research directions to help them formulate research hypotheses centered on deforestation and EIDs that will contribute to the broader health field that asserts causal roles of landscape-level issues. This research also serves as a partial proof-of-concept for the application of medical database hypothesis generation to medicine-adjacent hypothesis discovery. Keywords deforestation, emerging infectious disease, hypothesis generation, landscape planning, topic modeling Funding This research was funded by National Science Foundation grant numbers 1633608 and 2027864. The authors express their gratitude to the NSF for its generous support of their research. Introduction The recent COVID-19 crisis has put the issue of emerging infectious diseases (EIDs) back in the global spotlight. Addressing EIDs going forward will require widespread interdisciplinary cooperation, as discouraging them is a multifaceted and omnipresent endeavor. Biologists and health experts have frequently asserted that landscape-level issues drive EIDs (e.g.

data mining, deforestation, machine learning, (21 more...)

2306.02588

Country:

North America > United States > Delaware > New Castle County > Newark (0.14)
Africa > Cameroon (0.04)
South America > Bolivia > Cochabamba Department > Cercado Province (Cochabamba) > Cochabamba (0.04)
(5 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.34)

arXiv.org Artificial IntelligenceJun-3-2023

Shrinking Embeddings for Hyper-Relational Knowledge Graphs

Xiong, Bo, Nayyer, Mojtaba, Pan, Shirui, Staab, Steffen

Link prediction on knowledge graphs (KGs) has been extensively studied on binary relational KGs, wherein each fact is represented by a triple. A significant amount of important knowledge, however, is represented by hyper-relational facts where each fact is composed of a primal triple and a set of qualifiers comprising a key-value pair that allows for expressing more complicated semantics. Although some recent works have proposed to embed hyper-relational KGs, these methods fail to capture essential inference patterns of hyper-relational facts such as qualifier monotonicity, qualifier implication, and qualifier mutual exclusion, limiting their generalization capability. To unlock this, we present \emph{ShrinkE}, a geometric hyper-relational KG embedding method aiming to explicitly model these patterns. ShrinkE models the primal triple as a spatial-functional transformation from the head into a relation-specific box. Each qualifier ``shrinks'' the box to narrow down the possible answer set and, thus, realizes qualifier monotonicity. The spatial relationships between the qualifier boxes allow for modeling core inference patterns of qualifiers such as implication and mutual exclusion. Experimental results demonstrate ShrinkE's superiority on three benchmarks of hyper-relational KGs.

artificial intelligence, machine learning, qualifier, (17 more...)

2306.02199

Country:

Asia > Russia (0.14)
Europe > Switzerland > Zürich > Zürich (0.05)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)
(4 more...)

Genre:

Research Report (0.70)
Overview (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)

Arnold, Stefan, Yesilbas, Dilara, Weinzierl, Sven

Driving Context into Text-to-Text Privatization

arXiv.org Artificial IntelligenceJun-2-2023

\textit{Metric Differential Privacy} enables text-to-text privatization by adding calibrated noise to the vector of a word derived from an embedding space and projecting this noisy vector back to a discrete vocabulary using a nearest neighbor search. Since words are substituted without context, this mechanism is expected to fall short at finding substitutes for words with ambiguous meanings, such as \textit{'bank'}. To account for these ambiguous words, we leverage a sense embedding and incorporate a sense disambiguation step prior to noise injection. We encompass our modification to the privatization mechanism with an estimation of privacy and utility. For word sense disambiguation on the \textit{Words in Context} dataset, we demonstrate a substantial increase in classification accuracy by $6.05\%$.

artificial intelligence, machine learning, natural language, (18 more...)

2306.01457

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
Asia > China > Hong Kong (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)