BALI: Enhancing Biomedical Language Representations through Knowledge Graph and Language Model Alignment

Sep-10-2025–arXiv.org Artificial Intelligence

In recent years, there has been substantial progress in using pretrained Language Models (LMs) on a range of tasks aimed at improving the understanding of biomedical texts. Nonetheless, existing biomedical LLMs show limited comprehension of complex, domain-specific concept structures and the factual information encoded in biomedical Knowledge Graphs (KGs). In this work, we propose BALI (Biomedical Knowledge Graph and Language Model Alignment), a novel joint LM and KG pre-training method that augments an LM with external knowledge by the simultaneous learning of a dedicated KG encoder and aligning the representations of both the LM and the graph. For a given textual sequence, we link biomedical concept mentions to the Unified Medical Language System (UMLS) KG and utilize local KG subgraphs as cross-modal positive samples for these mentions. Our empirical findings indicate that implementing our method on several leading biomedical LMs, such as PubMedBERT and BioLinkBERT, improves their performance on a range of language understanding tasks and the quality of entity representations, even with minimal pre-training on a small alignment dataset sourced from PubMed scientific abstracts.

computational linguistic, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

Sep-10-2025

arXiv.org PDF

Add feedback

Country:
- Europe (1.00)
- North America > United States
  - Minnesota > Hennepin County > Minneapolis (0.28)
- Asia > Indonesia
  - Bali (0.66)

Genre:
- Research Report > New Finding (0.93)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning > Semantic Networks (0.92)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found