BALI: Enhancing Biomedical Language Representations through Knowledge Graph and Language Model Alignment
Sakhovskiy, Andrey, Tutubalina, Elena
–arXiv.org Artificial Intelligence
In recent years, there has been substantial progress in using pretrained Language Models (LMs) on a range of tasks aimed at improving the understanding of biomedical texts. Nonetheless, existing biomedical LLMs show limited comprehension of complex, domain-specific concept structures and the factual information encoded in biomedical Knowledge Graphs (KGs). In this work, we propose BALI (Biomedical Knowledge Graph and Language Model Alignment), a novel joint LM and KG pre-training method that augments an LM with external knowledge by the simultaneous learning of a dedicated KG encoder and aligning the representations of both the LM and the graph. For a given textual sequence, we link biomedical concept mentions to the Unified Medical Language System (UMLS) KG and utilize local KG subgraphs as cross-modal positive samples for these mentions. Our empirical findings indicate that implementing our method on several leading biomedical LMs, such as PubMedBERT and BioLinkBERT, improves their performance on a range of language understanding tasks and the quality of entity representations, even with minimal pre-training on a small alignment dataset sourced from PubMed scientific abstracts.
arXiv.org Artificial Intelligence
Sep-10-2025
- Country:
- Asia
- Europe
- Greece > Central Macedonia
- Thessaloniki (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Tuscany
- Florence (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- United Kingdom > Scotland
- City of Glasgow > Glasgow (0.04)
- Greece > Central Macedonia
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- Quebec > Montreal (0.04)
- British Columbia > Metro Vancouver Regional District
- Dominican Republic (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California
- Los Angeles County > Long Beach (0.04)
- San Diego County > San Diego (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.28)
- Nevada (0.04)
- New York
- Bronx County > New York City (0.04)
- Kings County > New York City (0.04)
- New York County > New York City (0.14)
- Queens County > New York City (0.04)
- Richmond County > New York City (0.04)
- Washington > King County
- Seattle (0.04)
- California
- Canada
- Oceania > Australia
- New South Wales > Sydney (0.04)
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Technology: