Self-Alignment Pretraining for Biomedical Entity Representations
Fangyu Liu, Ehsan Shareghi, Zaiqiao Meng, Marco Basaldella, Nigel Collier
Despite the widespread success of self-supervised learning via masked language models, learning representations directly from text that accurately capture complex and fine-grained semantic relationships in the biomedical domain remains a challenge. Addressing this is of paramount importance for tasks such as entity linking, where complex relational knowledge is pivotal. We propose SapBERT, a pre-training scheme based on BERT. It self-aligns the representation space of biomedical entities with a metric-learning objective leveraging UMLS, a collection of biomedical ontologies with >4M concepts. Our experimental results on six medical entity-linking benchmark datasets demonstrate that SapBERT outperforms many domain-specific BERT-based variants such as BioBERT, BlueBERT and PubMedBERT, achieving state-of-the-art (SOTA) performance.
arXiv.org Artificial Intelligence
Oct-22-2020