Self-Alignment Pretraining for Biomedical Entity Representations
Fangyu Liu, Ehsan Shareghi, Zaiqiao Meng, Marco Basaldella, Nigel Collier
Despite the widespread success of self-supervised learning via masked language models, learning representations directly from text that accurately capture complex and fine-grained semantic relationships in the biomedical domain remains a challenge. Addressing this is of paramount importance for tasks such as entity linking, where complex relational knowledge is pivotal. We propose SapBERT, a pre-training scheme based on BERT. It self-aligns the representation space of biomedical entities with a metric-learning objective leveraging UMLS, a collection of biomedical ontologies with >4M concepts. Our experimental results on six medical entity-linking benchmark datasets demonstrate that SapBERT outperforms many domain-specific BERT-based variants such as BioBERT, BlueBERT and PubMedBERT, achieving state-of-the-art (SOTA) performance.
arXiv.org Artificial Intelligence
Oct-22-2020