BioLORD: Learning Ontological Representations from Definitions (for Biomedical Concepts and their Textual Descriptions)
Remy, François, Demuynck, Kris, Demeester, Thomas
arXiv.org Artificial Intelligence
This work introduces BioLORD, a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts. State-of-the-art methodologies operate by maximizing the similarity in representation of names referring to the same concept, and preventing collapse through contrastive learning. However, because biomedical names are not always self-explanatory, this approach sometimes results in non-semantic representations. BioLORD overcomes this issue by grounding its concept representations using definitions, as well as short descriptions derived from a multi-relational knowledge graph consisting of biomedical ontologies. Thanks to this grounding, our model produces more semantic concept representations that more closely match the hierarchical structure of ontologies. BioLORD establishes a new state of the art for text similarity on both clinical sentences (MedSTS) and biomedical concepts (MayoSRS).
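The abstract does not state the exact training objective, so the following is only a minimal sketch of the general idea of pulling a concept name toward its own definition while pushing it away from other definitions. It assumes an in-batch InfoNCE-style contrastive loss over cosine similarities, which may differ from the loss used in the paper. The names `normalize` and `info_nce`, the `temperature` parameter, and the toy 2-d vectors are all hypothetical and chosen for illustration; they are not model outputs.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(name_vecs, desc_vecs, temperature=0.1):
    """In-batch contrastive loss (assumed form, not taken from the paper).

    name_vecs[i] should be similar to desc_vecs[i] (its definition or
    description) and dissimilar to every other description in the batch.
    """
    names = [normalize(v) for v in name_vecs]
    descs = [normalize(v) for v in desc_vecs]
    total = 0.0
    for i, n in enumerate(names):
        # Scaled cosine similarity between name i and every description.
        logits = [sum(a * b for a, b in zip(n, d)) / temperature for d in descs]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching description as the target.
        total += log_denominator - logits[i]
    return total / len(names)

# Toy embeddings: made-up numbers for two concepts.
names = [[1.0, 0.1], [0.1, 1.0]]
aligned = [[0.9, 0.0], [0.0, 0.8]]   # each description matches its name
swapped = [aligned[1], aligned[0]]   # descriptions paired with the wrong name

loss_aligned = info_nce(names, aligned)
loss_swapped = info_nce(names, swapped)
print(loss_aligned < loss_swapped)
```

In this toy setup the loss is lower when each name sits next to its own description than when the pairs are swapped, which is the behaviour a definition-grounded contrastive objective is meant to reward.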
Oct-21-2022
- Country:
  - North America > Canada (0.04)
  - Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
  - Europe > Italy > Tuscany > Florence (0.04)
  - Europe > Iceland > Capital Region > Reykjavik (0.04)
  - Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
  - Asia > China > Hong Kong (0.04)
- Genre:
  - Research Report (1.00)