A Simplified Retriever to Improve Accuracy of Phenotype Normalizations by Large Language Models

Hier, Daniel B., Do, Thanh Son, Obafemi-Ajayi, Tayo

Sep-10-2024–arXiv.org Artificial Intelligence

Large language models (LLMs) have shown improved accuracy in phenotype term normalization tasks when augmented with retrievers that suggest candidate normalizations based on term definitions. In this work, we introduce a simplified retriever that enhances LLM accuracy by searching the Human Phenotype Ontology (HPO) for candidate matches using contextual word embeddings from BioBERT without the need for explicit term definitions. Testing this method on terms derived from the clinical synopses of Online Mendelian Inheritance in Man (OMIM), we demonstrate that the normalization accuracy of a state-of-the-art LLM increases from a baseline of 62.3% without augmentation to 90.3% with retriever augmentation. This approach is potentially generalizable to other biomedical term normalization tasks and offers an efficient alternative to more complex retrieval methods.
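The retrieve-then-prompt idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `embed` function is a toy character-trigram stand-in for BioBERT contextual embeddings, the four-entry `HPO_TERMS` dictionary is a small illustrative subset of the ontology, and the names `retrieve_candidates` and `build_prompt` are hypothetical.

```python
import math
from collections import Counter

# Illustrative subset only; the real HPO contains many thousands of terms.
HPO_TERMS = {
    "HP:0001250": "Seizure",
    "HP:0001251": "Ataxia",
    "HP:0000252": "Microcephaly",
    "HP:0001263": "Global developmental delay",
}


def embed(text):
    """Toy stand-in for a BioBERT embedding (assumption): trigram counts."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())  # Counter yields 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(term, k=3):
    """Return the k ontology entries whose labels are most similar to term."""
    query = embed(term)
    scored = [
        (cosine(query, embed(label)), hpo_id, label)
        for hpo_id, label in HPO_TERMS.items()
    ]
    scored.sort(reverse=True)
    return [(hpo_id, label) for _, hpo_id, label in scored[:k]]


def build_prompt(term, candidates):
    """Assemble an LLM prompt that lists the retrieved candidates."""
    lines = [f"Normalize the phenotype term '{term}' to one HPO entry."]
    lines.append("Candidate matches:")
    lines.extend(f"- {hpo_id}: {label}" for hpo_id, label in candidates)
    lines.append("Answer with the single best HPO ID.")
    return "\n".join(lines)


candidates = retrieve_candidates("seizures", k=2)
prompt = build_prompt("seizures", candidates)
```

In a real setting, `embed` would be replaced by vectors from a BioBERT model and the ontology labels would be embedded once and cached, so that each query costs only one embedding plus a nearest-neighbor search.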

large language model, machine learning, normalization

Country:
- North America > United States
  - California > San Francisco County
    - San Francisco (0.14)
  - Illinois (0.14)
  - Missouri (0.14)

Genre:
- Research Report (0.64)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)