Zero-Shot ATC Coding with Large Language Models for Clinical Assessments

Chen, Zijian; Gamble, John-Michael; Jantzi, Micaela; Hirdes, John P.; Lin, Jimmy

Dec-10-2024, arXiv.org Artificial Intelligence

Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). Inspired by recent advances in automatic International Classification of Diseases (ICD) coding, our method frames ATC coding as a hierarchical information extraction task, guiding LLMs through the ATC ontology level by level. We evaluate our approach using GPT-4o as an accuracy ceiling and focus development on open-source Llama models suitable for privacy-sensitive deployment. Testing across Health Canada drug product data, the RABBITS benchmark, and real clinical notes from Ontario Health, our method achieves 78% exact match accuracy with GPT-4o and 60% with Llama 3.1 70B. We investigate knowledge grounding through drug definitions, finding modest improvements in accuracy. Further, we show that fine-tuned Llama 3.1 8B matches zero-shot Llama 3.1 70B accuracy, suggesting that effective ATC coding is feasible with smaller models. Our results demonstrate the feasibility of automatic ATC coding in privacy-sensitive healthcare environments, providing a foundation for future deployments.
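The level-by-level framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny `ATC_CHILDREN` table, the `code_drug` and `exact_match_accuracy` helpers, and the `keyword_chooser` stand-in (used here in place of a real LLM call) are all hypothetical names introduced for illustration. Only the five-level structure of ATC codes (for example A, A10, A10B, A10BA, A10BA02 for metformin) is taken from the ATC classification itself.

```python
from typing import Callable, Dict, List

# Toy slice of the ATC hierarchy (assumption: a real system would load the
# full ontology). Maps a parent code to its children; "" is the root.
ATC_CHILDREN: Dict[str, Dict[str, str]] = {
    "": {"A": "Alimentary tract and metabolism", "C": "Cardiovascular system"},
    "A": {"A02": "Drugs for acid related disorders", "A10": "Drugs used in diabetes"},
    "A10": {
        "A10A": "Insulins and analogues",
        "A10B": "Blood glucose lowering drugs, excl. insulins",
    },
    "A10B": {"A10BA": "Biguanides", "A10BB": "Sulfonylureas"},
    "A10BA": {"A10BA02": "metformin"},
}

# A chooser receives the drug mention and the candidate children at the
# current level, and returns the code it selects. In practice this would
# wrap an LLM prompt; here it is just a callable.
Chooser = Callable[[str, Dict[str, str]], str]


def code_drug(drug: str, choose: Chooser) -> List[str]:
    """Descend the hierarchy one level at a time, returning the chosen path."""
    path: List[str] = []
    current = ""
    while current in ATC_CHILDREN:
        options = ATC_CHILDREN[current]
        picked = choose(drug, options)
        if picked not in options:
            break  # reject answers that are not valid children at this level
        path.append(picked)
        current = picked
    return path


def keyword_chooser(drug: str, options: Dict[str, str]) -> str:
    """Hypothetical stand-in for an LLM call, driven by a hard-coded hint table."""
    hints = {"metformin": ["A", "A10", "A10B", "A10BA", "A10BA02"]}
    for code in hints.get(drug.lower(), []):
        if code in options:
            return code
    return ""


def exact_match_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predictions whose full code equals the gold code."""
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    path = code_drug("Metformin", keyword_chooser)
    print(path)
    final_code = path[-1] if path else ""
    print(exact_match_accuracy([final_code], ["A10BA02"]))
```

Restricting each step to the valid children of the current node keeps the model's answer inside the ontology, which is one plausible reason to prefer this framing over asking for a full seven-character code in a single step.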

large language model, machine learning, product name

Country:
- North America
  - Canada > Ontario (0.48)
  - United States > Washington
    - King County > Seattle (0.04)

Genre:
- Research Report > New Finding (0.69)

Industry:
- Information Technology > Security & Privacy (0.87)
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Health Care Providers & Services (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)