Supervised Fine-Tuning or In-Context Learning? Evaluating LLMs for Clinical NER
We study clinical Named Entity Recognition (NER) on the CADEC corpus and compare three families of approaches: (i) BERT-style encoders (BERT Base, BioClinicalBERT, RoBERTa-large), (ii) GPT-4o with few-shot in-context learning (ICL) under simple vs. complex prompts, and (iii) GPT-4o with supervised fine-tuning (SFT). All models are evaluated with standard NER metrics over CADEC's five entity types (ADR, Drug, Disease, Symptom, Finding). RoBERTa-large and BioClinicalBERT offer only limited improvements over BERT Base, showing the limits of this model family on the task. Among the LLM settings, simple ICL outperforms a longer, instruction-heavy prompt, and SFT achieves the strongest overall performance (F1 ≈ 87.1%), albeit at higher cost. We also find that the LLM achieves higher accuracy on simplified variants of the task in which classification is restricted to two labels.
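The abstract reports entity-level F1 as its headline metric. As a minimal sketch of how such a score is typically computed, the Python snippet below implements exact-span, exact-label micro precision/recall/F1 over CADEC's five labels; it assumes the usual strict-matching convention, is not the authors' evaluation code, and the example spans are hypothetical.

```python
# Minimal sketch: entity-level micro precision/recall/F1 for NER under
# strict matching (exact span offsets and exact label). Illustrative only;
# the label set follows CADEC, the example spans are made up.

from typing import List, Tuple

# An entity is (start_offset, end_offset, label).
Entity = Tuple[int, int, str]

CADEC_LABELS = {"ADR", "Drug", "Disease", "Symptom", "Finding"}


def micro_f1(gold: List[List[Entity]], pred: List[List[Entity]]) -> Tuple[float, float, float]:
    """Aggregate true/false positives and false negatives over all documents."""
    tp = fp = fn = 0
    for g_doc, p_doc in zip(gold, pred):
        g_set, p_set = set(g_doc), set(p_doc)
        tp += len(g_set & p_set)   # predicted entities that exactly match gold
        fp += len(p_set - g_set)   # spurious predictions
        fn += len(g_set - p_set)   # missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical gold/predicted annotations for a single forum post.
    gold = [[(0, 9, "Drug"), (24, 32, "ADR")]]
    pred = [[(0, 9, "Drug"), (24, 32, "ADR"), (40, 47, "Symptom")]]
    p, r, f = micro_f1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=1.000 F1=0.800
```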
arXiv.org Artificial Intelligence
Oct-28-2025
- Country:
- Europe > Netherlands > South Holland > Leiden (0.04)
- North America > Canada (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine > Consumer Health (0.68)
- Health & Medicine > Health Care Technology > Medical Record (0.47)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
- Information Technology > Security & Privacy (0.68)
- Technology: