Zero-shot Clinical Entity Recognition using ChatGPT
Hu, Yan, Ameer, Iqra, Zuo, Xu, Peng, Xueqing, Zhou, Yujia, Li, Zehan, Li, Yiming, Li, Jianfu, Jiang, Xiaoqian, Xu, Hua
arXiv.org Artificial Intelligence
We noticed that ChatGPT struggled to extract co-reference entities such as "her medications" or "her symptoms", which should be annotated according to the 2010 i2b2 annotation guidelines for co-reference identification purposes. After we removed those co-reference entities from the gold standard and re-evaluated both ChatGPT and GPT-3, we observed modest increases in performance, with ChatGPT achieving an F1 score of 0.628 using Prompt-2 and GPT-3 attaining an F1 score of 0.500 under the relaxed-match criterion. Moreover, we observed a significant degree of randomness in ChatGPT's output: even when presented with the same prompt and the same input text, it sometimes generated responses that differed considerably in format and content. This phenomenon was particularly prevalent when the input note was lengthy, despite our efforts to minimize input sequence length by limiting it to the HPI section. We anticipate this issue will be alleviated once GPT-4 supports much longer input text. Although it is not clear whether clinical corpora (and, if so, what types of clinical corpora) were used in training ChatGPT, ChatGPT has demonstrated a certain degree of understanding of medical text. We believe that fine-tuning ChatGPT with domain-specific corpora, assuming OpenAI provides such an API, will further improve its performance on clinical NLP tasks such as NER in a zero-shot fashion.
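For illustration, below is a minimal sketch of how relaxed-match F1 might be computed after filtering possessive co-reference mentions (e.g., "her medications") from the gold standard. The entity representation, overlap rule, and co-reference filter are assumptions for this sketch, not the authors' actual evaluation code.

```python
# Illustrative sketch only: the entity representation, the overlap rule, and the
# co-reference filter below are assumptions, not the paper's evaluation pipeline.
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    start: int   # character offset where the mention begins
    end: int     # character offset where the mention ends (exclusive)
    label: str   # e.g. "problem", "treatment", "test"
    text: str    # surface form of the mention

# Hypothetical filter: drop possessive co-reference mentions such as
# "her medications" / "her symptoms" from the gold standard before scoring.
def drop_coreference(entities):
    pronouns = ("her ", "his ", "their ", "these ", "those ")
    return [e for e in entities if not e.text.lower().startswith(pronouns)]

def relaxed_match(pred, gold):
    # Relaxed match: same entity label and any character-span overlap.
    return pred.label == gold.label and pred.start < gold.end and gold.start < pred.end

def relaxed_f1(predicted, gold):
    gold = drop_coreference(gold)
    tp_pred = sum(any(relaxed_match(p, g) for g in gold) for p in predicted)
    tp_gold = sum(any(relaxed_match(p, g) for p in predicted) for g in gold)
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example with hypothetical spans from an HPI section.
gold = [Entity(0, 10, "problem", "chest pain"), Entity(20, 35, "treatment", "her medications")]
pred = [Entity(0, 5, "problem", "chest"), Entity(40, 43, "test", "ECG")]
print(relaxed_f1(pred, gold))
```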
May-15-2023
- Country:
- North America > United States (0.29)
- Genre:
- Research Report > New Finding (0.95)