VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records

Chung, Philip, Swaminathan, Akshay, Goodell, Alex J., Kim, Yeasul, Reincke, S. Momsen, Han, Lichy, Deverett, Ben, Sadeghi, Mohammad Amin, Ariss, Abdel-Badih, Ghanem, Marc, Seong, David, Lee, Andrew A., Coombes, Caitlin E., Bradshaw, Brad, Sufian, Mahir A., Hong, Hyo Jung, Nguyen, Teresa P., Rasouli, Mohammad R., Kamra, Komal, Burbridge, Mark A., McAvoy, James C., Saffary, Roya, Ma, Stephen P., Dash, Dev, Xie, James, Wang, Ellen Y., Schmiesing, Clifford A., Shah, Nigam, Aghaeepour, Nima

Jan-27-2025–arXiv.org Artificial Intelligence

Methods to ensure factual accuracy of text generated by large language models (LLMs) in clinical medicine are lacking. VeriFact is an artificial intelligence system that combines retrieval-augmented generation and LLM-as-a-Judge to verify whether LLM-generated text is factually supported by a patient's medical history based on their electronic health record (EHR). To evaluate this system, we introduce VeriFact-BHC, a new dataset that decomposes Brief Hospital Course narratives from discharge summaries into a set of simple statements with clinician annotations for whether each statement is supported by the patient's EHR clinical notes. Whereas the highest agreement between clinicians was 88.5%, VeriFact achieves up to 92.7% agreement when compared to a denoised and adjudicated average human clinician ground truth, suggesting that VeriFact exceeds the average clinician's ability to fact-check text against a patient's medical record. VeriFact may accelerate the development of LLM-based EHR applications by removing current evaluation bottlenecks.
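The agreement figures above can be read as the share of statements on which two sets of labels coincide. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `percent_agreement`, the label strings, and the four example statements are all hypothetical, and the abstract does not specify the exact label set or agreement metric used.

```python
from typing import Sequence


def percent_agreement(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the percentage of positions where the two label lists match.

    Hypothetical helper for illustration; the source does not define
    how its agreement numbers were computed.
    """
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("label lists must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Made-up labels for four statements (assumed label names, not from the paper).
reference = ["supported", "supported", "not_supported", "supported"]
predicted = ["supported", "not_supported", "not_supported", "supported"]

agreement = percent_agreement(predicted, reference)
print(f"{agreement:.1f}%")  # 3 of 4 labels match, so this prints 75.0%
```

Raw percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic may be preferable when label classes are imbalanced.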

large language model, machine learning, natural language

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - New Jersey > Mercer County
    - Princeton (0.04)
  - Massachusetts > Suffolk County
    - Boston (0.04)
- Europe > United Kingdom
  - England > Greater London > London (0.04)
- Asia
  - Middle East > Israel (0.04)
  - Singapore > Central Region
    - Singapore (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Performance Analysis > Accuracy (0.92)
    - Neural Networks > Deep Learning (0.66)
