Inspecting state of the art performance and NLP metrics in image-based medical report generation

Pino, Pablo; Parra, Denis; Messina, Pablo; Besa, Cecilia; Uribe, Sergio

Nov-21-2020, arXiv.org Artificial Intelligence

Several deep learning architectures have been proposed in recent years to tackle the problem of generating a written report from an imaging exam. Most works evaluate the generated reports with standard Natural Language Processing (NLP) metrics (e.g., BLEU, ROUGE) and report significant progress. In this article, we put this progress into perspective by comparing state-of-the-art (SOTA) models against weak baselines. We show that simple, even naive, approaches achieve near-SOTA performance on most traditional NLP metrics. We conclude that evaluation methods for this task need further study so that they correctly measure clinical accuracy, ideally with physicians contributing to this effort.
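The argument above rests on how n-gram overlap metrics reward surface similarity rather than clinical correctness. As a minimal sketch, assuming simplified sentence-level re-implementations of BLEU (clipped precisions up to bigrams, single reference, no smoothing) and ROUGE-L (F1 with beta = 1), the snippet below scores a hypothetical "constant" baseline that returns the same normal-findings report for every exam. The reports are invented toy strings, not data from the paper, and the function names (`bleu`, `rouge_l`, `evaluate_constant_baseline`) are illustrative; this is not the official metric tooling used in the literature, nor the authors' exact baselines.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against one reference (no smoothing).

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by the brevity penalty.
    """
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l(candidate, reference):
    """Simplified ROUGE-L F1 (beta = 1) computed from the token-level LCS."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy reports, invented for illustration (not from any dataset).
REFERENCES = [
    "the heart is normal in size . the lungs are clear . no pleural effusion .",
    "the lungs are clear . no pneumothorax . the heart is normal in size .",
    "mild cardiomegaly . the lungs are clear . no pleural effusion .",
]

# Hypothetical naive baseline: emit the same normal-findings report for every exam.
CONSTANT_REPORT = "the heart is normal in size . the lungs are clear . no pleural effusion ."


def evaluate_constant_baseline(references, report):
    """Average BLEU-2 and ROUGE-L of one fixed report against each reference."""
    bleus = [bleu(report, ref) for ref in references]
    rouges = [rouge_l(report, ref) for ref in references]
    return sum(bleus) / len(bleus), sum(rouges) / len(rouges)


if __name__ == "__main__":
    avg_bleu, avg_rouge = evaluate_constant_baseline(REFERENCES, CONSTANT_REPORT)
    print(f"constant baseline: BLEU-2={avg_bleu:.3f}  ROUGE-L={avg_rouge:.3f}")
```

Because the toy references share much of their phrasing, the fixed report earns substantial overlap without ever looking at an image, including against the reference that mentions an abnormal finding (cardiomegaly) it never reports. This is the kind of gap between metric scores and clinical accuracy that the abstract points to.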

Keywords: baseline, medical report generation, NLP metric

Country:
- South America > Chile (0.05)
- North America > Canada
  - British Columbia > Metro Vancouver Regional District > Vancouver (0.05)

Genre:
- Research Report (0.50)

Industry:
- Health & Medicine
  - Diagnostic Medicine > Imaging (0.73)
  - Nuclear Medicine (0.49)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.76)