Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation
Wang, Xiao; Wang, Fuling; Wang, Haowen; Jiang, Bo; Li, Chuanfu; Wang, Yaowei; Tian, Yonghong; Tang, Jin
Abstract--X-ray image based medical report generation has achieved significant progress in recent years with the help of large language models; however, these models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient in describing key diseases. In this paper, we propose a novel associative memory-enhanced X-ray report generation model that effectively mimics the process of professional doctors writing medical reports. It considers both the mining of global and local visual information and associates historical report information to better complete the writing of the current report.

This task can greatly alleviate the work pressure on doctors and reduce the waiting time for patients, providing a feasible method for empowering artificial intelligence in the medical domain. Although the task has made considerable progress in recent years, there are still many issues, such as the difficulty in detecting key diseases and other remaining challenges. In MRG, models typically need to process two primary sources of information: visual information from medical images and linguistic information from existing medical reports. R2Gen [9] introduces a memory-driven Transformer for radiology report generation, using relational memory. Some researchers have already exploited the effectiveness of LLMs in X-ray based medical report generation, such as R2Gen-GPT [1]. These models can generate high-quality text at the linguistic level, but they struggle to accurately identify abnormal conditions, diseases, and other critical information in clinical diagnostic indicators. As a result, although the obtained medical reports may appear to be well-structured, they often fail to address practical clinical problems. As shown in Figure 1, our framework contains two stages, i.e., disease-aware visual token mining and associative memory-augmented X-ray medical report generation. In the first stage, we extract the vision features of a given X-ray image using the Swin Transformer network [4].
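The first-stage feature extraction described above can be illustrated with a minimal sketch. This is not the authors' released code: the torchvision Swin-Tiny backbone with ImageNet weights and the image file name are illustrative assumptions; it only shows how an X-ray image can be turned into a sequence of patch-level vision tokens.

```python
# Minimal sketch: extract patch-level vision tokens from a chest X-ray
# with a Swin Transformer backbone. Model choice (torchvision swin_t,
# ImageNet weights) and the image path are assumptions for illustration.
import torch
from torchvision import models
from PIL import Image

# Load a pretrained Swin-Tiny backbone; we only use its feature stages,
# not the classification head.
weights = models.Swin_T_Weights.IMAGENET1K_V1
backbone = models.swin_t(weights=weights)
backbone.eval()

preprocess = weights.transforms()  # resize / normalize as the weights expect

image = Image.open("example_xray.png").convert("RGB")  # hypothetical file
x = preprocess(image).unsqueeze(0)                      # (1, 3, H, W)

with torch.no_grad():
    feats = backbone.features(x)   # (1, H/32, W/32, C), channels-last layout
    tokens = feats.flatten(1, 2)   # (1, N, C) sequence of vision tokens

print(tokens.shape)  # e.g. torch.Size([1, 49, 768]) for a 224x224 input
```

In the paper these vision tokens would then feed the disease-aware token mining and the LLM-based report generation stages; the sketch covers only the backbone step.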
arXiv.org Artificial Intelligence
Jan-6-2025
- Country:
  - Asia > China > Anhui Province (0.28)
- Genre:
  - Research Report
    - New Finding (0.67)
    - Promising Solution (0.46)
- Industry:
  - Health & Medicine
    - Diagnostic Medicine > Imaging (1.00)
    - Health Care Technology (1.00)
    - Nuclear Medicine (0.91)
    - Therapeutic Area > Pulmonary/Respiratory Diseases (0.93)
- Technology: