medical image report generation
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation
Generating long and coherent reports to describe medical images poses challenges in bridging visual patterns with informative human linguistic descriptions. We propose a novel Hybrid Retrieval-Generation Reinforced Agent (HRGR-Agent) which reconciles traditional retrieval-based approaches, populated with human prior knowledge, with modern learning-based approaches to achieve structured, robust, and diverse report generation. HRGR-Agent employs a hierarchical decision-making procedure. For each sentence, a high-level retrieval policy module chooses to either retrieve a template sentence from an off-the-shelf template database, or invoke a low-level generation module to generate a new sentence. HRGR-Agent is updated via reinforcement learning, guided by sentence-level and word-level rewards. Experiments show that our approach achieves state-of-the-art results on two medical report datasets, generating well-balanced structured sentences with robust coverage of heterogeneous medical report contents. In addition, our model achieves the highest detection precision of medical abnormality terminologies, and improved human evaluation performance.
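The hierarchical retrieve-or-generate decision described in the abstract can be sketched in miniature. Everything below is an illustrative assumption, not the paper's learned modules: the template sentences are hypothetical, and the policy and generator are deterministic stand-ins for what would be learned networks updated by sentence-level and word-level rewards.

```python
# Hypothetical template database; the real HRGR-Agent mines templates
# from frequently occurring sentences in the training corpus.
TEMPLATES = [
    "The lungs are clear.",
    "No pleural effusion is seen.",
    "Heart size is within normal limits.",
]
GENERATE = len(TEMPLATES)  # extra action index meaning "invoke the generator"

def retrieval_policy(hidden_state: int) -> int:
    """High-level policy: picks a template index or the GENERATE action.

    Stand-in for a learned softmax over len(TEMPLATES) + 1 actions,
    conditioned on a per-sentence hidden state.
    """
    return hidden_state % (len(TEMPLATES) + 1)

def generation_module(hidden_state: int) -> str:
    """Low-level generator: stand-in for word-by-word decoding."""
    return f"Generated sentence for state {hidden_state}."

def compose_report(sentence_states):
    """One retrieve-or-generate decision per sentence hidden state."""
    report = []
    for h in sentence_states:
        action = retrieval_policy(h)
        if action == GENERATE:
            report.append(generation_module(h))  # word-level reward applies here
        else:
            report.append(TEMPLATES[action])     # sentence-level reward applies here
    return report

print(compose_report([0, 3, 2]))
```

The key design point this illustrates is that retrieval and generation compete as actions within one policy, so the agent can fall back on reliable templates for routine findings and generate free text only for unusual ones.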
Interview with Flávia Carvalhido: Responsible multimodal AI
In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. In this latest interview, we hear from Flávia Carvalhido, a PhD student at the University of Porto. We find out about her work on responsible multimodal AI, what inspired her to study AI, and how she found the Doctoral Consortium experience. My PhD programme is in Informatics Engineering at the Faculty of Engineering of the University of Porto, where I also obtained both my Bachelor's and Master's degrees in the same field. My thesis research project is focused on responsible multimodal AI, titled "Stress Testing of Image-Text Multimodal Models in Medical Image Report Generation", supervised by Professor Henrique Lopes Cardoso and Professor Vítor Cerqueira and developed in the LIACC research laboratory.
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation
Li, Siyou, Xu, Beining, Luo, Yihao, Nie, Dong, Zhang, Le
Automatic medical report generation (MRG), which aims to produce detailed text reports from medical images, has emerged as a critical task in medical imaging. MRG systems can enhance radiological workflows by reducing the time and effort required for report writing, thereby improving diagnostic efficiency. In this work, we present a novel approach for automatic MRG utilizing a multimodal large language model. Specifically, we employed the 3D Vision Transformer (ViT3D) image encoder introduced in M3D-CLIP to process 3D scans, and used Asclepius-Llama3-8B as the language model to generate the text reports by auto-regressive decoding. Experiments show that our model achieved an average Green score of 0.3 on the MRG task validation set and an average accuracy of 0.61 on the visual question answering (VQA) task validation set, outperforming the baseline model. Our approach demonstrates the effectiveness of the ViT3D alignment of LLaMA3 for automatic MRG and VQA tasks by tuning the model on a small dataset.
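The encode-align-decode pipeline described above can be sketched as follows. All shapes, the toy patchify step, and the function names are illustrative assumptions, not the authors' implementation: the real system uses a pretrained ViT3D encoder, a learned alignment projection, and Asclepius-Llama3-8B for auto-regressive decoding.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 16  # toy width; the real models use thousands of dimensions

def vit3d_encode(volume):
    """Stand-in for the ViT3D image encoder: maps a 3D scan to patch tokens.

    Toy patchify: split the volume into 2x2x2 patches and mean-pool each,
    then tile each patch value into a fake token embedding.
    """
    d, h, w = volume.shape
    patches = volume.reshape(d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))
    return patches.reshape(-1, 1) * np.ones((1, EMBED_DIM))  # one token per patch

def project_to_llm_space(image_tokens, W):
    """Alignment layer mapping vision tokens into the LLM's embedding space."""
    return image_tokens @ W

def build_llm_input(image_tokens, text_embeddings):
    """Prepend projected image tokens to the text prompt embeddings, so the
    language model can attend to the scan while decoding the report."""
    return np.concatenate([image_tokens, text_embeddings], axis=0)

volume = rng.random((4, 4, 4))           # tiny fake 3D scan
W = rng.random((EMBED_DIM, EMBED_DIM))   # learned projection (random here)
prompt = rng.random((3, EMBED_DIM))      # fake prompt token embeddings

tokens = project_to_llm_space(vit3d_encode(volume), W)
llm_input = build_llm_input(tokens, prompt)
print(llm_input.shape)  # (num_patches + prompt_len, EMBED_DIM)
```

The design choice this illustrates is that only the small alignment projection must bridge the two pretrained models, which is why tuning on a small dataset can suffice.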
Reviews: Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation
Summary: The paper presents an approach that uses hierarchical reinforcement learning to automatically generate medical reports from diagnostic images. The approach first predicts a sequence of hidden states, one per sentence, and decides when to stop; a low-level module then takes each hidden state and either retrieves a sentence to use as output, or passes control to a generator that produces a new sentence. The overall system is trained with rewards at both the sentence level and, for generation, the word level. The proposed approach shows promise over ablations of the proposed model as well as some sensible baseline CNN-RNN approaches for image captioning.
Strengths: The paper describes the experimental setup quite thoroughly and clearly states the hyperparameters used for training.
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation
Li, Yuan, Liang, Xiaodan, Hu, Zhiting, Xing, Eric P.