Goto

Collaborating Authors

 Sato, Junichi


Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

Purpose: Retrieval-augmented generation (RAG) is a technique that enhances the functionality and reliability of large language models (LLMs) by retrieving relevant information from reliable external knowledge (REK). RAG has gained interest in radiology, and we previously reported the utility of NotebookLM, an LLM with RAG (RAG-LLM), for lung cancer staging. However, since the comparator LLM differed from NotebookLM's internal model, it remained unclear whether its advantage stemmed from RAG or from inherent model differences. To better isolate RAG's impact and assess its utility across different cancers, we compared NotebookLM with its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment.

Materials and Methods: A summary of Japan's pancreatic cancer staging guidelines was used as the REK. We compared three groups - REK+/RAG+ (NotebookLM with the REK), REK+/RAG- (Gemini 2.0 Flash with the REK), and REK-/RAG- (Gemini 2.0 Flash without the REK) - in staging 100 fictional pancreatic cancer cases based on CT findings. Staging criteria included TNM classification, local invasion factors, and resectability classification. In REK+/RAG+, retrieval accuracy was quantified based on the sufficiency of the retrieved REK excerpts.

Results: REK+/RAG+ achieved a staging accuracy of 70%, outperforming REK+/RAG- (38%) and REK-/RAG- (35%). For TNM classification, REK+/RAG+ attained 80% accuracy, exceeding REK+/RAG- (55%) and REK-/RAG- (50%). Additionally, REK+/RAG+ explicitly presented the retrieved REK excerpts, achieving a retrieval accuracy of 92%.

Conclusion: NotebookLM, a RAG-LLM, outperformed its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment, suggesting that RAG may improve the staging accuracy of LLMs. Furthermore, its ability to retrieve and present REK excerpts provides transparency for physicians, highlighting its applicability to clinical diagnosis and classification.
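As a concrete illustration of the three-group comparison described above, the sketch below tabulates exact-match staging accuracy separately for the REK+/RAG+, REK+/RAG-, and REK-/RAG- groups. This is not the authors' code; the CaseResult record, its field names, and the toy data are assumptions made purely for illustration.

```python
# A minimal sketch (not the study's code) of per-group staging accuracy.
# Field names and toy data are illustrative assumptions; the real experiment
# used 100 fictional pancreatic cancer cases per condition.
from dataclasses import dataclass

@dataclass
class CaseResult:
    case_id: int
    group: str            # "REK+/RAG+", "REK+/RAG-", or "REK-/RAG-"
    predicted_stage: str  # model output, e.g. a TNM string
    reference_stage: str  # ground-truth staging for the fictional case

def accuracy_by_group(results: list[CaseResult]) -> dict[str, float]:
    """Fraction of cases whose predicted staging exactly matches the reference."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    for r in results:
        totals[r.group] = totals.get(r.group, 0) + 1
        if r.predicted_stage == r.reference_stage:
            correct[r.group] = correct.get(r.group, 0) + 1
    return {g: correct.get(g, 0) / n for g, n in totals.items()}

# Toy usage with two cases per group.
toy = [
    CaseResult(1, "REK+/RAG+", "T3N1M0", "T3N1M0"),
    CaseResult(2, "REK+/RAG+", "T2N1M0", "T2N1M0"),
    CaseResult(1, "REK+/RAG-", "T4N1M0", "T3N1M0"),
    CaseResult(2, "REK+/RAG-", "T2N1M0", "T2N1M0"),
    CaseResult(1, "REK-/RAG-", "T1N0M0", "T3N1M0"),
    CaseResult(2, "REK-/RAG-", "T2N0M0", "T2N1M0"),
]
print(accuracy_by_group(toy))  # {'REK+/RAG+': 1.0, 'REK+/RAG-': 0.5, 'REK-/RAG-': 0.0}
```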


Application of NotebookLM, a Large Language Model with Retrieval-Augmented Generation, for Lung Cancer Staging

arXiv.org Artificial Intelligence

Purpose: In radiology, large language models (LLMs), including ChatGPT, have recently gained attention, and their utility is being rapidly evaluated. However, concerns have emerged regarding their reliability in clinical applications due to limitations such as hallucinations and insufficient referencing. To address these issues, we focus on the latest technology, retrieval-augmented generation (RAG), which enables LLMs to reference reliable external knowledge (REK). Specifically, this study examines the utility and reliability of a recently released RAG-equipped LLM (RAG-LLM), NotebookLM, for staging lung cancer.

Materials and Methods: We summarized the current lung cancer staging guideline in Japan and provided this as the REK to NotebookLM. We then tasked NotebookLM with staging 100 fictional lung cancer cases based on CT findings and evaluated its accuracy. For comparison, we performed the same task using a gold-standard LLM, GPT-4 Omni (GPT-4o), both with and without the REK.

Results: NotebookLM achieved 86% diagnostic accuracy in the lung cancer staging experiment, outperforming GPT-4o, which recorded 39% accuracy with the REK and 25% without it. Moreover, NotebookLM demonstrated 95% accuracy in locating the relevant reference passages within the REK.

Conclusion: NotebookLM successfully performed lung cancer staging by utilizing the REK, demonstrating superior performance compared to GPT-4o. Additionally, it provided highly accurate reference locations within the REK, allowing radiologists to efficiently evaluate the reliability of NotebookLM's responses and detect possible hallucinations. Overall, this study highlights the potential of NotebookLM, a RAG-LLM, in image diagnosis.
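To make the retrieval step of RAG concrete, the toy sketch below ranks guideline (REK) excerpts by simple word overlap with the CT findings and packs the top matches into a grounded prompt. NotebookLM's actual retrieval mechanism is not public, and the excerpt texts here are loose paraphrases of TNM descriptors written only for illustration; none of this reflects the study's implementation.

```python
# A toy illustration of retrieval-augmented prompting (not NotebookLM's method):
# score REK excerpts by word overlap with the CT findings and cite the top-k
# excerpts in the prompt so the model answers from the retrieved guideline text.
import re

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(findings: str, excerpts: list[str], k: int = 2) -> list[str]:
    """Return the k excerpts sharing the most words with the CT findings."""
    query = tokenize(findings)
    return sorted(excerpts, key=lambda e: len(query & tokenize(e)), reverse=True)[:k]

# Illustrative paraphrases of staging descriptors (assumed, not the study's REK).
rek_excerpts = [
    "T1: tumor 3 cm or less in greatest dimension, surrounded by lung or visceral pleura.",
    "T2: tumor more than 3 cm but not more than 5 cm, or involving the main bronchus.",
    "N1: metastasis in ipsilateral peribronchial and/or hilar lymph nodes.",
    "M1a: separate tumor nodule in a contralateral lobe, or pleural dissemination.",
]
ct_findings = ("Tumor more than 3 cm but less than 5 cm in the right upper lobe, "
               "with ipsilateral hilar lymph node metastasis.")

context = retrieve(ct_findings, rek_excerpts, k=3)
prompt = ("Stage this case using only the excerpts below.\n\n"
          + "\n".join(context)
          + "\n\nCT findings: " + ct_findings)
print(prompt)
```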


Fast Inverse Reinforcement Learning with Interval Consistent Graph for Driving Behavior Prediction

AAAI Conferences

In contrast, inverse reinforcement learning (IRL), inverse optimal control, and imitation learning (Ng and Russell 2000; Abbeel and Ng 2004) are modeling frameworks for acquiring the rewards (or costs) of a certain environment by using the optimal path under a possibly different environment as training data. In particular, in human behavior modeling, it has been shown that human-centered rewards can be obtained with maximum entropy inverse reinforcement learning (MaxEnt IRL) (Ziebart et al. 2008), which allows suboptimal training data (Huang et al. 2015; Vernaza and Bagnell 2012; Dragan and Srinivasa 2012; Walker, Gupta, and Hebert 2014). For instance, Ziebart et al. (Ziebart et al. 2008) modeled the driving behavior of expert taxi drivers and enabled driving behavior prediction based on the experts' own experience or knowledge. MaxEnt IRL-based driving behavior prediction, which balances safety, comfort, and economic performance, is very promising.

A discrete approach guarantees global optimality once a proper discrete state space is given, and hence is more suitable for driving behavior modeling. In a discrete approach, the calculation cost of MaxEnt IRL is O(|S||A|), where |S| is the number of states and |A| is the number of actions (Ziebart et al. 2008). That is, the key to fast prediction is suppressing the growth of |S| with the number of dimensions while preparing a necessary and sufficient action set, A, for representing driving behavior. Examples of existing discretization schemes include mesh-grid representations (Shimosaka, Kaneko, and Nishi 2014) and random-graph-based representations connected with neighbors (Byravan et al. 2015). In these approaches, however, A for general dynamical systems is not trivial, because neighbors in a state space defined by Euclidean distance do not necessarily correspond to the transition area of the general dynamics.
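To make the O(|S||A|) cost concrete, here is a small sketch of the soft (MaxEnt) value-iteration backup that sits inside MaxEnt IRL once a discrete state space and action set have been chosen: each sweep evaluates every state-action pair, so its per-iteration cost is O(|S||A|). This is a simplified, discounted variant written for illustration only; it is not the paper's interval-consistent-graph construction, and the toy reward and transition tables are assumptions.

```python
# Soft (MaxEnt) value iteration on a discrete state-action space; a discount
# factor is added for numerical convergence. Each sweep touches all |S| x |A|
# state-action pairs, which is where the O(|S||A|) per-iteration cost comes from.
import numpy as np

def soft_value_iteration(reward, next_state, gamma=0.9, n_iters=100):
    """reward[s, a]: immediate reward; next_state[s, a]: deterministic successor.
    Returns the soft value V and the MaxEnt policy pi(a|s) = exp(Q(s,a) - V(s))."""
    n_states, _ = reward.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = reward + gamma * V[next_state]   # O(|S||A|) work per sweep
        V = np.logaddexp.reduce(Q, axis=1)   # soft maximum over actions
    return V, np.exp(Q - V[:, None])

# Toy 4-state, 2-action chain: action 0 stays put, action 1 moves right;
# state 3 is absorbing and yields reward 1 per step once reached.
reward = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
next_state = np.array([[0, 1], [1, 2], [2, 3], [3, 3]])
V, policy = soft_value_iteration(reward, next_state)
print(np.round(policy, 2))  # states 0-2 put most probability on moving right
```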