See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Li, Pengteng, Song, Pinhao, Li, Wuyang, Guo, Weiyu, Yao, Huizai, Xu, Yijie, Liu, Dugang, Xiong, Hui
We introduce SEE&TREK, the first training-free prompting framework tailored to enhance the spatial understanding of Multimodal Large Language Models (MLLMs) under vision-only constraints. While prior efforts have incorporated modalities like depth or point clouds to improve spatial reasoning, purely visual spatial understanding remains underexplored. SEE&TREK addresses this gap by focusing on two core principles: increasing visual diversity and motion reconstruction. For visual diversity, we conduct Maximum Semantic Richness Sampling, which employs an off-the-shelf perception model to extract semantically rich keyframes that capture scene structure. For motion reconstruction, we simulate visual trajectories and encode relative spatial positions into keyframes to preserve both spatial relations and temporal coherence. Our method is training- and GPU-free, requiring only a single forward pass, and can be seamlessly integrated into existing MLLMs. Extensive experiments on VSI-BENCH and STI-BENCH show that SEE&TREK consistently boosts the performance of various MLLMs across diverse spatial reasoning tasks, with improvements of up to +3.5%, offering a promising path toward stronger spatial intelligence.
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
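The Maximum Semantic Richness Sampling idea in the abstract above can be illustrated with a toy sketch. This is a hypothetical reconstruction, not the paper's implementation: the off-the-shelf perception model is mocked as precomputed per-frame label sets, and "richness" is approximated as the count of distinct labels in a frame.

```python
# Hypothetical sketch of semantic-richness keyframe sampling:
# score each frame by the diversity of labels a perception model
# detects, keep the top-k frames, and return them in temporal order
# so that motion coherence is preserved.

def semantic_richness(labels):
    """Richness here = number of distinct semantic labels in the frame."""
    return len(set(labels))

def sample_keyframes(frame_labels, k):
    """Indices of the k semantically richest frames, in temporal order."""
    ranked = sorted(range(len(frame_labels)),
                    key=lambda i: semantic_richness(frame_labels[i]),
                    reverse=True)
    return sorted(ranked[:k])

# Mock per-frame detections standing in for the perception model.
frames = [
    ["wall"],                                # frame 0: sparse
    ["chair", "table", "lamp", "rug"],       # frame 1: rich
    ["chair", "chair", "table"],             # frame 2: duplicates count once
    ["sofa", "tv", "shelf", "door", "rug"],  # frame 3: richest
]
print(sample_keyframes(frames, 2))  # -> [1, 3]
```

Returning the selected indices in temporal order (the final `sorted`) matters: the abstract's motion-reconstruction principle relies on keyframes keeping their original sequence.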
Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation
Johno, Hisashi, Johno, Yuki, Amakawa, Akitomo, Sato, Junichi, Tozuka, Ryota, Komaba, Atsushi, Watanabe, Hiroaki, Watanabe, Hiroki, Goto, Chihiro, Morisaka, Hiroyuki, Onishi, Hiroshi, Nakamoto, Kazunori
Purpose: Retrieval-augmented generation (RAG) is a technology to enhance the functionality and reliability of large language models (LLMs) by retrieving relevant information from reliable external knowledge (REK). RAG has gained interest in radiology, and we previously reported the utility of NotebookLM, an LLM with RAG (RAG-LLM), for lung cancer staging. However, since the comparator LLM differed from NotebookLM's internal model, it remained unclear whether its advantage stemmed from RAG or inherent model differences. To better isolate RAG's impact and assess its utility across different cancers, we compared NotebookLM with its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment. Materials and Methods: A summary of Japan's pancreatic cancer staging guidelines was used as REK. We compared three groups - REK+/RAG+ (NotebookLM with REK), REK+/RAG- (Gemini 2.0 Flash with REK), and REK-/RAG- (Gemini 2.0 Flash without REK) - in staging 100 fictional pancreatic cancer cases based on CT findings. Staging criteria included TNM classification, local invasion factors, and resectability classification. In REK+/RAG+, retrieval accuracy was quantified based on the sufficiency of retrieved REK excerpts. Results: REK+/RAG+ achieved a staging accuracy of 70%, outperforming REK+/RAG- (38%) and REK-/RAG- (35%). For TNM classification, REK+/RAG+ attained 80% accuracy, exceeding REK+/RAG- (55%) and REK-/RAG- (50%). Additionally, REK+/RAG+ explicitly presented retrieved REK excerpts, achieving a retrieval accuracy of 92%. Conclusion: NotebookLM, a RAG-LLM, outperformed its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment, suggesting that RAG may improve an LLM's staging accuracy. Furthermore, its ability to retrieve and present REK excerpts provides transparency for physicians, highlighting its applicability for clinical diagnosis and classification.
- Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Application of NotebookLM, a Large Language Model with Retrieval-Augmented Generation, for Lung Cancer Staging
Tozuka, Ryota, Johno, Hisashi, Amakawa, Akitomo, Sato, Junichi, Muto, Mizuki, Seki, Shoichiro, Komaba, Atsushi, Onishi, Hiroshi
Purpose: In radiology, large language models (LLMs), including ChatGPT, have recently gained attention, and their utility is being rapidly evaluated. However, concerns have emerged regarding their reliability in clinical applications due to limitations such as hallucinations and insufficient referencing. To address these issues, we focus on the latest technology, retrieval-augmented generation (RAG), which enables LLMs to reference reliable external knowledge (REK). Specifically, this study examines the utility and reliability of a recently released RAG-equipped LLM (RAG-LLM), NotebookLM, for staging lung cancer. Materials and methods: We summarized the current lung cancer staging guideline in Japan and provided this as REK to NotebookLM. We then tasked NotebookLM with staging 100 fictional lung cancer cases based on CT findings and evaluated its accuracy. For comparison, we performed the same task using a gold-standard LLM, GPT-4 Omni (GPT-4o), both with and without the REK. Results: NotebookLM achieved 86% diagnostic accuracy in the lung cancer staging experiment, outperforming GPT-4o, which recorded 39% accuracy with the REK and 25% without it. Moreover, NotebookLM demonstrated 95% accuracy in searching reference locations within the REK. Conclusion: NotebookLM successfully performed lung cancer staging by utilizing the REK, demonstrating superior performance compared to GPT-4o. Additionally, it provided highly accurate reference locations within the REK, allowing radiologists to efficiently evaluate the reliability of NotebookLM's responses and detect possible hallucinations. Overall, this study highlights the potential of NotebookLM, a RAG-LLM, in image diagnosis.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > Lithuania > Kaunas County > Kaunas (0.04)
- (2 more...)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.47)
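The two staging studies above both rest on the same RAG mechanism: retrieve relevant excerpts from reliable external knowledge (REK), then ground the model's answer in them. The following is a minimal illustrative sketch of that idea, not NotebookLM's actual pipeline; retrieval is reduced to word overlap, and the REK passages are toy paraphrases of TNM-style criteria.

```python
import re

# Minimal RAG sketch: rank REK passages by shared-word count with the
# query, then prepend the best ones to the prompt so the model's answer
# is grounded in (and traceable to) specific excerpts.

def tokens(text):
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, passages, top_k=2):
    """Rank REK passages by word overlap with the query."""
    q = tokens(query)
    return sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)[:top_k]

def build_prompt(query, passages):
    """Present the retrieved excerpts explicitly, then ask the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return f"Answer using only these excerpts:\n{context}\n\nQuestion: {query}"

# Toy REK passages (illustrative paraphrases, not the Japanese guideline).
rek = [
    "T1: tumor limited to the pancreas, 2 cm or less in greatest dimension.",
    "N1: metastasis in regional lymph nodes.",
    "M1: distant metastasis is present.",
]
prompt = build_prompt("What defines a T1 pancreatic tumor?", rek)
```

Presenting the retrieved excerpts verbatim in the prompt is also what gives the approach its transparency: a physician can check the quoted excerpt rather than trust an unsourced answer.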
Study shows how large language models like GPT-3 can learn a new task from just a few examples
Large language models like OpenAI's GPT-3 are massive neural networks that can generate human-like text, from poetry to programming code. Trained using troves of internet data, these machine-learning models take a small bit of input text and then predict the text that is likely to come next. But that's not all these models can do. Researchers are exploring a curious phenomenon known as in-context learning, in which a large language model learns to accomplish a task after seeing only a few examples--despite the fact that it wasn't trained for that task. For instance, someone could feed the model several example sentences and their sentiments (positive or negative), then prompt it with a new sentence, and the model can give the correct sentiment.
- North America > Canada > Alberta (0.15)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
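The sentiment example in the blurb above can be made concrete with a short sketch of how such a few-shot prompt is assembled. Only the prompt construction is shown; the model call itself is omitted, since in-context learning happens entirely through what the model sees in the prompt, with no retraining.

```python
# In-context learning sketch: a few labeled (sentence, sentiment) pairs
# are placed in the prompt, followed by an unlabeled sentence. The model
# is expected to infer the task from the examples and complete the label.

def few_shot_prompt(examples, query):
    """Format labeled examples followed by the unlabeled query."""
    lines = [f"Review: {s}\nSentiment: {y}" for s, y in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("I loved this movie.", "positive"),
    ("A complete waste of time.", "negative"),
]
prompt = few_shot_prompt(examples, "The acting was wonderful.")
print(prompt)
```

The trailing `Sentiment:` with no label is the key trick: the model's natural next-token prediction completes the pattern established by the examples.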
TimberTrek: Exploring and Curating Sparse Decision Trees with Interactive Visualization
Wang, Zijie J., Zhong, Chudi, Xin, Rui, Takagi, Takuya, Chen, Zhi, Chau, Duen Horng, Rudin, Cynthia, Seltzer, Margo
Given thousands of equally accurate machine learning (ML) models, how can users choose among them? A recent ML technique enables domain experts and data scientists to generate a complete Rashomon set for sparse decision trees--a huge set of almost-optimal interpretable ML models. To help ML practitioners identify models with desirable properties from this Rashomon set, we develop TimberTrek, the first interactive visualization system that summarizes thousands of sparse decision trees at scale. Two usage scenarios highlight how TimberTrek can empower users to easily explore, compare, and curate models that align with their domain knowledge and values. Our open-source tool runs directly in users' computational notebooks and web browsers, lowering the barrier to creating more responsible ML models. TimberTrek is available at the following public demo link: https://poloclub.github.io/timbertrek.
Rana el Kaliouby on teaching computers to read our emotions
Amy Barrett: So Girl Decoded was published earlier this year by Penguin Business. Can you tell me, what is your book about? Rana el Kaliouby: So my book is a memoir. It's a juxtaposition of my personal journey intertwined with my journey building emotional intelligence into technology. AB: What made you actually want to start writing it? ReK: So the initial idea was to talk about emotion A.I. or artificial emotional intelligence and kind of tease apart the different applications of the technology and the ethical and moral implications of building technology like that. But very early on, I remember meeting with the publisher, Penguin Random House, and the editor there said, you know, your story is really fascinating. I grew up in the Middle East, found my way to the US by way of studying in the UK, actually. And he said, that's the story, you've got to interweave your personal stories. So it ended up being this, again, kind of interwoven mix of my personal background and how I went from what I call "a nice Egyptian girl" to a CEO of a tech company. AB: And what are some of the biggest challenges you say you faced getting to where you are today? ReK: I think the biggest kind of challenge is that I was always kind of doing some… I'm a misfit. Like, I grew up in the Middle East, but I really wanted to be a computer scientist. I left home to do my PhD, which was quite unusual at the time because my husband at the time had to stay back in Cairo for work.
- Europe > Middle East (0.45)
- Asia > Middle East (0.45)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.25)
- (2 more...)