Wang, Yaqi
Adaptive Interactive Segmentation for Multimodal Medical Imaging via Selection Engine
Li, Zhi, Zhao, Kai, Wang, Yaqi, Wang, Shuai
In medical image analysis, achieving fast, efficient, and accurate segmentation is essential for automated diagnosis and treatment. Although recent advancements in deep learning have significantly improved segmentation accuracy, current models often face challenges in adaptability and generalization, particularly when processing multi-modal medical imaging data. These limitations stem from the substantial variations between imaging modalities and the inherent complexity of medical data. To address these challenges, we propose the Strategy-driven Interactive Segmentation Model (SISeg), built on SAM2, which enhances segmentation performance across various medical imaging modalities by integrating a selection engine. To mitigate memory bottlenecks and optimize prompt frame selection during the inference of 2D image sequences, we developed an automated system, the Adaptive Frame Selection Engine (AFSE). This system dynamically selects the optimal prompt frames without requiring extensive prior medical knowledge and enhances the interpretability of the model's inference process through an interactive feedback mechanism. We conducted extensive experiments on 10 datasets covering 7 representative medical imaging modalities, demonstrating the SISeg model's robust adaptability and generalization in multi-modal tasks. The project page and code will be available at: [URL].
SRSA: A Cost-Efficient Strategy-Router Search Agent for Real-world Human-Machine Interactions
Wang, Yaqi, Xu, Haipei
Recently, as Large Language Models (LLMs) have shown impressive emerging capabilities and gained widespread popularity, research on LLM-based search agents has proliferated. In real-world situations, users often input contextual and highly personalized queries to chatbots, challenging LLMs to capture context and generate appropriate answers. However, much of the prior research has not focused specifically on authentic human-machine dialogue scenarios. It also ignores the important balance between response quality and computational cost by forcing all queries to follow the same agent process. To address these gaps, we propose a Strategy-Router Search Agent (SRSA), routing different queries to appropriate search strategies and enabling fine-grained serial searches to obtain high-quality results at a relatively low cost. To evaluate our work, we introduce a new dataset, Contextual Query Enhancement Dataset (CQED), comprising contextual queries to simulate authentic and daily interactions between humans and chatbots. Using LLM-based automatic evaluation metrics, we assessed SRSA's performance in terms of informativeness, completeness, novelty, and actionability. To conclude, SRSA provides an approach that resolves the issue of simple serial searches leading to degenerate answers for lengthy and contextual queries, effectively and efficiently parses complex user queries, and generates more comprehensive and informative responses without fine-tuning an LLM.
Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU
Sun, Haojia, Wang, Yaqi, Zhang, Shuting
We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions about Pittsburgh and Carnegie Mellon University (CMU). We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs, achieving an inter-annotator agreement (IAA) score of 0.7625. Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy. Experimental results show that the RAG system significantly outperforms a non-RAG baseline, particularly in time-sensitive and complex queries, with an F1 score improvement from 5.45% to 42.21% and recall of 56.18%. This study demonstrates the potential of RAG systems in enhancing answer precision and relevance, while identifying areas for further optimization in document retrieval and model training.
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Guo, Taicheng, Chen, Xiuying, Wang, Yaqi, Chang, Ruidi, Pei, Shichao, Chawla, Nitesh V., Wiest, Olaf, Zhang, Xiangliang
Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents' capacities? For those interested in delving into this field of study, we also summarize the commonly used datasets or benchmarks for them to have convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository, dedicated to outlining the research on LLM-based multi-agent systems.
Plug-and-play Shape Refinement Framework for Multi-site and Lifespan Brain Skull Stripping
Li, Yunxiang, Dan, Ruilong, Wang, Shuai, Cao, Yifan, Luo, Xiangde, Tan, Chenghao, Jia, Gangyong, Zhou, Huiyu, Zhang, You, Wang, Yaqi, Wang, Li
Skull stripping is a crucial prerequisite step in the analysis of brain magnetic resonance images (MRI). Although many excellent works or tools have been proposed, they suffer from low generalization capability. For instance, the model trained on a dataset with specific imaging parameters cannot be well applied to other datasets with different imaging parameters. Especially, for the lifespan datasets, the model trained on an adult dataset is not applicable to an infant dataset due to the large domain difference. To address this issue, numerous methods have been proposed, where domain adaptation based on feature alignment is the most common. Unfortunately, this method has some inherent shortcomings, which need to be retrained for each new domain and requires concurrent access to the input images of both domains. In this paper, we design a plug-and-play shape refinement (PSR) framework for multi-site and lifespan skull stripping. To deal with the domain shift between multi-site lifespan datasets, we take advantage of the brain shape prior, which is invariant to imaging parameters and ages. Experiments demonstrate that our framework can outperform the state-of-the-art methods on multi-site lifespan datasets.
CTooth+: A Large-scale Dental Cone Beam Computed Tomography Dataset and Benchmark for Tooth Volume Segmentation
Cui, Weiwei, Wang, Yaqi, Li, Yilong, Song, Dan, Zuo, Xingyong, Wang, Jiaojiao, Zhang, Yifan, Zhou, Huiyu, Chong, Bung san, Zeng, Liaoyuan, Zhang, Qianni
Accurate tooth volume segmentation is a prerequisite for computer-aided dental analysis. Deep learning-based tooth segmentation methods have achieved satisfying performances but require a large quantity of tooth data with ground truth. The dental data publicly available is limited meaning the existing methods can not be reproduced, evaluated and applied in clinical practice. In this paper, we establish a 3D dental CBCT dataset CTooth+, with 22 fully annotated volumes and 146 unlabeled volumes. We further evaluate several state-of-the-art tooth volume segmentation strategies based on fully-supervised learning, semi-supervised learning and active learning, and define the performance principles. This work provides a new benchmark for the tooth volume segmentation task, and the experiment can serve as the baseline for future AI-based dental imaging research and clinical application development. The codebase and dataset are released here.
Dispensed Transformer Network for Unsupervised Domain Adaptation
Li, Yunxiang, Li, Jingxiong, Dan, Ruilong, Wang, Shuai, Jin, Kai, Zeng, Guodong, Wang, Jun, Pan, Xiangji, Zhang, Qianni, Zhou, Huiyu, Jin, Qun, Wang, Li, Wang, Yaqi
Accurate segmentation is a crucial step in medical image analysis and applying supervised machine learning to segment the organs or lesions has been substantiated effective. However, it is costly to perform data annotation that provides ground truth labels for training the supervised algorithms, and the high variance of data that comes from different domains tends to severely degrade system performance over cross-site or cross-modality datasets. To mitigate this problem, a novel unsupervised domain adaptation (UDA) method named dispensed Transformer network (DTNet) is introduced in this paper. Our novel DTNet contains three modules. First, a dispensed residual transformer block is designed, which realizes global attention by dispensed interleaving operation and deals with the excessive computational cost and GPU memory usage of the Transformer. Second, a multi-scale consistency regularization is proposed to alleviate the loss of details in the low-resolution output for better feature alignment. Finally, a feature ranking discriminator is introduced to automatically assign different weights to domain-gap features to lessen the feature distribution distance, reducing the performance shift of two domains. The proposed method is evaluated on large fluorescein angiography (FA) retinal nonperfusion (RNP) cross-site dataset with 676 images and a wide used cross-modality dataset from the MM-WHS challenge. Extensive results demonstrate that our proposed network achieves the best performance in comparison with several state-of-the-art techniques.
Anatomy-Guided Parallel Bottleneck Transformer Network for Automated Evaluation of Root Canal Therapy
Li, Yunxiang, Zeng, Guodong, Zhang, Yifan, Wang, Jun, Zhang, Qianni, Jin, Qun, Sun, Lingling, Lian, Qisi, Xia, Neng, Peng, Ruizi, Tang, Kai, Wang, Yaqi, Wang, Shuai
Objective: Accurate evaluation of the root canal filling result in X-ray image is a significant step for the root canal therapy, which is based on the relative position between the apical area boundary of tooth root and the top of filled gutta-percha in root canal as well as the shape of the tooth root and so on to classify the result as correct-filling, under-filling or over-filling. Methods: We propose a novel anatomy-guided Transformer diagnosis network. For obtaining accurate anatomy-guided features, a polynomial curve fitting segmentation is proposed to segment the fuzzy boundary. And a Parallel Bottleneck Transformer network (PBT-Net) is introduced as the classification network for the final evaluation. Results, and conclusion: Our numerical experiments show that our anatomy-guided PBT-Net improves the accuracy from 40\% to 85\% relative to the baseline classification network. Comparing with the SOTA segmentation network indicates that the ASD is significantly reduced by 30.3\% through our fitting segmentation. Significance: Polynomial curve fitting segmentation has a great segmentation effect for extremely fuzzy boundaries. The prior knowledge guided classification network is suitable for the evaluation of root canal therapy greatly. And the new proposed Parallel Bottleneck Transformer for realizing self-attention is general in design, facilitating a broad use in most backbone networks.