Tang, Yucheng
KPIs 2024 Challenge: Advancing Glomerular Segmentation from Patch- to Slide-Level
Deng, Ruining, Yao, Tianyuan, Tang, Yucheng, Guo, Junlin, Lu, Siqi, Xiong, Juming, Yu, Lining, Cap, Quan Huu, Cai, Pengzhou, Lan, Libin, Zhao, Ze, Galdran, Adrian, Kumar, Amit, Deotale, Gunjan, Das, Dev Kumar, Paik, Inyoung, Lee, Joonho, Lee, Geongyu, Chen, Yujia, Li, Wangkai, Li, Zhaoyang, Hou, Xuege, Wu, Zeyuan, Wang, Shengjin, Fischer, Maximilian, Kramer, Lars, Du, Anghong, Zhang, Le, Sanchez, Maria Sanchez, Ulloa, Helena Sanchez, Heredia, David Ribalta, Garcia, Carlos Perez de Arenaza, Xu, Shuoyu, He, Bingdou, Cheng, Xinping, Wang, Tao, Moreau, Noemie, Bozek, Katarzyna, Innani, Shubham, Baid, Ujjwal, Kefas, Kaura Solomon, Landman, Bennett A., Wang, Yu, Zhao, Shilin, Yin, Mengmeng, Yang, Haichun, Huo, Yuankai
Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge, introducing a dataset that incorporates preclinical rodent models of CKD with over 10,000 annotated glomeruli from 60+ Periodic Acid Schiff (PAS)-stained whole slide images. The challenge includes two tasks, patch-level segmentation and whole slide image segmentation and detection, evaluated using the Dice Similarity Coefficient (DSC) and F1-score. By encouraging innovative segmentation methods that adapt to diverse CKD models and tissue conditions, the KPIs Challenge aims to advance kidney pathology analysis, establish new benchmarks, and enable precise, large-scale quantification for disease research and diagnosis.
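The two evaluation metrics named in the abstract above can be sketched as follows. This is a minimal illustration, not the challenge's official evaluation code: the function names, the flat 0/1 mask representation, and the convention that two empty masks score 1.0 are assumptions made here. How predicted glomeruli are matched to annotated ones to obtain the detection counts is not described in the abstract, so the counts are taken as inputs.

```python
def dice_coefficient(pred, target):
    """DSC = 2 * |intersection| / (|pred| + |target|) for flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    return 0.0 if denominator == 0 else 2.0 * true_positives / denominator


# Toy 4-pixel masks that overlap in exactly one pixel.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]
patch_dsc = dice_coefficient(pred_mask, true_mask)

# Hypothetical detection counts for one slide.
detection_f1 = f1_score(true_positives=8, false_positives=2, false_negatives=2)
```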
Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines
Qu, Chongyu, Zhao, Ritchie, Yu, Ye, Liu, Bin, Yao, Tianyuan, Zhu, Junchao, Landman, Bennett A., Tang, Yucheng, Huo, Yuankai
Quantizing deep neural networks, i.e., reducing the precision (bit-width) of their computations, can substantially decrease memory usage and accelerate processing, making these models more suitable for large-scale medical imaging applications with limited computational resources. However, many existing methods study "fake quantization", which simulates lower-precision operations during inference but does not actually reduce model size or improve real-world inference speed. Moreover, the potential of deploying real 3D low-bit quantization on modern GPUs is still unexplored. In this study, we introduce a real post-training quantization (PTQ) framework that successfully implements true 8-bit quantization on state-of-the-art (SOTA) 3D medical segmentation models, i.e., U-Net, SegResNet, SwinUNETR, nnU-Net, UNesT, TransUNet, ST-UNet, and VISTA3D. Our approach involves two main steps. First, we use TensorRT to perform fake quantization for both weights and activations with an unlabeled calibration dataset. Second, we convert this fake quantization into real quantization via the TensorRT engine on real GPUs, resulting in real-world reductions in model size and inference latency. Extensive experiments demonstrate that our framework effectively performs 8-bit quantization on GPUs without sacrificing model performance. This advancement enables the deployment of efficient deep learning models in medical imaging applications where computational resources are constrained. The code and models have been released, including U-Net and TransUNet pretrained on the BTCV dataset for abdominal (13-label) segmentation, UNesT pretrained on the Whole Brain Dataset for whole brain (133-label) segmentation, and nnU-Net, SegResNet, SwinUNETR, and VISTA3D pretrained on TotalSegmentator V2 for full body (104-label) segmentation. https://github.com/hrlblab/PTQ.
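The distinction between fake and real quantization drawn in the abstract above can be illustrated with a small sketch. It does not use TensorRT and is not the authors' pipeline: it assumes symmetric per-tensor 8-bit quantization with a simple max-abs calibration, and all function names are hypothetical. Fake quantization rounds values to the 8-bit grid but keeps them as floats, whereas real quantization stores the integers themselves, which is where the size reduction comes from.

```python
from array import array


def calibrate_scale(values, num_bits=8):
    """Max-abs calibration (an assumption for illustration): map the largest magnitude to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Real quantization: round to integers and clamp to the signed range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer codes back to approximate float values."""
    return [q * scale for q in q_values]


def fake_quantize(values, scale):
    """Fake quantization: values snap to the 8-bit grid but remain floats."""
    return dequantize(quantize(values, scale), scale)


weights = [0.5, -1.27, 0.0, 1.0]        # toy float weights
scale = calibrate_scale(weights)         # roughly 0.01
q_weights = quantize(weights, scale)     # integer codes
simulated = fake_quantize(weights, scale)

# Storage comparison: one byte per int8 code versus a 32-bit float per weight.
int8_bytes = len(array("b", q_weights).tobytes())
float_bytes = len(weights) * array("f").itemsize
```

In this toy case the rounding error of each weight is bounded by half the scale, while the integer codes occupy one byte per weight.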
ETA-IK: Execution-Time-Aware Inverse Kinematics for Dual-Arm Systems
Tang, Yucheng, Huang, Xi, Zhang, Yongzhou, Chen, Tao, Mamaev, Ilshat, Hein, Björn
This paper presents ETA-IK, a novel Execution-Time-Aware Inverse Kinematics method tailored for dual-arm robotic systems. The primary goal is to optimize motion execution time by leveraging the redundancy of both arms, specifically in tasks where only the relative pose of the robots is constrained, such as dual-arm scanning of unknown objects. Unlike traditional inverse kinematics methods that use surrogate metrics such as joint configuration distance, our method incorporates direct motion execution time and implicit collisions into the optimization process, thereby finding target joint configurations from which subsequent trajectory generation yields more efficient, collision-free motion. A neural-network-based execution time approximator is employed to predict time-efficient joint configurations while accounting for potential collisions. Through experimental evaluation on a system composed of a UR5 and a KUKA iiwa robot, we demonstrate significant reductions in execution time. The proposed method outperforms conventional approaches, showing improved motion efficiency without sacrificing positioning accuracy. These results highlight the potential of ETA-IK to improve the performance of dual-arm systems in applications where efficiency and safety are paramount.
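Why a surrogate such as joint configuration distance can disagree with execution time is easy to see in a toy sketch. The code below is not the ETA-IK method: the paper uses a learned execution time approximator and accounts for collisions, whereas this sketch assumes a crude stand-in in which every joint moves at a constant maximum velocity and the slowest joint determines the motion time. All names and numbers are hypothetical.

```python
import math


def joint_distance(q_start, q_goal):
    """Surrogate metric: Euclidean distance in joint space."""
    return math.sqrt(sum((g - s) ** 2 for s, g in zip(q_start, q_goal)))


def execution_time(q_start, q_goal, max_velocities):
    """Assumed time model: the slowest joint at its velocity limit sets the duration."""
    return max(abs(g - s) / v for s, g, v in zip(q_start, q_goal, max_velocities))


def select_goal(q_start, candidates, cost):
    """Pick the candidate goal configuration with the lowest cost."""
    return min(candidates, key=lambda q: cost(q_start, q))


q_start = [0.0, 0.0]
max_velocities = [2.0, 0.5]              # rad/s; the second joint is slow
candidates = [[1.0, 0.0], [0.0, 0.8]]    # two goals assumed to satisfy the task

by_distance = select_goal(q_start, candidates, joint_distance)
by_time = select_goal(
    q_start, candidates, lambda s, g: execution_time(s, g, max_velocities)
)
```

Here the distance metric prefers the second candidate (0.8 rad versus 1.0 rad), while the time model prefers the first (0.5 s versus 1.6 s), because the shorter move relies on the slower joint.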
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?
Bassi, Pedro R. A. S., Li, Wenxuan, Tang, Yucheng, Isensee, Fabian, Wang, Zifu, Chen, Jieneng, Chou, Yu-Cheng, Kirchhoff, Yannick, Rokuss, Maximilian, Huang, Ziyan, Ye, Jin, He, Junjun, Wald, Tassilo, Ulrich, Constantin, Baumgartner, Michael, Roy, Saikat, Maier-Hein, Klaus H., Jaeger, Paul, Ye, Yiwen, Xie, Yutong, Zhang, Jianpeng, Chen, Ziyang, Xia, Yong, Xing, Zhaohu, Zhu, Lei, Sadegheih, Yousef, Bozorgpour, Afshin, Kumari, Pratibha, Azad, Reza, Merhof, Dorit, Shi, Pengcheng, Ma, Ting, Du, Yuxin, Bai, Fan, Huang, Tiejun, Zhao, Bo, Wang, Haonan, Li, Xiaomeng, Gu, Hanxue, Dong, Haoyu, Yang, Jichen, Mazurowski, Maciej A., Gupta, Saumya, Wu, Linshan, Zhuang, Jiaxin, Chen, Hao, Roth, Holger, Xu, Daguang, Blaschko, Matthew B., Decherchi, Sergio, Cavalli, Andrea, Yuille, Alan L., Zhou, Zongwei
How can we test AI performance? This question seems trivial, but it isn't. Standard benchmarks often have problems such as in-distribution and small-size test sets, oversimplified metrics, unfair comparisons, and short-term outcome pressure. As a consequence, good performance on standard benchmarks does not guarantee success in real-world scenarios. To address these problems, we present Touchstone, a large-scale collaborative segmentation benchmark of 9 types of abdominal organs. This benchmark is based on 5,195 training CT scans from 76 hospitals around the world and 5,903 testing CT scans from 11 additional hospitals. This diverse test set enhances the statistical significance of benchmark results and rigorously evaluates AI algorithms across various out-of-distribution scenarios. We invited 14 inventors of 19 AI algorithms to train their algorithms, while our team, as a third party, independently evaluated these algorithms on three test sets. In addition, we also evaluated pre-existing AI frameworks--which, differing from algorithms, are more flexible and can support different algorithms--including MONAI from NVIDIA, nnU-Net from DKFZ, and numerous other open-source frameworks. We are committed to expanding this benchmark to encourage more innovation of AI algorithms for the medical domain.
AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks
Qu, Chongyu, Zhang, Tiezheng, Qiao, Hualin, Liu, Jie, Tang, Yucheng, Yuille, Alan, Zhou, Zongwei
Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating an abdominal organ takes an estimated 30-60 minutes per CT volume, depending on the expertise of the annotator and the size, visibility, and complexity of the organ. Therefore, publicly available datasets for multi-organ segmentation are often limited in data size and organ diversity. This paper proposes an active learning method to expedite the annotation process for organ segmentation and creates the largest multi-organ dataset (by far) with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and IVC annotated in 8,448 CT volumes, equating to 3.2 million slices. Conventional annotation methods would take an experienced annotator up to 1,600 weeks (or roughly 30.8 years) to complete this task. In contrast, our annotation method has accomplished this task in three weeks (based on an 8-hour workday, five days a week) while maintaining a similar or even better annotation quality. This achievement is attributed to three unique properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance for annotators to make corrections on the most salient errors. Furthermore, we summarize the taxonomy of common errors made by AI algorithms and annotators. This allows for continuous revision of both AI and annotations and significantly reduces the annotation costs required to create large-scale datasets for a wider variety of medical imaging tasks.
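The figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract, plus the assumption of a 52-week year; the variable names are illustrative.

```python
WEEKS_PER_YEAR = 52            # assumption used for the weeks-to-years conversion

conventional_weeks = 1600      # stated upper bound for conventional annotation
proposed_weeks = 3             # stated duration with the proposed method
volumes = 8448                 # stated number of CT volumes
slices = 3_200_000             # stated number of slices

# 1,600 weeks expressed in years (the abstract quotes roughly 30.8).
conventional_years = conventional_weeks / WEEKS_PER_YEAR

# Working hours in three weeks at 8 hours a day, five days a week.
proposed_hours = proposed_weeks * 5 * 8

# Ratio of the two durations, and the average number of slices per volume.
speedup = conventional_weeks / proposed_weeks
slices_per_volume = slices / volumes
```

Under these numbers the three-week effort corresponds to 120 working hours, and the ratio between the two durations is a little over 500.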
All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning
Cui, Can, Deng, Ruining, Liu, Quan, Yao, Tianyuan, Bao, Shunxing, Remedios, Lucas W., Tang, Yucheng, Huo, Yuankai
The Segment Anything Model (SAM) is a recently proposed prompt-based model for generic zero-shot segmentation. With its zero-shot segmentation capacity, SAM has achieved impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource intensive for biomedical image segmentation. In this paper, instead of using prompts during the inference stage, we introduce a pipeline that utilizes SAM, called all-in-SAM, through the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts during the inference stage. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding boxes). Then, the pixel-level annotations are used to finetune the SAM segmentation model rather than training from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses the state-of-the-art (SOTA) methods in a nuclei segmentation task on the public MoNuSeg dataset, and 2) the utilization of weak and few annotations for SAM finetuning achieves competitive performance compared to using strong pixel-wise annotated data.
Feasibility of Universal Anomaly Detection without Knowing the Abnormality in Medical Images
Cui, Can, Wang, Yaohong, Bao, Shunxing, Tang, Yucheng, Deng, Ruining, Remedios, Lucas W., Asad, Zuhayr, Roland, Joseph T., Lau, Ken S., Liu, Qi, Coburn, Lori A., Wilson, Keith T., Landman, Bennett A., Huo, Yuankai
Many anomaly detection approaches, especially deep learning methods, have been recently developed to identify abnormal image morphology by only employing normal images during training. Unfortunately, many prior anomaly detection methods were optimized for a specific "known" abnormality (e.g., brain tumor, bone fracture, cell types). Moreover, even though only the normal images were used in the training process, the abnormal images were often employed during the validation process (e.g., epoch selection, hyper-parameter tuning), which might unintentionally leak the supposedly "unknown" abnormality. In this study, we investigated these two essential aspects regarding universal anomaly detection in medical images by (1) comparing various anomaly detection methods across four medical datasets, (2) investigating the inevitable but often neglected issue of how to select the optimal anomaly detection model without bias during the validation phase using only normal images, and (3) proposing a simple decision-level ensemble method to leverage the advantages of different kinds of anomaly detection without knowing the abnormality. The results of our experiments indicate that none of the evaluated methods consistently achieved the best performance across all datasets. Our proposed method enhanced the robustness of performance in general (average AUC 0.956).
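A decision-level ensemble combines the outputs of several detectors rather than their internal features. The abstract does not specify how the scores are combined, so the sketch below is only one plausible reading: it assumes each detector emits one anomaly score per image, rescales each detector's scores to [0, 1] with min-max normalization, and averages them. The function names and toy scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one detector's scores to [0, 1] so detectors become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant detector carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_scores(per_detector_scores):
    """Average the normalized scores of all detectors, image by image."""
    normalized = [min_max_normalize(s) for s in per_detector_scores]
    count = len(normalized)
    return [sum(column) / count for column in zip(*normalized)]


# Toy anomaly scores from two detectors on the same three images.
scores_a = [0.0, 5.0, 10.0]
scores_b = [1.0, 1.5, 3.0]
combined = ensemble_scores([scores_a, scores_b])
```

Normalizing before averaging keeps a detector with a large raw score range from dominating the combined decision.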
CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
Liu, Jie, Zhang, Yixiao, Chen, Jie-Neng, Xiao, Junfei, Lu, Yongyi, Landman, Bennett A., Yuan, Yixuan, Yuille, Alan, Tang, Yucheng, Zhou, Zongwei
An increasing number of public datasets have shown a marked impact on automated organ segmentation and tumor detection. However, due to the small size and partially labeled problem of each dataset, as well as a limited investigation of diverse types of tumors, the resulting models are often limited to segmenting specific organs/tumors, ignore the semantics of anatomical structures, and cannot be extended to novel domains. To address these issues, we propose the CLIP-Driven Universal Model, which incorporates text embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models. This CLIP-based label encoding captures anatomical relationships, enabling the model to learn a structured feature embedding and segment 25 organs and 6 types of tumors. The proposed model is developed from an assembly of 14 datasets, using a total of 3,410 CT scans for training, and is then evaluated on 6,162 external CT scans from 3 additional datasets. We rank first on the Medical Segmentation Decathlon (MSD) public leaderboard and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Additionally, the Universal Model is computationally more efficient (6x faster) compared with dataset-specific models, generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks.
MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images
Diaz-Pinto, Andres, Alle, Sachidanand, Nath, Vishwesh, Tang, Yucheng, Ihsani, Alvin, Asad, Muhammad, Pérez-García, Fernando, Mehta, Pritesh, Li, Wenqi, Flores, Mona, Roth, Holger R., Vercauteren, Tom, Xu, Daguang, Dogra, Prerna, Ourselin, Sebastien, Feng, Andrew, Cardoso, M. Jorge
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models, considering that manual annotation is extremely expensive and time-consuming. To address this problem, we present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models that aim at reducing the time required to annotate radiology datasets. Through MONAI Label, researchers can develop AI annotation applications focusing on their domain of expertise. It allows researchers to readily deploy their apps as services, which can be made available to clinicians via their preferred user interface. Currently, MONAI Label readily supports locally installed (3D Slicer) and web-based (OHIF) frontends and offers two active learning strategies to facilitate and speed up the training of segmentation algorithms. MONAI Label allows researchers to make incremental improvements to their AI-based annotation application by making them available to other researchers and clinicians alike. Additionally, MONAI Label provides sample AI-based interactive and non-interactive labeling applications that can be used directly off the shelf as plug-and-play tools on any given dataset. Significantly reduced annotation times using the interactive model are observed on two public datasets.
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
Tang, Yucheng, Yang, Dong, Li, Wenqi, Roth, Holger, Landman, Bennett, Xu, Daguang, Nath, Vishwesh, Hatamizadeh, Ali
Vision Transformers (ViTs) have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. Specifically, we propose: (i) a new 3D transformer-based model, dubbed Swin UNEt TRansformers (Swin UNETR), with a hierarchical encoder for self-supervised pre-training; (ii) tailored proxy tasks for learning the underlying pattern of human anatomy. We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images from various body organs. The effectiveness of our approach is validated by fine-tuning the pre-trained models on the Beyond the Cranial Vault (BTCV) Segmentation Challenge with 13 abdominal organs and segmentation tasks from the Medical Segmentation Decathlon (MSD) dataset. Our model is currently the state-of-the-art (i.e., ranked 1st) on the public test leaderboards of both MSD and BTCV datasets. Code: https://monai.io/research/swin-unetr