Liu, Chi
CRM: Retrieval Model with Controllable Condition
Liu, Chi, Cao, Jiangxia, Huang, Rui, Cai, Kuo, Ding, Weifeng, Luo, Qiang, Gai, Kun, Zhou, Guorui
Recommendation systems (RecSys) are designed to connect users with relevant items from a vast pool of candidates while aligning with the business goals of the platform. A typical industrial RecSys comprises two main stages, retrieval and ranking: (1) the retrieval stage searches for hundreds of candidate items that satisfy user interests; (2) based on the retrieved items, the ranking stage selects the best dozen items by estimating multiple targets for each candidate, including classification and regression targets. Compared with the ranking model, the retrieval model lacks access to item candidate information during inference; therefore, retrieval models are often trained only on classification targets (e.g., click-through rate) and fail to incorporate regression targets (e.g., expected watch time), which limits the effectiveness of retrieval. In this paper, we propose the Controllable Retrieval Model (CRM), which integrates regression information as conditional features into the two-tower retrieval paradigm. This modification enables the retrieval stage to close the target gap with the ranking model, enhancing the retrieval model's ability to search for candidate items that satisfy both user interests and the specified condition. We validate the effectiveness of CRM through real-world A/B testing and demonstrate its successful deployment in the Kuaishou short-video recommendation system, which serves over 400 million users.
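For concreteness, here is a minimal sketch of how regression information could enter a two-tower retriever as a conditional feature, in the spirit of the abstract; the module names, dimensions, and the bucketized watch-time condition are illustrative assumptions, not CRM's actual architecture.

```python
# Hypothetical condition-aware two-tower retrieval model. The regression
# signal (e.g., expected watch-time) enters the user tower as a
# bucketized conditional feature; all sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalTwoTower(nn.Module):
    def __init__(self, n_users, n_items, n_cond_buckets, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.cond_emb = nn.Embedding(n_cond_buckets, dim)  # condition feature
        self.user_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def user_tower(self, user_ids, cond_buckets):
        u = torch.cat([self.user_emb(user_ids), self.cond_emb(cond_buckets)], dim=-1)
        return F.normalize(self.user_mlp(u), dim=-1)

    def item_tower(self, item_ids):
        return F.normalize(self.item_emb(item_ids), dim=-1)

    def score(self, user_ids, cond_buckets, item_ids):
        # Inner-product scoring keeps serving ANN-search friendly.
        return (self.user_tower(user_ids, cond_buckets) *
                self.item_tower(item_ids)).sum(-1)
```

The item tower is unchanged, so standard ANN indexing still applies at serving time; varying the condition bucket steers the user vector toward items matching the desired regression target.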
KuaiFormer: Transformer-Based Retrieval at Kuaishou
Liu, Chi, Cao, Jiangxia, Huang, Rui, Zheng, Kai, Luo, Qiang, Gai, Kun, Zhou, Guorui
In large-scale content recommendation systems, retrieval serves as the initial stage in the pipeline, responsible for selecting thousands of candidate items from billions of options to pass on to ranking modules. Traditionally, the dominant retrieval method has been Embedding-Based Retrieval (EBR) using a Deep Neural Network (DNN) dual-tower structure. However, applying transformers to retrieval tasks has been the focus of recent research, though real-world industrial deployment still presents significant challenges. In this paper, we introduce KuaiFormer, a novel transformer-based retrieval framework deployed in a large-scale content recommendation system. KuaiFormer fundamentally redefines the retrieval process by shifting from conventional score estimation tasks (such as click-through rate estimation) to a transformer-driven Next Action Prediction paradigm. This shift enables more effective real-time interest acquisition and multi-interest extraction, significantly enhancing retrieval performance. KuaiFormer has been successfully integrated into the Kuaishou App's short-video recommendation system since May 2024, serving over 400 million daily active users and resulting in a marked increase in the average daily usage time of Kuaishou users. We provide insights into both the technical and business aspects of deploying transformers in large-scale recommendation systems, addressing practical challenges encountered during industrial implementation. Our findings offer valuable guidance for engineers and researchers aiming to leverage transformer models to optimize large-scale content recommendation systems.
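A minimal next-action-prediction retriever, sketched to illustrate the paradigm shift the abstract describes; the layer counts, the single interest vector (KuaiFormer extracts multiple interests), and the sampled-softmax training note are assumptions for illustration, not the deployed configuration.

```python
# Sketch: encode a user's item sequence with a transformer and treat the
# last hidden state as a query vector for next-item retrieval.
import torch
import torch.nn as nn

class NextActionRetriever(nn.Module):
    def __init__(self, n_items, dim=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, n_heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, item_seq):  # item_seq: (batch, seq_len) item ids
        pos = torch.arange(item_seq.size(1), device=item_seq.device)
        h = self.encoder(self.item_emb(item_seq) + self.pos_emb(pos))
        return h[:, -1]  # user-interest vector for next-action prediction

model = NextActionRetriever(n_items=10_000)
user_vec = model(torch.randint(0, 10_000, (8, 32)))
# Training would use a (sampled-)softmax over the item table; serving
# would run ANN search of user_vec against the item embeddings.
scores = user_vec @ model.item_emb.weight.T  # (8, n_items) logits
```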
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook
Yang, Meng, Zhu, Tianqing, Liu, Chi, Zhou, WanLei, Yu, Shui, Yu, Philip S.
Thanks to the explosive growth of data and the development of computational resources, it is possible to build pre-trained models that achieve outstanding performance on various tasks, such as natural language processing, computer vision, and more. Despite their powerful capabilities, pre-trained models have also drawn attention to the emerging security challenges associated with their real-world applications. Security and privacy issues, such as leaking private information and generating harmful responses, have seriously undermined users' confidence in these powerful models, and concerns are growing as model performance improves dramatically. Researchers are eager to explore the unique security and privacy issues that have emerged, their distinguishing factors, and how to defend against them. However, the current literature lacks a clear taxonomy of emerging attacks and defenses for pre-trained models, which hinders a high-level and comprehensive understanding of these questions. To fill the gap, we conduct a systematic survey on the security risks of pre-trained models, proposing a taxonomy of attack and defense methods based on the accessibility of pre-trained models' inputs and weights in various security test scenarios. This taxonomy categorizes attacks and defenses into No-Change, Input-Change, and Model-Change approaches. With this taxonomy analysis, we capture the unique security and privacy issues of pre-trained models, categorizing and summarizing existing security issues based on their characteristics. In addition, we offer a timely and comprehensive review of each category's strengths and limitations. Our survey concludes by highlighting potential new research opportunities in the security and privacy of pre-trained models.
2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction
Chen, Tianqi, Hou, Jun, Zhou, Yinchi, Xie, Huidong, Chen, Xiongchao, Liu, Qiong, Guo, Xueqi, Xia, Menghua, Duncan, James S., Liu, Chi, Zhou, Bo
Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET images with high noise and bias. Thus, it is desirable to develop 3D methods to translate non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, outperforming traditional CNN-based methods. However, due to their high computation cost and memory burden, they have largely been limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation, with application to NAC-LDPET-to-AC-SDPET translation. Specifically, MADM employs separate diffusion models for the axial, coronal, and sagittal views, whose outputs are averaged at each sampling step to ensure 3D generation quality from multiple views. To accelerate the 3D sampling process, we also propose a strategy that uses a CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggest that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and diffusion-based baseline methods.
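The multi-view averaging idea can be sketched in a few lines; the 2D denoiser interface (`model_2d(slices, t)`), the unbatched volume layout, and the omission of the noise scheduler are simplifying assumptions, not MADM's exact sampling procedure.

```python
# Sketch: at each reverse-diffusion step, denoise the 3D volume
# slice-wise along each anatomical axis with a view-specific 2D model,
# then average the three predictions to keep the 3D volume consistent.
import torch

def denoise_along_axis(model_2d, volume, t, axis):
    """Apply a 2D denoiser to every slice perpendicular to `axis`.

    volume: (D, H, W) tensor; the slice axis is moved to the batch dim.
    """
    vol = volume.movedim(axis, 0)        # bring slice axis to the front
    slices = vol.unsqueeze(1)            # (n_slices, 1, H, W)
    out = model_2d(slices, t)            # per-slice prediction at step t
    return out.squeeze(1).movedim(0, axis)

def madm_step(models, volume, t):
    """One reverse step: average axial/coronal/sagittal predictions."""
    preds = [denoise_along_axis(m, volume, t, ax)
             for ax, m in enumerate(models)]  # (axial, coronal, sagittal)
    return torch.stack(preds).mean(dim=0)
```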
Nuclear Medicine Artificial Intelligence in Action: The Bethesda Report (AI Summit 2024)
Rahmim, Arman, Bradshaw, Tyler J., Davidzon, Guido, Dutta, Joyita, Fakhri, Georges El, Ghesani, Munir, Karakatsanis, Nicolas A., Li, Quanzheng, Liu, Chi, Roncali, Emilie, Saboury, Babak, Yusufaly, Tahir, Jha, Abhinav K.
The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on a prior successful 2022 AI Summit, the summit theme was "AI in Action". Six key topics included (i) an overview of prior and ongoing efforts by the AI Task Force, (ii) emerging needs and tools for computational nuclear oncology, (iii) new frontiers in large language and generative models, (iv) defining the value proposition for the use of AI in nuclear medicine, (v) open science, including efforts for data and model repositories, and (vi) issues of reimbursement and funding. The primary efforts, findings, challenges, and next steps are summarized in this manuscript.
Manifold-based Verbalizer Space Re-embedding for Tuning-free Prompt-based Classification
Wang, Haochun, Zhao, Sendong, Liu, Chi, Xi, Nuwa, Cai, Muzhen, Qin, Bing, Liu, Ting
Prompt-based classification adapts tasks to a cloze-question format utilizing the [MASK] token, and the filled tokens are then mapped to labels through pre-defined verbalizers. Recent studies have explored the use of verbalizer embeddings to reduce labor in this process. However, all existing studies require a tuning process for either the pre-trained models or additional trainable embeddings. Meanwhile, the distance between high-dimensional verbalizer embeddings should not be measured by Euclidean distance due to the potential for non-linear manifolds in the representation space. In this study, we propose a tuning-free, manifold-based space re-embedding method called Locally Linear Embedding with Intra-class Neighborhood Constraint (LLE-INC) for verbalizer embeddings, which preserves local properties within the same class as guidance for classification. Experimental results indicate that even without tuning any parameters, LLE-INC is on par with automated verbalizers that require parameter tuning. With parameter updating, our approach further enhances prompt-based tuning by up to 3.2%. Furthermore, experiments with LLaMA-7B and LLaMA-13B indicate that LLE-INC is an efficient tuning-free classification approach for hyper-scale language models.
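An illustrative reduction of the idea, applying standard LLE machinery with the neighbor graph restricted to same-class points; the regularization and eigen-solver details follow textbook LLE and are assumptions, not necessarily the paper's exact formulation.

```python
# Sketch of LLE with an intra-class neighborhood constraint: each
# verbalizer embedding is reconstructed only from neighbors that share
# its label, and the bottom eigenvectors give the re-embedding.
import numpy as np
from scipy.linalg import eigh

def lle_inc(X, labels, k=5, d=2, reg=1e-3):
    """X: (n, D) verbalizer embeddings; labels: (n,) integer classes."""
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        # Intra-class constraint: candidate neighbors share label i.
        idx = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        nbrs = idx[np.argsort(np.linalg.norm(X[idx] - X[i], axis=1))[:k]]
        Z = X[nbrs] - X[i]                       # centered local patch
        C = Z @ Z.T + reg * np.trace(Z @ Z.T) * np.eye(len(nbrs))
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                 # reconstruction weights
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    # Skip the trivial constant eigenvector; keep the next d.
    _, vecs = eigh(M)
    return vecs[:, 1:d + 1]
```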
POUR-Net: A Population-Prior-Aided Over-Under-Representation Network for Low-Count PET Attenuation Map Generation
Zhou, Bo, Hou, Jun, Chen, Tianqi, Zhou, Yinchi, Chen, Xiongchao, Xie, Huidong, Liu, Qiong, Guo, Xueqi, Tsai, Yu-Jung, Panin, Vladimir Y., Toyonaga, Takuya, Duncan, James S., Liu, Chi
Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans to generate attenuation maps ($\mu$-maps) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net, an innovative population-prior-aided over-under-representation network that aims for high-quality attenuation map generation from low-dose PET. First, POUR-Net incorporates an over-under-representation network (OUR-Net) to facilitate efficient feature extraction, encompassing both low-resolution abstracted and fine-detail features, to assist deep generation at the full-resolution level. Second, complementing OUR-Net, a population prior generation machine (PPGM) utilizing a comprehensive CT-derived $\mu$-map dataset provides additional prior information to aid OUR-Net's generation. The integration of OUR-Net and PPGM within a cascade framework enables iterative refinement of the $\mu$-map, resulting in high-quality $\mu$-maps. Experimental results underscore the effectiveness of POUR-Net, showing it to be a promising solution for accurate CT-free low-count PET attenuation correction that surpasses previous baseline methods.
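The cascade can be sketched as a prior-seeded residual refinement loop; the stage internals (the over/under-representation branches) are abstracted behind a factory, and the channel layout and stage count are assumptions for illustration.

```python
# Sketch: a population-prior mu-map seeds the estimate, and each
# cascade stage refines it given the low-dose PET input.
import torch
import torch.nn as nn

class CascadeMuMapGenerator(nn.Module):
    def __init__(self, our_net_factory, population_prior, n_stages=3):
        super().__init__()
        # population_prior: registered CT-derived template mu-map,
        # shape (1, 1, D, H, W), acting as the PPGM output here.
        self.register_buffer("prior", population_prior)
        # Each stage maps (PET, current mu-map) -> mu-map residual.
        self.stages = nn.ModuleList(our_net_factory() for _ in range(n_stages))

    def forward(self, ldpet):  # ldpet: (B, 1, D, H, W) low-dose PET
        mu = self.prior.expand(ldpet.size(0), -1, -1, -1, -1)
        for stage in self.stages:
            mu = mu + stage(torch.cat([ldpet, mu], dim=1))
        return mu
```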
FedFTN: Personalized Federated Learning with Deep Feature Transformation Network for Multi-institutional Low-count PET Denoising
Zhou, Bo, Xie, Huidong, Liu, Qiong, Chen, Xiongchao, Guo, Xueqi, Feng, Zhicheng, Hou, Jun, Zhou, S. Kevin, Li, Biao, Rominger, Axel, Shi, Kuangyu, Duncan, James S., Liu, Chi
Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from a low signal-to-noise ratio (SNR), affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential for improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutions for training a robust model is difficult due to privacy and security concerns around patient data. Moreover, low-count PET data at different institutions may have different data distributions, thus requiring personalized models. While previous federated learning (FL) algorithms enable multi-institution collaborative training without the need to aggregate local data, addressing the large domain shift in the application of multi-institutional low-count PET denoising remains a challenge and is still highly under-explored. In this work, we propose FedFTN, a personalized federated learning strategy that addresses these challenges. FedFTN uses a local deep feature transformation network (FTN) to modulate the feature outputs of a globally shared denoising network, enabling personalized low-count PET denoising for each institution. During the federated learning process, only the denoising network's weights are communicated and aggregated, while the FTN remains at the local institutions for feature transformation. We evaluated our method using a large-scale dataset of multi-institutional low-count PET imaging data from three medical centers located across three continents, and showed that FedFTN provides high-quality low-count PET images, outperforming previous baseline FL reconstruction methods across all low-count levels at all three institutions.
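The communication pattern described in the abstract, aggregating only the shared denoiser while each site keeps its FTN local, can be sketched as follows; the client interface (`train_one_round`, `denoiser`) is a hypothetical naming convention for illustration.

```python
# Sketch of FedFTN-style federation: FedAvg over denoiser weights only;
# each institution's FTN never leaves the site (personalization).
import copy

def fedavg(state_dicts, weights):
    """Weighted average of client denoiser state dicts."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        if avg[key].is_floating_point():  # skip integer buffers
            avg[key] = sum(w * sd[key] for sd, w in zip(state_dicts, weights))
    return avg

def federated_round(clients, weights):
    # 1) Local training: each client updates its denoiser AND its FTN.
    for c in clients:
        c.train_one_round()
    # 2) Communicate only the shared denoiser; FTNs stay local.
    global_denoiser = fedavg([c.denoiser.state_dict() for c in clients], weights)
    for c in clients:
        c.denoiser.load_state_dict(global_denoiser)
```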
TAI-GAN: Temporally and Anatomically Informed GAN for early-to-late frame conversion in dynamic cardiac PET motion correction
Guo, Xueqi, Shi, Luyao, Chen, Xiongchao, Zhou, Bo, Liu, Qiong, Xie, Huidong, Liu, Yi-Hwa, Palyo, Richard, Miller, Edward J., Sinusas, Albert J., Spottiswoode, Bruce, Liu, Chi, Dvornek, Nicha C.
The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and the high cross-frame variation of tracer distribution in dynamic cardiac positron emission tomography (PET) pose significant challenges for inter-frame motion correction, particularly for the early frames, where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes and assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) that transforms the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer-kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.
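A minimal sketch of the feature-wise linear modulation (FiLM) conditioning the abstract mentions; the kinetics descriptor and layer sizes are illustrative assumptions (the released code at the link above is authoritative).

```python
# Sketch: map a frame's temporal kinetics descriptor to per-channel
# scale (gamma) and shift (beta) applied to generator features.
import torch
import torch.nn as nn

class KineticsFiLM(nn.Module):
    def __init__(self, cond_dim, n_channels):
        super().__init__()
        # cond: e.g., features of the frame's time-activity curve.
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * n_channels)

    def forward(self, feat, cond):
        # feat: (B, C, D, H, W) generator features; cond: (B, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        shape = (-1, feat.size(1)) + (1,) * (feat.dim() - 2)
        return gamma.reshape(shape) * feat + beta.reshape(shape)
```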
Transformer-based Dual-domain Network for Few-view Dedicated Cardiac SPECT Image Reconstructions
Xie, Huidong, Zhou, Bo, Chen, Xiongchao, Guo, Xueqi, Thorn, Stephanie, Liu, Yi-Hwa, Wang, Ge, Sinusas, Albert, Liu, Chi
Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial perfusion imaging using SPECT has been widely used in the diagnosis of CVDs. The GE 530/570c dedicated cardiac SPECT scanners adopt a stationary geometry to simultaneously acquire 19 projections, increasing sensitivity and enabling dynamic imaging. However, the limited angular sampling negatively affects image quality; this is essentially a few-view imaging problem, and deep learning methods can be implemented to produce higher-quality images from stationary data. In this work, we propose a novel 3D transformer-based dual-domain network, called TIP-Net, for high-quality 3D cardiac SPECT image reconstruction. Our method first reconstructs 3D cardiac SPECT images directly from projection data, without the iterative reconstruction process, using a customized projection-to-image domain transformer. Then, given this reconstruction output and the original few-view reconstruction, an image-domain reconstruction network further refines the result. Validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect sizes quantified by FDA 510(k)-cleared clinical software, our method produced images with higher cardiac defect contrast in human studies compared with previous baseline methods, potentially enabling high-quality defect visualization using stationary few-view dedicated cardiac SPECT scanners.
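The dual-domain flow can be sketched at a high level; both sub-networks are stand-ins here (the projection-to-image transformer and image-domain CNN internals are not specified by the abstract), and the two-channel fusion is an assumed design.

```python
# Sketch: direct projection-to-image reconstruction, then image-domain
# refinement that also sees the conventional few-view reconstruction.
import torch
import torch.nn as nn

class DualDomainRecon(nn.Module):
    def __init__(self, proj_to_img_net, img_refine_net):
        super().__init__()
        self.proj_to_img = proj_to_img_net  # transformer: projections -> volume
        self.refine = img_refine_net        # image-domain net: 2ch -> 1ch

    def forward(self, projections, fewview_recon):
        direct = self.proj_to_img(projections)         # learned reconstruction
        x = torch.cat([direct, fewview_recon], dim=1)  # fuse both estimates
        return self.refine(x)
```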