Zhang, Zhifei
Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models
Gao, Wanling, Huang, Yunyou, Cui, Dandan, Yu, Zhuoming, Liu, Wenjing, Liang, Xiaoshuang, Zhao, Jiahui, Xie, Jiyue, Li, Hao, Ma, Li, Ye, Ning, Kang, Yumiao, Luo, Dingfeng, Pan, Peng, Huang, Wei, Liu, Zhongmou, Hu, Jizhong, Zhao, Gangyuan, Jiang, Chongrong, Huang, Fan, Wei, Tianyi, Tang, Suqin, Xia, Bingjie, Zhang, Zhifei, Zhan, Jianfeng
A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with either no controls or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity of rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating the insights and conclusions of prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
Gu, Jing, Wang, Yilin, Zhao, Nanxuan, Xiong, Wei, Liu, Qing, Zhang, Zhifei, Zhang, He, Zhang, Jianming, Jung, HyunJoon, Wang, Xin Eric
Effective editing of personal content plays a pivotal role in enabling individuals to express their creativity, weave captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content. In this work, we therefore introduce SwapAnything, a novel framework that can swap any objects in an image with personalized concepts given by a reference, while keeping the context unchanged. Compared with existing methods for personalized subject swapping, SwapAnything has three unique advantages: (1) precise control of arbitrary objects and parts rather than just the main subject, (2) more faithful preservation of context pixels, and (3) better adaptation of the personalized concept to the image. First, we propose targeted variable swapping, which applies region control over latent feature maps and swaps masked variables for faithful context preservation and initial semantic concept swapping. Then, we introduce appearance adaptation to seamlessly adapt the semantic concept into the original image in terms of target location, shape, style, and content during the image generation process. Extensive results on both human and automatic evaluation demonstrate significant improvements of our approach over baseline methods on personalized swapping. Furthermore, SwapAnything shows precise and faithful swapping across single-object, multiple-object, partial-object, and cross-domain swapping tasks. SwapAnything also performs well on text-based swapping and on tasks beyond swapping, such as object insertion.
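The targeted variable swapping described above can be pictured as a masked blend of latent feature maps: variables outside the object mask are copied verbatim from the source latent so context pixels survive untouched, while masked variables come from the generation guided by the personalized concept. A minimal NumPy sketch of this idea (the function name and toy shapes are ours, not the authors' code):

```python
import numpy as np

def masked_latent_swap(source_latent, generated_latent, mask):
    """Swap only the masked region of the latent feature map.

    Variables outside the mask are copied from the source latent,
    preserving the original context; variables inside the mask come
    from the concept-guided generation.
    """
    mask = mask.astype(source_latent.dtype)
    return mask * generated_latent + (1.0 - mask) * source_latent

# Toy 4x4 single-channel latents: swap only the 2x2 centre.
src = np.zeros((4, 4))
gen = np.ones((4, 4))
m = np.zeros((4, 4))
m[1:3, 1:3] = 1
out = masked_latent_swap(src, gen, m)
```

In the actual method this blend would be applied to diffusion latents at each denoising step, with the mask locating the object or part to be swapped.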
OpenClinicalAI: An Open and Dynamic Model for Alzheimer's Disease Diagnosis
Huang, Yunyou, Liang, Xiaoshuang, Lu, Xiangjiang, Miao, Xiuxia, Xie, Jiyue, Liu, Wenjing, Zhang, Fan, Kang, Guoxin, Ma, Li, Tang, Suqin, Zhang, Zhifei, Zhan, Jianfeng
Although Alzheimer's disease (AD) cannot be reversed or cured, timely diagnosis can significantly reduce the burden of treatment and care. Current research on AD diagnosis models usually regards the diagnosis task as a typical classification task with two primary assumptions: 1) all target categories are known a priori; 2) the diagnostic strategy for each patient is consistent, that is, the number and type of model input data are the same for every patient. However, real-world clinical settings are open, with complexity and uncertainty in terms of both the subjects and the resources of medical institutions. This means that diagnostic models may encounter unseen disease categories and need to dynamically develop diagnostic strategies based on the subject's specific circumstances and the available medical resources. Thus, the AD diagnosis task is tightly coupled with diagnostic strategy formulation. To promote the application of diagnostic systems in real-world clinical settings, we propose OpenClinicalAI for direct AD diagnosis in complex and uncertain clinical settings. This is the first powerful end-to-end model to dynamically formulate diagnostic strategies and provide diagnostic results based on the subject's conditions and the available medical resources. OpenClinicalAI combines reciprocally coupled deep multiaction reinforcement learning (DMARL) for diagnostic strategy formulation and multicenter meta-learning (MCML) for open-set recognition. The experimental results show that OpenClinicalAI achieves better performance while requiring fewer clinical examinations than the state-of-the-art model. Our method provides an opportunity to embed the AD diagnostic system into the current health care system, cooperating with clinicians to improve current health care.
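The dynamic strategy formulation described above can be caricatured as a loop that keeps ordering examinations until the diagnosis is confident enough or no exams remain. The sketch below is purely illustrative: `GreedyPolicy`, the exam costs, and the confidence update are hypothetical stand-ins for the paper's DMARL policy, not its implementation.

```python
# Hypothetical per-exam costs (illustrative values, not from the paper).
EXAM_COST = {"MMSE": 1, "MRI": 10, "PET": 50}

class GreedyPolicy:
    """Hypothetical stand-in for the DMARL policy: order the cheapest
    unseen exam, and grow diagnostic confidence with each finding."""
    def predict(self, findings):
        p = min(0.5 + 0.2 * len(findings), 1.0)
        return {"AD": p, "non-AD": 1.0 - p}

    def next_exam(self, findings, remaining):
        return min(remaining, key=EXAM_COST.get) if remaining else None

def diagnose(subject, available_exams, policy, confidence=0.9):
    """Keep ordering the exam the policy picks until the diagnosis is
    confident enough or no exams remain, so the strategy adapts to the
    subject instead of running every test."""
    findings, spent = {}, 0
    while True:
        probs = policy.predict(findings)  # current class beliefs
        if max(probs.values()) >= confidence:
            break
        nxt = policy.next_exam(findings, available_exams - findings.keys())
        if nxt is None:
            break
        findings[nxt] = subject[nxt]  # "perform" the exam
        spent += EXAM_COST[nxt]
    return max(probs, key=probs.get), findings, spent

subject = {"MMSE": 22, "MRI": "atrophy", "PET": "amyloid-positive"}
label, findings, spent = diagnose(subject, set(EXAM_COST), GreedyPolicy())
# The expensive PET scan is never ordered: the loop stops once confident.
```

This toy loop illustrates the abstract's claim about avoiding unnecessary testing: the stopping criterion, not a fixed protocol, decides how many exams each subject receives.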
OpenAPMax: Abnormal Patterns-based Model for Real-World Alzheimer's Disease Diagnosis
Huang, Yunyou, Guan, Xianglong, Lu, Xiangjiang, Liang, Xiaoshuang, Miao, Xiuxia, Xie, Jiyue, Liu, Wenjing, Ma, Li, Tang, Suqin, Zhang, Zhifei, Zhan, Jianfeng
Alzheimer's disease (AD) cannot be reversed, but early diagnosis significantly benefits patients' medical treatment and care. Recent work on AD diagnosis rests on the primary assumption that all categories are known a priori -- a closed-set classification problem, in contrast to the open-set recognition problem. This assumption hinders the application of such models in natural clinical settings. Although many open-set recognition technologies have been proposed in other fields, they are challenging to apply directly to AD diagnosis since 1) AD is a degenerative disease of the nervous system with similar symptoms at each stage, making it difficult to distinguish from its pre-state, and 2) diversified strategies for AD diagnosis are challenging to model uniformly. In this work, inspired by the concerns of clinicians during diagnosis, we propose an open-set recognition model, OpenAPMax, based on abnormal patterns, to address AD diagnosis in real-world settings. OpenAPMax first obtains the abnormal pattern of each patient relative to each known category through statistics or a literature search, then clusters the patients' abnormal patterns, and finally uses extreme value theory (EVT) to model the distance between each patient's abnormal pattern and the center of their category and to modify the classification probability. We evaluate the proposed method against recent open-set recognition methods and obtain state-of-the-art results.
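The EVT step resembles the OpenMax-style recipe: fit a Weibull distribution to the largest distances from each class center, then use its tail CDF to discount class probabilities, routing the discounted mass to an "unknown" category. A toy sketch under those assumptions (random stand-in data and names of our own; not the authors' implementation):

```python
import numpy as np
from scipy.stats import weibull_min

# Toy per-class abnormal-pattern vectors (in the paper these come from
# statistics or a literature search, followed by clustering).
rng = np.random.default_rng(0)
class_patterns = {0: rng.normal(0.0, 1.0, size=(200, 8)),
                  1: rng.normal(3.0, 1.0, size=(200, 8))}
centers, tails = {}, {}
for c, X in class_patterns.items():
    centers[c] = X.mean(axis=0)
    d = np.linalg.norm(X - centers[c], axis=1)
    # EVT step: fit a Weibull to the 20 largest distances from the center.
    tails[c] = weibull_min.fit(np.sort(d)[-20:], floc=0)

def recalibrated_probs(x, logits):
    """Down-weight each class probability by the EVT tail CDF of the
    distance between x's abnormal pattern and that class center; the
    removed mass becomes the 'unknown' (open-set) probability."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    for c in centers:
        d = np.linalg.norm(x - centers[c])
        shape, loc, scale = tails[c]
        probs[c] *= 1.0 - weibull_min.cdf(d, shape, loc=loc, scale=scale)
    unknown = max(1.0 - probs.sum(), 0.0)
    return probs, unknown

# A point at the class-0 center keeps its class-0 probability, while
# its large distance to class 1 suppresses that class.
probs, unknown = recalibrated_probs(centers[0], np.array([0.0, 0.0]))
```

The key design choice is that distances are measured in the abnormal-pattern space rather than raw feature space, which is what lets the model separate known AD stages from unseen categories.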
Photoswap: Personalized Subject Swapping in Images
Gu, Jing, Wang, Yilin, Zhao, Nanxuan, Fu, Tsu-Jui, Xiong, Wei, Liu, Qing, Zhang, Zhifei, Zhang, He, Zhang, Jianming, Jung, HyunJoon, Wang, Xin Eric
In an era where images and visual content dominate our digital landscape, the ability to manipulate and personalize these images has become a necessity. Envision seamlessly substituting a tabby cat lounging on a sunlit window sill in a photograph with your own playful puppy, all while preserving the original charm and composition of the image. We present Photoswap, a novel approach that enables this immersive image editing experience through personalized subject swapping in existing images. Photoswap first learns the visual concept of the subject from reference images and then swaps it into the target image using pre-trained diffusion models in a training-free manner. We establish that a well-conceptualized visual subject can be seamlessly transferred to any image with appropriate self-attention and cross-attention manipulation, maintaining the pose of the swapped subject and the overall coherence of the image. Comprehensive experiments underscore the efficacy and controllability of Photoswap in personalized subject swapping. Furthermore, Photoswap significantly outperforms baseline methods in human ratings across subject swapping, background preservation, and overall quality, revealing its vast application potential, from entertainment to professional editing.
Improving Diffusion Models for Scene Text Editing with Dual Encoders
Ji, Jiabao, Zhang, Guanhua, Wang, Zhaowen, Hou, Bairu, Zhang, Zhifei, Price, Brian, Chang, Shiyu
Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop out text regions and feed them into image transfer models, such as GANs. However, these methods are limited in their ability to change text style and are unable to insert texts into images. Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing. However, our empirical analysis reveals that state-of-the-art diffusion models struggle with rendering correct text and controlling text style. To address these problems, we propose DIFFSTE, which improves pre-trained diffusion models with a dual encoder design: a character encoder for better text legibility and an instruction encoder for better style control. An instruction tuning framework is introduced to train our model to learn the mapping from a text instruction to the corresponding image, with either the specified style or the style of the surrounding texts in the background. This training method further gives our model zero-shot generalization to three scenarios: generating text with unseen font variations, e.g., italic and bold; mixing different fonts to construct a new font; and using more relaxed natural-language instructions to guide the generation task. We evaluate our approach on five datasets and demonstrate its superior performance in terms of text correctness, image naturalness, and style controllability. Our code is publicly available at https://github.com/UCSB-NLP-Chang/DiffSTE
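The dual-encoder conditioning can be sketched as concatenating a per-character token stream (making every letter of the target text explicit) with a pooled instruction embedding (capturing the requested style), so the denoiser can attend to both text content and style. Everything below (toy embedding tables, mean pooling, dimensions) is illustrative, not the DIFFSTE architecture:

```python
import numpy as np

VOCAB = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}

def char_encoder(text, dim=16):
    """Toy character encoder: one embedding vector per character, so the
    conditioning spells out the target text letter by letter."""
    table = np.random.default_rng(1).normal(size=(len(VOCAB), dim))
    return np.stack([table[VOCAB[ch]] for ch in text.lower()])

def instruction_encoder(instruction, dim=16):
    """Toy instruction encoder: a single pooled vector standing in for a
    pretrained text encoder that captures the requested style."""
    table = np.random.default_rng(2).normal(size=(len(VOCAB), dim))
    vecs = [table[VOCAB[ch]] for ch in instruction.lower() if ch in VOCAB]
    return np.mean(vecs, axis=0, keepdims=True)

def dual_condition(text, instruction):
    # Concatenate both token streams; a diffusion U-Net would cross-attend
    # to this combined sequence at every denoising step.
    return np.concatenate([char_encoder(text),
                           instruction_encoder(instruction)], axis=0)

cond = dual_condition("hello", "write hello in bold italic")
```

Splitting content tokens from a style summary is what lets one encoder fix legibility while the other fixes style, matching the division of labor the abstract describes.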
OpenClinicalAI: enabling AI to diagnose diseases in real-world clinical settings
Huang, Yunyou, Wang, Nana, Tang, Suqin, Ma, Li, Hao, Tianshu, Jiang, Zihan, Zhang, Fan, Kang, Guoxin, Miao, Xiuxia, Guan, Xianglong, Zhang, Ruchang, Zhang, Zhifei, Zhan, Jianfeng
This paper quantitatively reveals that state-of-the-art and state-of-the-practice AI systems only achieve acceptable performance under the stringent condition that all categories of subjects are known, which we call closed clinical settings, and fail to work in real-world clinical settings. Compared to the diagnosis task in the closed setting, real-world clinical settings pose severe challenges and must be treated differently. We build a clinical AI benchmark named Clinical AIBench that sets up real-world clinical settings to facilitate research. We propose an open, dynamic machine learning framework and develop an AI system named OpenClinicalAI to diagnose diseases in real-world clinical settings. The first versions of Clinical AIBench and OpenClinicalAI target Alzheimer's disease. In the real-world clinical setting, OpenClinicalAI significantly outperforms the state-of-the-art AI system. In addition, OpenClinicalAI develops personalized diagnosis strategies to avoid unnecessary testing and seamlessly collaborates with clinicians. It is a promising candidate for embedding in current medical systems to improve medical services.
A new direction to promote the implementation of artificial intelligence in natural clinical settings
Huang, Yunyou, Zhang, Zhifei, Wang, Nana, Li, Nengquan, Du, Mengjia, Hao, Tianshu, Zhan, Jianfeng
These authors contributed equally to this work. Artificial intelligence (AI) researchers claim that they have made great 'achievements' in clinical realms. However, clinicians point out that these so-called 'achievements' cannot be implemented in natural clinical settings. The root cause of this huge gap is that many essential features of natural clinical tasks are overlooked by AI system developers without a medical background. In this paper, we propose that the clinical benchmark suite is a novel and promising direction for capturing the essential features of real-world clinical tasks, which qualifies it to guide the development of AI systems and promote the implementation of AI in real-world clinical practice. In practice, most AI products fail to obtain approval from the Food and Drug Administration (FDA), and AI devices are not qualified to handle high-risk tasks such as clinical diagnosis.
r-BTN: Cross-Domain Face Composite and Synthesis From Limited Facial Patches
Song, Yang (The University of Tennessee, Knoxville) | Zhang, Zhifei (The University of Tennessee, Knoxville) | Qi, Hairong (The University of Tennessee, Knoxville)
Recent face composite and synthesis works have shown promising results in generating realistic face images from deep convolutional networks. However, these works either do not generate consistent results when the constituent patches contain large domain variations (i.e., from the face and sketch domains) or cannot generate high-resolution images from limited facial patches (e.g., the inpainting approach tends to blur the generated region when the missing area exceeds 50%). Motivated by mental imagery and simulation in human cognition, we exploit the potential of deep learning networks in filling large missing regions (e.g., as much as 95% missing) and generating realistic faces with high fidelity across domains. We propose recursive generation by bidirectional transformation networks (r-BTN), which recursively generates a whole face/sketch from a small sketch/face patch. The large missing area and domain variations make it difficult to generate satisfactory results with a unidirectional cross-domain learning structure. We find that a bidirectional transformation network leads to consistent results by minimizing the forward and backward errors in the cross-domain scenario. Moreover, forward and backward bidirectional learning between the face and sketch domains enables recursive estimation of the missing region in an incremental manner, yielding appealing results. r-BTN also adopts an adversarial constraint to encourage the generation of realistic faces/sketches. Extensive experiments demonstrate the superior performance of r-BTN compared to existing solutions.
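The forward-backward consistency that r-BTN minimizes can be written as a pair of reconstruction errors between the two domain mappings. A toy L2 version (`F` and `B` stand in for the face-to-sketch and sketch-to-face networks; this is our illustration, not the authors' loss):

```python
import numpy as np

def bidir_consistency_loss(x_face, y_sketch, F, B):
    """Sum of forward and backward reconstruction errors: driving this
    down keeps the two domain mappings mutually consistent (toy L2)."""
    forward = np.mean((B(F(x_face)) - x_face) ** 2)       # face -> sketch -> face
    backward = np.mean((F(B(y_sketch)) - y_sketch) ** 2)  # sketch -> face -> sketch
    return forward + backward

# With perfectly inverse toy mappings the loss vanishes.
x, y = np.ones((4, 4)), np.ones((4, 4))
loss = bidir_consistency_loss(x, y, F=lambda a: 2 * a, B=lambda s: s / 2)
```

In the full method this consistency term is combined with the adversarial constraint mentioned above, and the recursion alternates between the two mappings to grow the estimated region incrementally.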
Using Crowdsourcing to Generate Surrogate Training Data for Robotic Grasp Prediction
Unrath, Matt (Oregon State University) | Zhang, Zhifei (Oregon State University) | Goins, Alex (Oregon State University) | Carpenter, Ryan (Oregon State University) | Wong, Weng-Keen (Oregon State University) | Balasubramanian, Ravi (Oregon State University)
As an alternative to the laborious process of collecting training data from physical robotic platforms for learning robotic grasp quality prediction, we explore the use of surrogate training data from crowd-sourced evaluations of images of robotic grasps. We show that in certain regions of the grasp feature space, grasp predictors trained with this surrogate data were almost as accurate as predictors built using data from physical testing with robots.