Hou, Jun
Explainable AI for Clinical Outcome Prediction: A Survey of Clinician Perceptions and Preferences
Hou, Jun, Wang, Lucy Lu
Jun Hou, MS (Virginia Tech, Blacksburg, VA); Lucy Lu Wang, PhD (University of Washington, Seattle, WA)
Abstract: Explainable AI (XAI) techniques are necessary to help clinicians make sense of AI predictions and integrate predictions into their decision-making workflow. In this work, we conduct a survey study to understand clinician preferences among different XAI techniques when they are used to interpret model predictions over text-based EHR data. We implement four XAI techniques (LIME, attention-based span highlights, exemplar patient retrieval, and free-text rationales generated by LLMs) on an outcome prediction model that uses ICU admission notes to predict a patient's likelihood of experiencing in-hospital mortality. Using these XAI implementations, we design and conduct a survey study of 32 practicing clinicians, collecting their feedback and preferences on the four techniques. We synthesize our findings into a set of recommendations describing when each of the XAI techniques may be more appropriate, their potential limitations, and how they might be improved.
Introduction: Clinical decision support systems (CDSS) powered by machine learning and AI have the potential to assist in medical decisions and improve patient outcomes. However, to meaningfully support clinicians, AI-powered CDSS must be trustworthy and interpretable, allowing clinicians to assess the utility and applicability of model predictions. Explainable AI (XAI) techniques have been proposed to improve model interpretability, especially for neural networks and other black-box models [1]. While XAI techniques have been applied to CDSS [2], a comprehensive understanding of clinician preferences and perceptions regarding XAI applications in these systems remains largely unexplored. Prior work on clinical XAI tends to focus on explanatory accuracy, in terms of which models are applicable [3], how to integrate XAI methods for different healthcare tasks [4], or which datasets are available to train on.
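The four explanation techniques studied in the paper can be illustrated with off-the-shelf tooling. Below is a minimal sketch of one of them, LIME span highlights over an admission note, using the `lime` package; the `predict_proba` scorer is a hypothetical stand-in for the study's trained mortality model, not its actual implementation.

```python
# Minimal sketch: LIME span highlights over an ICU admission note.
# `predict_proba` is a hypothetical stand-in for the survey's mortality
# model; the real system used a trained clinical outcome predictor.
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Toy scorer: P(mortality) rises with mentions of 'sepsis'."""
    scores = np.array([0.1 + 0.8 * ('sepsis' in t.lower()) for t in texts])
    return np.column_stack([1 - scores, scores])

explainer = LimeTextExplainer(class_names=["survives", "in-hospital mortality"])
note = "Admitted with septic shock; sepsis protocol initiated, intubated."
exp = explainer.explain_instance(note, predict_proba, num_features=5)
for token, weight in exp.as_list():
    print(f"{token:12s} {weight:+.3f}")   # tokens pushing toward mortality
```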
A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare
Aljohani, Manar, Hou, Jun, Kommu, Sindhura, Wang, Xuan
The application of large language models (LLMs) in healthcare has the potential to revolutionize clinical decision-making, medical research, and patient care. As LLMs are increasingly integrated into healthcare systems, several critical challenges must be addressed to ensure their reliable and ethical deployment. These challenges include truthfulness, where models generate misleading information; privacy, with risks of unintentional data retention; robustness, requiring defenses against adversarial attacks; fairness, addressing biases in clinical outcomes; explainability, ensuring transparent decision-making; and safety, mitigating risks of misinformation and medical errors. Recently, researchers have begun developing benchmarks and evaluation frameworks to systematically assess the trustworthiness of LLMs. However, the trustworthiness of LLMs in healthcare remains underexplored and lacks a systematic review that provides a comprehensive understanding of, and future insights into, this area. This survey bridges that gap by providing a comprehensive overview of recent research on methodologies and solutions aimed at mitigating the above risks in healthcare. By focusing on key trustworthiness dimensions, including truthfulness, privacy and safety, robustness, fairness and bias, and explainability, we present a thorough analysis of how these issues affect the reliability and ethical use of LLMs in healthcare. This paper highlights ongoing efforts and offers insights into future research directions to ensure the safe and trustworthy deployment of LLMs in healthcare.
2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction
Chen, Tianqi, Hou, Jun, Zhou, Yinchi, Xie, Huidong, Chen, Xiongchao, Liu, Qiong, Guo, Xueqi, Xia, Menghua, Duncan, James S., Liu, Chi, Zhou, Bo
Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET images with high noise and bias. Thus, it is desirable to develop 3D methods to translate non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as the state-of-the-art deep learning approach for image-to-image translation, outperforming traditional CNN-based methods. However, due to their high computation cost and memory burden, they have largely been limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation, applied to NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for the axial, coronal, and sagittal views, whose outputs are averaged at each sampling step to ensure 3D generation quality from multiple views. To accelerate the 3D sampling process, we also propose a strategy that uses CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggest that MADM generates high-quality 3D translation images, outperforming previous CNN-based and diffusion-based baseline methods.
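To make the multi-view averaging idea concrete, here is a toy sketch of one MADM-style sampling step: each anatomical view denoises the volume as a batch of 2D slices, and the three estimates are averaged. The `models` are placeholder callables, not the paper's trained diffusion networks, and the 10-step sampler is illustrative only.

```python
# Illustrative sketch of the multi-view averaging idea in MADM: at every
# reverse-diffusion step, each anatomical view denoises the 3D volume as a
# batch of 2D slices, and the three estimates are averaged. The denoisers
# here are placeholders, not the paper's trained networks.
import torch

def denoise_view(volume, t, axis, model):
    """Apply a 2D denoiser slice-by-slice along `axis` of a (D,H,W) volume."""
    vol = volume.movedim(axis, 0)              # slices become the batch dim
    out = model(vol.unsqueeze(1), t).squeeze(1)
    return out.movedim(0, axis)

def madm_step(volume, t, models):
    """One sampling step: average axial/coronal/sagittal 2D estimates."""
    estimates = [denoise_view(volume, t, ax, m)
                 for ax, m in zip((0, 1, 2), models)]
    return torch.stack(estimates).mean(dim=0)

# Placeholder "denoisers" standing in for the three trained diffusion models.
models = [lambda x, t: x * 0.95 for _ in range(3)]
volume = torch.randn(64, 64, 64)               # noisy NAC-LDPET volume
for t in reversed(range(10)):                  # toy 10-step sampler
    volume = madm_step(volume, t, models)
```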
SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models
Zhou, Mengqi, Hou, Jun, Luo, Chuanchen, Wang, Yuxi, Zhang, Zhaoxiang, Peng, Junran
Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent research employs powerful generative models to create desired scenes and achieves promising results. However, most of these methods represent the scene using 3D primitives (e.g., point clouds or radiance fields) incompatible with industrial pipelines, which leads to a substantial gap between academic research and industrial deployment. Procedural Controllable Generation (PCG) is an efficient technique for creating scalable and high-quality assets, but it is unfriendly to ordinary users as it demands profound domain expertise. To address these issues, we resort to using a large language model (LLM) to drive the procedural modeling. In this paper, we introduce a large-scale scene generation framework, SceneX, which can automatically produce high-quality procedural models according to designers' textual descriptions. Specifically, the proposed method comprises two components, PCGBench and PCGPlanner. The former encompasses an extensive collection of accessible procedural assets and thousands of hand-crafted API documents. The latter generates executable actions for Blender to produce controllable and precise 3D assets guided by the user's instructions. SceneX can generate a city spanning 2.5 km × 2.5 km with delicate layout and geometric structures, drastically reducing the time cost from several weeks for professional PCG engineers to just a few hours for an ordinary user. Extensive experiments demonstrate the capability of our method in controllable large-scale scene generation and editing, including asset placement and season translation.
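As a rough illustration of the planner loop described above, the sketch below has a (stubbed) LLM turn a designer's description into a JSON action list that a driver would replay against Blender's procedural APIs. The action names and the `call_llm` helper are hypothetical, not the SceneX interface.

```python
# Conceptual sketch of an LLM-driven procedural planning loop: an LLM
# (stubbed out here) turns a designer's description into executable
# procedural actions, which a driver would replay inside Blender. Action
# names and the `call_llm` helper are hypothetical, not the SceneX API.
import json

def call_llm(prompt: str) -> str:
    """Stub for an LLM call; returns a canned action plan as JSON."""
    return json.dumps([
        {"action": "place_asset", "asset": "office_tower", "grid": [3, 7]},
        {"action": "place_asset", "asset": "street_lamp", "grid": [3, 8]},
        {"action": "set_season", "value": "winter"},
    ])

def plan_scene(description: str) -> list[dict]:
    prompt = (f"Using only documented PCG APIs, emit a JSON action list "
              f"for: {description}")
    return json.loads(call_llm(prompt))

for step in plan_scene("a snowy downtown block with street lighting"):
    print(step)   # a Blender driver would dispatch each action to an API call
```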
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
Bai, Sikai, Zhang, Jie, Li, Shuaicheng, Guo, Song, Guo, Jingcai, Hou, Jun, Han, Tao, Lu, Xiaocheng
Federated learning (FL) has emerged as a powerful paradigm for learning from decentralized data, and federated domain generalization further considers the case where the test dataset (target domain) is absent from the decentralized training data (source domains). However, most existing FL methods assume that domain labels are provided during training, and their evaluation imposes explicit constraints on the number of domains, which must strictly match the number of clients. Such restrictions may be impractical in the real world, where numerous edge devices are underutilized and cross-client domain annotation requires additional effort and risks privacy leaks. In this paper, we propose Disentangled Prompt Tuning (DiPrompT), an efficient and novel approach that tackles these restrictions by learning adaptive prompts for domain generalization in a distributed manner. Specifically, we first design two types of prompts: a global prompt to capture general knowledge across all clients, and domain prompts to capture domain-specific knowledge. These eliminate the restriction of a one-to-one mapping between source domains and local clients. Furthermore, a dynamic query metric is introduced to automatically search for the suitable domain label for each sample, consisting of a two-substep text-image alignment based on prompt tuning, without labor-intensive annotation. Extensive experiments on multiple datasets demonstrate that DiPrompT achieves superior domain generalization performance over state-of-the-art FL methods when domain labels are not provided, and even outperforms many centralized learning methods that use domain labels.
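A toy rendering of the two prompt types and the query step might look like the following; the dimensions, the cosine-similarity scoring rule, and the additive conditioning are illustrative assumptions, not the paper's exact formulation.

```python
# Toy sketch of DiPrompT's two prompt types: one shared global prompt and a
# pool of domain prompts, with a similarity "query" choosing the domain
# prompt per sample. Dimensions and the scoring rule are illustrative only.
import torch
import torch.nn.functional as F

dim, n_domains = 128, 4
global_prompt = torch.nn.Parameter(torch.randn(dim))        # shared knowledge
domain_prompts = torch.nn.Parameter(torch.randn(n_domains, dim))

def query_domain(feature):
    """Pick the domain prompt best aligned with a sample's feature."""
    sims = F.cosine_similarity(feature.unsqueeze(0), domain_prompts, dim=1)
    return sims.argmax().item()

feature = torch.randn(dim)                                   # image embedding
k = query_domain(feature)
conditioned = feature + global_prompt + domain_prompts[k]    # prompt-tuned input
print(f"sample routed to domain prompt {k}")
```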
POUR-Net: A Population-Prior-Aided Over-Under-Representation Network for Low-Count PET Attenuation Map Generation
Zhou, Bo, Hou, Jun, Chen, Tianqi, Zhou, Yinchi, Chen, Xiongchao, Xie, Huidong, Liu, Qiong, Guo, Xueqi, Tsai, Yu-Jung, Panin, Vladimir Y., Toyonaga, Takuya, Duncan, James S., Liu, Chi
Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans to generate attenuation maps (μ-maps) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net, an innovative population-prior-aided over-under-representation network that aims for high-quality attenuation map generation from low-dose PET. First, POUR-Net incorporates an over-under-representation network (OUR-Net) to facilitate efficient feature extraction, encompassing both low-resolution abstracted and fine-detail features, to assist deep generation at the full resolution. Second, complementing OUR-Net, a population prior generation machine (PPGM) utilizing a comprehensive CT-derived μ-map dataset provides additional prior information to aid OUR-Net's generation. The integration of OUR-Net and PPGM within a cascade framework enables iterative refinement of the μ-map, resulting in the production of high-quality μ-maps. Experimental results underscore the effectiveness of POUR-Net, showing that it is a promising solution for accurate CT-free low-count PET attenuation correction and that it surpasses previous baseline methods.
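Schematically, the prior-seeded cascade could be expressed as below, with both networks replaced by trivial stand-ins for the trained PPGM and OUR-Net.

```python
# Schematic of the POUR-Net cascade: a population-prior u-map seeds the
# estimate, and OUR-Net stages iteratively refine it from the low-dose PET.
# Both networks are stand-ins; the real OUR-Net/PPGM are trained models.
import torch

ppgm = lambda pet: torch.zeros_like(pet) + 0.05      # population-prior u-map
our_net = lambda pet, mu: 0.1 * (pet - mu)           # residual refinement stub

pet = torch.rand(1, 1, 64, 64)                       # low-count PET slice
mu_map = ppgm(pet)                                   # prior initialization
for stage in range(3):                               # cascaded refinement
    mu_map = mu_map + our_net(pet, mu_map)
```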
Understanding CNNs from excitations
Ying, Zijian, Li, Qianmu, Lian, Zhichao, Hou, Jun, Lin, Tong, Wang, Tao
Saliency maps have proven to be a highly effective approach for explaining the decisions of convolutional neural networks. However, existing methodologies predominantly rely on gradients, which constrains their ability to explain complex models, and such approaches are not fully adept at leveraging negative gradient information to improve interpretive fidelity. In this study, we present a novel concept, termed positive and negative excitation, which enables the direct extraction of positive and negative excitations for each layer, allowing complete layer-by-layer information utilization without gradients. To organize these excitations into final saliency maps, we introduce a double-chain backpropagation procedure. A comprehensive experimental evaluation, encompassing both binary and multi-class classification tasks, was conducted to gauge the effectiveness of the proposed method. Encouragingly, the results show that our approach offers a significant improvement over state-of-the-art methods in terms of salient pixel removal, minor pixel removal, and guidance for generating inconspicuous adversarial perturbations. Additionally, we verify the correlation between positive and negative excitations.
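A loose single-layer illustration of splitting contributions into positive and negative excitations is sketched here; the full method propagates these layer by layer along a double chain, which this toy example does not attempt.

```python
# Toy illustration of separating positive and negative "excitations" for one
# linear layer: contributions to the outputs are split into positive and
# negative parts and propagated on two separate chains instead of a single
# gradient signal. This mirrors the paper's idea only loosely.
import torch

W = torch.randn(4, 6)                # weights of one layer (out x in)
x = torch.randn(6)                   # that layer's input activation

contrib = W * x                      # per-connection contribution to outputs
pos_excite = contrib.clamp(min=0)    # positive excitation chain
neg_excite = contrib.clamp(max=0)    # negative excitation chain

# Relevance of each input unit along each chain (summed over outputs):
print("positive:", pos_excite.sum(dim=0))
print("negative:", neg_excite.sum(dim=0))
```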
FedFTN: Personalized Federated Learning with Deep Feature Transformation Network for Multi-institutional Low-count PET Denoising
Zhou, Bo, Xie, Huidong, Liu, Qiong, Chen, Xiongchao, Guo, Xueqi, Feng, Zhicheng, Hou, Jun, Zhou, S. Kevin, Li, Biao, Rominger, Axel, Shi, Kuangyu, Duncan, James S., Liu, Chi
Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from a low signal-to-noise ratio (SNR), affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential for improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutions to train a robust model is difficult due to privacy and security concerns around patient data. Moreover, low-count PET data at different institutions may have different data distributions, requiring personalized models. While previous federated learning (FL) algorithms enable multi-institution collaborative training without the need to aggregate local data, addressing the large domain shift in multi-institutional low-count PET denoising remains challenging and highly under-explored. In this work, we propose FedFTN, a personalized federated learning strategy that addresses these challenges. FedFTN uses a local deep feature transformation network (FTN) to modulate the feature outputs of a globally shared denoising network, enabling personalized low-count PET denoising for each institution. During the federated learning process, only the denoising network's weights are communicated and aggregated, while the FTN remains at the local institution for feature transformation. We evaluated our method using a large-scale dataset of multi-institutional low-count PET imaging data from three medical centers located across three continents, and showed that FedFTN provides high-quality low-count PET images, outperforming previous baseline FL reconstruction methods across all low-count levels at all three institutions.
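The communication pattern, with only the shared denoiser aggregated while each FTN stays on-site, can be sketched as a plain FedAvg loop; the `Conv2d` modules below are minimal stand-ins for the real denoising network and FTN.

```python
# Sketch of FedFTN's communication pattern: the shared denoiser is averaged
# across institutions each round, while each site's feature transformation
# network (FTN) never leaves the institution. Models are minimal stand-ins.
import copy
import torch
import torch.nn as nn

def fedavg(state_dicts):
    """Average parameters across clients (plain FedAvg)."""
    avg = copy.deepcopy(state_dicts[0])
    for k in avg:
        avg[k] = torch.stack([sd[k].float() for sd in state_dicts]).mean(0)
    return avg

n_sites = 3
denoisers = [nn.Conv2d(1, 1, 3, padding=1) for _ in range(n_sites)]  # shared part
ftns = [nn.Conv2d(1, 1, 1) for _ in range(n_sites)]                  # stays local

for round_ in range(5):
    # ... local training of (denoiser_i, ftn_i) on each site's PET data ...
    global_sd = fedavg([d.state_dict() for d in denoisers])  # only denoisers
    for d in denoisers:
        d.load_state_dict(global_sd)                         # FTNs untouched
```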
Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators
Bai, Sikai, Li, Shuaicheng, Zhuang, Weiming, Zhang, Jie, Guo, Song, Yang, Kunlin, Hou, Jun, Zhang, Shuai, Gao, Junyu, Yi, Shuai
Federated learning has become a popular method for learning from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and a consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution differs not only across clients but also within a client, between its labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the update of the local model by tracking the learning effect on the labeled data distribution; F-reg learns an adaptive weighting scheme tailored to the unlabeled instances in each client. We further formulate client model training as bi-level optimization that adaptively optimizes the model in each client with the two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to existing methods across a wide range of settings, notably by more than 11% on the CIFAR-10 and CINIC-10 datasets.
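A loose sketch of how the two regulators might enter a client's loss is shown below; the actual method learns both regulators via bi-level optimization rather than fixing them as constants, so this only conveys where each one acts.

```python
# Loose sketch of the dual-regulator idea in FedDure: a coarse regulator
# scales the whole local update by learning progress on labeled data, and a
# fine regulator assigns per-sample weights to unlabeled (pseudo-labeled)
# losses. The actual method learns both regulators via bi-level optimization.
import torch

def client_update(sup_losses, unsup_losses, c_reg, f_reg):
    """Combine supervised and per-sample-weighted unsupervised losses."""
    sup = torch.stack(sup_losses).mean()
    unsup = (f_reg * torch.stack(unsup_losses)).sum() / f_reg.sum()
    return c_reg * (sup + unsup)          # C-reg gates the whole local step

sup_losses = [torch.tensor(0.7), torch.tensor(0.4)]
unsup_losses = [torch.tensor(1.2), torch.tensor(0.3), torch.tensor(0.9)]
f_reg = torch.tensor([0.2, 0.9, 0.5])    # fine-grained instance weights
loss = client_update(sup_losses, unsup_losses, c_reg=0.8, f_reg=f_reg)
```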
A Cloud-Based Energy Management Strategy for Hybrid Electric City Bus Considering Real-Time Passenger Load Prediction
Shi, Junzhe, Xu, Bin, Zhou, Xingyu, Hou, Jun
Electric city buses have gained popularity in recent years for their low greenhouse gas emissions, low noise levels, and other benefits. Unlike a passenger car, a city bus varies significantly in weight with the number of onboard passengers. After analyzing the importance of battery aging and passenger load effects on an optimal energy management strategy, this study introduces passenger load prediction into the hybrid-electric city bus energy management problem, which is not well studied in the existing literature. An average model, Decision Tree, Gradient Boost Decision Tree, and Neural Network models are compared for passenger load prediction, and the Gradient Boost Decision Tree model is selected for its superior accuracy and stability. Given the predicted passenger load, a dynamic programming algorithm determines the optimal power demand for the supercapacitor and battery by optimizing battery aging and energy usage, leveraging cloud computing. Rule extraction is then conducted on the dynamic programming results, and the rules are loaded in real time to the vehicle's onboard controller to handle prediction errors and uncertainties. The proposed cloud-based dynamic programming and rule extraction framework with passenger load prediction shows 4% and 11% lower bus operating costs in off-peak and peak hours, respectively. The operating cost of the proposed framework is within 1% of that of dynamic programming with true passenger load information.
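The passenger-load prediction step, using the gradient boosted tree family the study selected, might look like the following sketch; the features and data here are synthetic placeholders, since the abstract does not list the paper's actual inputs.

```python
# Minimal sketch of the passenger-load prediction step using a gradient
# boosted tree, the model family the study selected. Features and data are
# synthetic placeholders; the paper's features are not listed in the abstract.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Hypothetical features: [hour of day, stop index, day of week]
X = rng.uniform([0, 0, 0], [24, 30, 7], size=(500, 3))
y = 20 + 15 * np.sin(X[:, 0] / 24 * 2 * np.pi) + rng.normal(0, 3, 500)

model = GradientBoostingRegressor().fit(X, y)
next_stop = np.array([[8.5, 12, 2]])           # 8:30 am, stop 12, Wednesday
predicted_load = model.predict(next_stop)[0]   # feeds the DP energy optimizer
print(f"predicted passengers: {predicted_load:.1f}")
```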