Zhou, Hong-Yu
ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation
Zhang, Xiaoman, Zhou, Hong-Yu, Yang, Xiaoli, Banerjee, Oishi, Acosta, Julián N., Miller, Josh, Huang, Ouwen, Rajpurkar, Pranav
AI-driven models have demonstrated significant potential in automating radiology report generation for chest X-rays. However, there is no standardized benchmark for objectively evaluating their performance. To address this, we present ReXrank, https://rexrank.ai, a public leaderboard and challenge for assessing AI-powered radiology report generation. Our framework incorporates ReXGradient, the largest test dataset, comprising 10,000 studies, and three public datasets (MIMIC-CXR, IU-Xray, CheXpert Plus) for report generation assessment. ReXrank employs 8 evaluation metrics and separately assesses models capable of generating only findings sections and those providing both findings and impressions sections. By providing this standardized evaluation framework, ReXrank enables meaningful comparisons of model performance and offers crucial insights into model robustness across diverse clinical settings. Beyond its current focus on chest X-rays, ReXrank's framework sets the stage for comprehensive evaluation of automated reporting across the full spectrum of medical imaging.
HeadCT-ONE: Enabling Granular and Controllable Automated Evaluation of Head CT Radiology Report Generation
Acosta, Julián N., Zhang, Xiaoman, Dogra, Siddhant, Zhou, Hong-Yu, Payabvash, Seyedmehdi, Falcone, Guido J., Oermann, Eric K., Rajpurkar, Pranav
We present Head CT Ontology Normalized Evaluation (HeadCT-ONE), a metric for evaluating head CT report generation through ontology-normalized entity and relation extraction. HeadCT-ONE enhances current information-extraction-derived metrics (such as RadGraph F1) by implementing entity normalization through domain-specific ontologies, addressing radiological language variability. HeadCT-ONE compares normalized entities and relations, allowing for controllable weighting of different entity types or specific entities. Through experiments on head CT reports from three health systems, we show that HeadCT-ONE's normalization and weighting approach improves the capture of semantically equivalent reports, better distinguishes between normal and abnormal reports, and aligns with radiologists' assessment of clinically significant errors, while offering flexibility to prioritize specific aspects of report content. Our results demonstrate how HeadCT-ONE enables more flexible, controllable, and granular automated evaluation of head CT reports.
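To make the normalize-and-weight idea concrete, here is a minimal sketch in the same spirit; it is not the released HeadCT-ONE implementation, and the ontology map, entity types, and weights below are invented placeholders.

```python
# Illustrative sketch (not the released HeadCT-ONE code): compare ontology-normalized
# entities from a reference and a candidate head CT report with per-entity-type weights.
from collections import Counter

# Hypothetical ontology map: surface forms -> normalized concept IDs.
ONTOLOGY = {
    "bleed": "hemorrhage", "haemorrhage": "hemorrhage", "hemorrhage": "hemorrhage",
    "midline shift": "midline_shift", "shift of midline structures": "midline_shift",
}

# Hypothetical weights emphasizing findings over descriptors.
TYPE_WEIGHTS = {"finding": 2.0, "descriptor": 1.0}

def normalize(entities):
    """Map (surface_form, entity_type) pairs to (normalized_concept, entity_type) counts."""
    return Counter((ONTOLOGY.get(text.lower(), text.lower()), etype) for text, etype in entities)

def weighted_f1(reference, candidate):
    ref, cand = normalize(reference), normalize(candidate)
    def weight(key): return TYPE_WEIGHTS.get(key[1], 1.0)
    tp = sum(weight(k) * min(ref[k], cand[k]) for k in ref)       # weighted true positives
    pred = sum(weight(k) * n for k, n in cand.items())
    gold = sum(weight(k) * n for k, n in ref.items())
    precision = tp / pred if pred else 0.0
    recall = tp / gold if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(weighted_f1(
    [("hemorrhage", "finding"), ("midline shift", "finding"), ("acute", "descriptor")],
    [("bleed", "finding"), ("shift of midline structures", "finding")],
))
```

In this toy example "bleed" normalizes to the same concept as "hemorrhage", so the candidate report is credited for the finding despite different surface wording, and the finding weight controls how heavily that match counts.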
Uncertainty Estimation of Large Language Models in Medical Question Answering
Wu, Jiaxin, Yu, Yizhou, Zhou, Hong-Yu
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. Deploying LLMs for medical question answering necessitates reliable uncertainty estimation (UE) methods to detect hallucinations. In this work, we benchmark popular UE methods with different model sizes on medical question-answering datasets. Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications. We also observe that larger models tend to yield better results, suggesting a correlation between model size and the reliability of UE. To address these challenges, we propose Two-phase Verification, a probability-free uncertainty estimation approach. First, an LLM generates a step-by-step explanation alongside its initial answer, then formulates verification questions to check the factual claims in the explanation. The model answers these questions twice: first independently, and then referencing the explanation. Inconsistencies between the two sets of answers measure the uncertainty in the original response. We evaluate our approach on three biomedical question-answering datasets using Llama 2 Chat models and compare it against the benchmarked baseline methods. The results show that our Two-phase Verification method achieves the best overall accuracy and stability across various datasets and model sizes, and its performance scales as the model size increases.
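A minimal sketch of the two-phase procedure described above, assuming a hypothetical `ask_llm(prompt)` callable that wraps any chat LLM (e.g., Llama 2 Chat); the actual prompt templates, question parsing, and agreement scoring in the paper may differ.

```python
# Sketch of Two-phase Verification: explanation -> verification questions -> answer twice
# -> count disagreements as an uncertainty score. Prompts and scoring are illustrative.
def two_phase_verification(question, ask_llm, n_verification_questions=3):
    # Phase 0: initial answer with a step-by-step explanation.
    answer = ask_llm(f"Answer the medical question and explain step by step:\n{question}")

    # Phase 1: turn the factual claims in the explanation into verification questions.
    vq_text = ask_llm(
        f"List {n_verification_questions} short questions that check the factual claims in:\n{answer}"
    )
    verification_questions = [q.strip() for q in vq_text.splitlines() if q.strip()]

    # Phase 2: answer each verification question twice, independently and with the explanation.
    inconsistent = 0
    for vq in verification_questions:
        independent = ask_llm(vq)
        conditioned = ask_llm(f"Given the explanation below, answer the question.\n{answer}\n\nQuestion: {vq}")
        agree = ask_llm(f"Do these two answers agree? Reply yes or no.\nA: {independent}\nB: {conditioned}")
        inconsistent += 0 if agree.strip().lower().startswith("yes") else 1

    # More disagreement between the two passes -> higher estimated uncertainty.
    uncertainty = inconsistent / max(len(verification_questions), 1)
    return answer, uncertainty
```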
Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation
Banerjee, Oishi, Zhou, Hong-Yu, Adithan, Subathra, Kwak, Stephen, Wu, Kay, Rajpurkar, Pranav
Recent advances in generative vision-language models (VLMs) have exciting potential implications for AI in radiology, yet VLMs are also known to produce hallucinations, nonsensical text, and other unwanted behaviors that can waste clinicians' time and cause patient harm. Drawing on recent work on direct preference optimization (DPO), we propose a simple method for modifying the behavior of pretrained VLMs performing radiology report generation by suppressing unwanted types of generations. We apply our method to the prevention of hallucinations of prior exams, addressing a long-established problem behavior in models performing chest X-ray report generation. Across our experiments, we find that DPO fine-tuning achieves a 3.2-4.8x reduction in lines hallucinating prior exams while maintaining model performance on clinical accuracy metrics. To the best of our knowledge, this is the first work to apply DPO to medical VLMs, providing a data- and compute-efficient way to suppress problem behaviors while maintaining overall clinical accuracy.
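For readers unfamiliar with DPO, the sketch below shows the standard DPO objective (Rafailov et al., 2023) that this kind of fine-tuning builds on, where "chosen" reports avoid references to prior exams and "rejected" reports hallucinate them; the variable names and toy values are illustrative, not the authors' training code.

```python
# Standard DPO objective over (chosen, rejected) report pairs, written in PyTorch.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a tensor of summed log-probabilities of a full report under the
    policy or the frozen reference model; beta controls deviation from the reference."""
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # Maximize the margin by which the policy prefers the clean report over the hallucinating one.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()

# Toy usage with made-up log-probabilities for a batch of two report pairs.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.5, -13.0]),
                torch.tensor([-10.5, -12.5]), torch.tensor([-10.8, -12.4]))
print(loss.item())
```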
SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging
Lou, Meng, Ying, Hanning, Liu, Xiaoqing, Zhou, Hong-Yu, Zhang, Yuqing, Yu, Yizhou
Automated classification of liver lesions in multi-phase CT and MR scans is of clinical significance but challenging. This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework, specifically designed for liver lesion classification in 3D multi-phase CT and MR imaging with varying phase counts. The proposed SDR-Former utilizes a streamlined Siamese Neural Network (SNN) to process multi-phase imaging inputs, yielding robust feature representations while maintaining computational efficiency. The weight-sharing feature of the SNN is further enriched by a hybrid Dual-Resolution Transformer (DR-Former), comprising a 3D Convolutional Neural Network (CNN) and a tailored 3D Transformer for processing high- and low-resolution images, respectively. This hybrid sub-architecture excels in capturing detailed local features and understanding global contextual information, thereby boosting the SNN's feature extraction capabilities. Additionally, a novel Adaptive Phase Selection Module (APSM) is introduced, promoting phase-specific intercommunication and dynamically adjusting each phase's influence on the diagnostic outcome. The proposed SDR-Former framework has been validated through comprehensive experiments on two clinical datasets: a three-phase CT dataset and an eight-phase MR dataset. The experimental results affirm the efficacy of the proposed framework. To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public. This pioneering dataset, the first publicly available multi-phase MR dataset in this field, also underpins the MICCAI LLD-MMRI Challenge. The dataset is accessible at: https://bit.ly/3IyYlgN.
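The conceptual skeleton below illustrates the Siamese, dual-resolution, phase-weighting structure in plain PyTorch; it is not the SDR-Former code, and the layer choices, dimensions, and the simple gating module standing in for the APSM are assumptions made for illustration.

```python
# Conceptual skeleton: one shared dual-branch encoder applied per phase, with a small
# attention-style phase-weighting module standing in for the Adaptive Phase Selection Module.
import torch
import torch.nn as nn

class SharedDualResolutionEncoder(nn.Module):
    """Placeholder encoder: one 3D conv branch for the high-res volume, another for a downsampled copy."""
    def __init__(self, dim=64):
        super().__init__()
        self.high = nn.Sequential(nn.Conv3d(1, dim, 3, stride=2, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.low = nn.Sequential(nn.Conv3d(1, dim, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool3d(1), nn.Flatten())

    def forward(self, x):
        x_low = nn.functional.interpolate(x, scale_factor=0.5, mode="trilinear", align_corners=False)
        return self.high(x) + self.low(x_low)

class SiamesePhaseClassifier(nn.Module):
    def __init__(self, dim=64, num_classes=7):
        super().__init__()
        self.encoder = SharedDualResolutionEncoder(dim)   # weights shared across all phases
        self.phase_gate = nn.Linear(dim, 1)                # scores each phase's contribution
        self.head = nn.Linear(dim, num_classes)

    def forward(self, phases):                             # phases: (B, P, 1, D, H, W), P can vary
        feats = torch.stack([self.encoder(phases[:, p]) for p in range(phases.shape[1])], dim=1)
        weights = torch.softmax(self.phase_gate(feats), dim=1)   # (B, P, 1) phase weights
        return self.head((weights * feats).sum(dim=1))

logits = SiamesePhaseClassifier()(torch.randn(2, 3, 1, 16, 32, 32))  # a toy three-phase volume
print(logits.shape)  # torch.Size([2, 7])
```

Because the encoder is shared, the same skeleton handles the three-phase CT and eight-phase MR settings simply by changing the number of phase volumes passed in.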
Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining
Liu, Jiarun, Yang, Hao, Zhou, Hong-Yu, Xi, Yan, Yu, Lequan, Yu, Yizhou, Liang, Yong, Shi, Guangming, Zhang, Shaoting, Zheng, Hairong, Wang, Shanshan
Accurate medical image segmentation demands the integration of multi-scale information, spanning from local features to global dependencies. However, it is challenging for existing methods to model long-range global information: convolutional neural networks (CNNs) are constrained by their local receptive fields, and vision transformers (ViTs) suffer from the quadratic complexity of their attention mechanism. Recently, Mamba-based models have gained great attention for their impressive ability in long sequence modeling. Several studies have demonstrated that these models can outperform popular vision models in various tasks, offering higher accuracy, lower memory consumption, and less computational burden. However, existing Mamba-based models are mostly trained from scratch and do not explore the power of pretraining, which has been proven to be quite effective for data-efficient medical image analysis. This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks, leveraging the advantages of ImageNet-based pretraining. Our experimental results reveal the vital role of ImageNet-based pretraining in enhancing the performance of Mamba-based models. Swin-UMamba demonstrates superior performance by a large margin compared to CNNs, ViTs, and the latest Mamba-based models. Notably, on the AbdomenMRI, Endoscopy, and Microscopy datasets, Swin-UMamba outperforms its closest counterpart U-Mamba by an average score of 3.58%. The code and models of Swin-UMamba are publicly available at: https://github.com/JiarunLiu/Swin-UMamba
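The general pattern Swin-UMamba exploits, initializing a segmentation encoder from ImageNet-pretrained weights before attaching a UNet-style decoder, can be sketched as below; note that the backbone here is a torchvision ResNet stand-in rather than the paper's Mamba-based encoder, and the decoder is deliberately minimal.

```python
# Generic pretraining-transfer sketch: reuse ImageNet-pretrained encoder stages inside a
# tiny UNet-like segmenter. This is NOT the Swin-UMamba architecture, only the transfer pattern.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class TinyPretrainedSegmenter(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)  # ImageNet weights (downloaded on first use)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu)   # 1/2 resolution
        self.stage1 = nn.Sequential(backbone.maxpool, backbone.layer1)           # 1/4 resolution
        self.decoder = nn.Sequential(                                             # lightweight decoder
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, x):
        return self.decoder(self.stage1(self.stem(x)))

masks = TinyPretrainedSegmenter()(torch.randn(1, 3, 128, 128))
print(masks.shape)  # torch.Size([1, 2, 128, 128])
```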
DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
Zheng, Ge, Yang, Bin, Tang, Jiajin, Zhou, Hong-Yu, Yang, Sibei
A long-standing goal of AI systems is to perform complex multimodal reasoning like humans. Recently, large language models (LLMs) have made remarkable strides in such multi-step reasoning solely on the language modality by leveraging the chain of thought (CoT) to mimic human thinking. However, the transfer of these advancements to multimodal contexts introduces heightened challenges, including but not limited to the impractical need for labor-intensive annotation and limitations in flexibility, generalizability, and explainability. To evoke CoT reasoning in multimodality, this work first conducts an in-depth analysis of the challenges posed by multimodality and presents two key insights: "keeping critical thinking" and "letting everyone do their jobs" in multimodal CoT reasoning. Furthermore, this study proposes a novel DDCoT prompting method that maintains a critical attitude through negative-space prompting and incorporates multimodality into reasoning by first dividing the reasoning responsibility of LLMs into reasoning and recognition and then integrating the visual recognition capability of visual models into the joint reasoning process. The rationales generated by DDCoT not only improve the reasoning abilities of both large and small language models in zero-shot prompting and fine-tuning learning, significantly outperforming state-of-the-art methods, but also exhibit impressive generalizability and explainability.
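A rough sketch of the duty-distinct division of labor, assuming hypothetical `ask_llm(prompt)` and `ask_vqa(image, question)` callables; the real DDCoT prompts, negative-space wording, and answer parsing are more elaborate.

```python
# Sketch of duty-distinct prompting: the LLM keeps the reasoning duty and defers
# recognition duties to a visual model instead of guessing about the image.
def ddcot_answer(question, image, ask_llm, ask_vqa):
    # Step 1: the LLM decomposes the problem and flags sub-questions it cannot answer
    # without the image (a stand-in for the paper's negative-space prompting).
    plan = ask_llm(
        "Break the question into sub-questions. Answer each one if it needs no image; "
        f"otherwise write exactly 'NEED IMAGE'.\nQuestion: {question}"
    )

    # Step 2: route the visual sub-questions to the visual model (recognition duty).
    resolved_lines = []
    for line in plan.splitlines():
        if "NEED IMAGE" in line:
            sub_q = line.replace("NEED IMAGE", "").strip(" :-")
            resolved_lines.append(f"{sub_q} -> {ask_vqa(image, sub_q)}")
        else:
            resolved_lines.append(line)

    # Step 3: the LLM performs the joint chain-of-thought reasoning over both sources.
    rationale = "\n".join(resolved_lines)
    return ask_llm(f"Using these findings, reason step by step and answer.\n{rationale}\n\nQuestion: {question}")
```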
Activate and Reject: Towards Safe Domain Generalization under Category Shift
Chen, Chaoqi, Tang, Luyao, Tao, Leitian, Zhou, Hong-Yu, Huang, Yue, Han, Xiaoguang, Yu, Yizhou
Despite notable performance on in-domain test points, it is non-trivial for deep neural networks to attain satisfactory accuracy when deployed in the open world, where novel domains and object classes often occur. In this paper, we study a practical problem of Domain Generalization under Category Shift (DGCS), which aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains. Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments for safe model deployment. To this end, we propose a novel Activate and Reject (ART) framework to reshape the model's decision boundary to accommodate unknown classes and conduct post hoc modification to further discriminate known and unknown classes using unlabeled test data. Specifically, during training, we promote the response to the unknown by optimizing the unknown probability and then smoothing the overall output to mitigate the overconfidence issue. At test time, we introduce a step-wise online adaptation method that predicts the label by virtue of cross-domain nearest-neighbor and class-prototype information, without updating the network's parameters or using threshold-based mechanisms. Experiments reveal that ART consistently improves the generalization capability of deep networks on different vision tasks. For image classification, ART improves the H-score by 6.1% on average compared to the previous best method. For object detection and semantic segmentation, we establish new benchmarks and achieve competitive performance.
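To illustrate the test-time flavor of combining class prototypes with cross-domain nearest-neighbor evidence, here is a toy sketch; it is not the ART implementation, and the margin-based rejection rule and the small nearest-neighbor bonus are simplifications invented for this example (ART itself avoids threshold-based mechanisms).

```python
# Toy sketch: classify a target feature with source class prototypes plus nearest-neighbor
# evidence from already-seen target samples, and reject weakly matched samples as "unknown".
import torch
import torch.nn.functional as F

def predict_with_prototypes(feature, prototypes, memory_feats, memory_labels, reject_margin=0.2):
    """feature: (D,); prototypes: (C, D) class means from source data;
    memory_feats/labels: features and pseudo-labels of previously seen target samples."""
    proto_sim = F.cosine_similarity(feature.unsqueeze(0), prototypes)           # (C,) similarities
    if memory_feats.numel() > 0:
        nn_idx = F.cosine_similarity(feature.unsqueeze(0), memory_feats).argmax()
        proto_sim[memory_labels[nn_idx]] += 0.1   # small bonus for the nearest neighbor's class
    best = proto_sim.argmax()
    # Reject as unknown when the best-matching known class is not convincingly closer
    # than the runner-up (a simplified stand-in for ART's decision rule).
    second = proto_sim.topk(2).values[1] if proto_sim.numel() > 1 else torch.tensor(-1.0)
    return int(best) if proto_sim[best] - second > reject_margin else -1        # -1 means "unknown"

prototypes = F.normalize(torch.randn(5, 32), dim=1)
print(predict_with_prototypes(prototypes[2] + 0.05 * torch.randn(32), prototypes,
                              torch.empty(0, 32), torch.empty(0, dtype=torch.long)))
```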
A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics
Zhou, Hong-Yu, Yu, Yizhou, Wang, Chengdi, Zhang, Shu, Gao, Yuanxu, Pan, Jia, Shao, Jun, Lu, Guangming, Zhang, Kang, Li, Weimin
During the diagnostic process, clinicians leverage multimodal information, such as chief complaints, medical images, and laboratory-test results. Deep-learning models for aiding diagnosis have yet to meet this requirement. Here we report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner. Rather than learning modality-specific features, the model uses embedding layers to convert images and unstructured and structured text into visual tokens and text tokens, and uses bidirectional blocks with intramodal and intermodal attention to learn a holistic representation of radiographs, the unstructured chief complaint and clinical history, and structured clinical information such as laboratory-test results and patient demographic information. The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases (by 12% and 9%, respectively) and in the prediction of adverse clinical outcomes in patients with COVID-19 (by 29% and 7%, respectively). Leveraging unified multimodal Transformer-based models may help streamline the triage of patients and facilitate the clinical decision process.
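A schematic sketch of the unified-token design, in which image patches and clinical text share one Transformer encoder so intramodal and intermodal attention happen in the same blocks; the dimensions, layer counts, and classification head are arbitrary choices for illustration, not the published model.

```python
# Schematic sketch: radiograph patches and clinical-text tokens are embedded into one
# sequence and processed jointly by a single Transformer encoder.
import torch
import torch.nn as nn

class UnifiedMultimodalEncoder(nn.Module):
    def __init__(self, dim=256, vocab_size=30522, num_labels=14):
        super().__init__()
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=16, stride=16)    # radiograph -> visual tokens
        self.text_embed = nn.Embedding(vocab_size, dim)                     # complaint/history/labs -> text tokens
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)   # intra- and inter-modal attention
        self.head = nn.Linear(dim, num_labels)

    def forward(self, image, text_ids):
        visual = self.patch_embed(image).flatten(2).transpose(1, 2)         # (B, N_img, dim)
        textual = self.text_embed(text_ids)                                  # (B, N_txt, dim)
        tokens = torch.cat([self.cls.expand(image.size(0), -1, -1), visual, textual], dim=1)
        return self.head(self.encoder(tokens)[:, 0])                         # predict from the leading token

logits = UnifiedMultimodalEncoder()(torch.randn(2, 1, 224, 224), torch.randint(0, 30522, (2, 64)))
print(logits.shape)  # torch.Size([2, 14])
```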
Protein Representation Learning via Knowledge Enhanced Primary Structure Modeling
Zhou, Hong-Yu, Fu, Yunxiang, Zhang, Zhicheng, Bian, Cheng, Yu, Yizhou
Protein representation learning has primarily benefited from the remarkable development of language models (LMs). Accordingly, pre-trained protein models also suffer from a problem common to LMs: a lack of factual knowledge. A recent solution models the relationships between proteins and associated knowledge terms as the knowledge-encoding objective. However, it fails to explore these relationships at a more granular level, i.e., the token level. To mitigate this, we propose Knowledge-exploited Auto-encoder for Protein (KeAP), which performs token-level knowledge graph exploration for protein representation learning. In practice, non-masked amino acids iteratively query the associated knowledge tokens to extract and integrate helpful information for restoring masked amino acids via attention. We show that KeAP consistently outperforms the previous counterpart on 9 representative downstream applications, sometimes surpassing it by large margins. These results suggest that KeAP provides an alternative yet effective way to perform knowledge-enhanced protein representation learning. Code and models are available at https://github.com/RL4M/KeAP.
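The token-level knowledge-querying step can be pictured with the small cross-attention sketch below, in which amino-acid tokens act as queries over knowledge-text tokens; this is an illustration rather than the released KeAP code, and all dimensions are arbitrary.

```python
# Sketch of token-level knowledge querying: protein tokens attend over knowledge tokens,
# and the fused representation is decoded to predict masked residues.
import torch
import torch.nn as nn

class KnowledgeQueryingBlock(nn.Module):
    def __init__(self, dim=128, num_amino_acids=25):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.decoder = nn.Linear(dim, num_amino_acids)      # predicts the identity of masked residues

    def forward(self, protein_tokens, knowledge_tokens):
        # Amino-acid tokens are the queries; knowledge-term tokens are keys and values.
        attended, _ = self.cross_attn(protein_tokens, knowledge_tokens, knowledge_tokens)
        fused = self.norm(protein_tokens + attended)
        return self.decoder(fused)                            # (B, L, num_amino_acids) logits

block = KnowledgeQueryingBlock()
logits = block(torch.randn(2, 100, 128), torch.randn(2, 30, 128))
print(logits.shape)  # torch.Size([2, 100, 25]); the loss would be computed only at masked positions
```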