AITopics | Zhu, Qingqing

Collaborating Authors

Zhu, Qingqing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Demystifying Large Language Models for Medicine: A Primer

Jin, Qiao, Wan, Nicholas, Leaman, Robert, Tian, Shubo, Wang, Zhizheng, Yang, Yifan, Wang, Zifeng, Xiong, Guangzhi, Lai, Po-Ting, Zhu, Qingqing, Hou, Benjamin, Sarfo-Gyamfi, Maame, Zhang, Gongbo, Gilson, Aidan, Bhasuran, Balu, He, Zhe, Zhang, Aidong, Sun, Jimeng, Weng, Chunhua, Summers, Ronald M., Chen, Qingyu, Peng, Yifan, Lu, Zhiyong

arXiv.org Artificial IntelligenceNov-19-2024

Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this primer paper, we propose an actionable guideline to help healthcare professionals more efficiently utilize LLMs in their work, along with a set of best practices. This approach consists of several main phases, including formulating the task, choosing LLMs, prompt engineering, fine-tuning, and deployment. We start with the discussion of critical considerations in identifying healthcare tasks that align with the core capabilities of LLMs and selecting models based on the selected task and data, performance requirements, and model interface. We then review the strategies, such as prompt engineering and fine-tuning, to adapt standard LLMs to specialized medical tasks. Deployment considerations, including regulatory compliance, ethical guidelines, and continuous monitoring for fairness and bias, are also discussed. By providing a structured step-by-step methodology, this tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice, ensuring that these powerful technologies are applied in a safe, reliable, and impactful manner.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.18856

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
North America > United States > Illinois > Champaign County (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.89)
Health & Medicine > Health Care Technology > Medical Record (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare

Yang, Yifan, Jin, Qiao, Zhu, Qingqing, Wang, Zhizheng, Álvarez, Francisco Erramuspe, Wan, Nicholas, Hou, Benjamin, Lu, Zhiyong

arXiv.org Artificial IntelligenceOct-24-2024

Large Language Models (LLMs) have gained significant attention in the medical domain for their human-level capabilities, leading to increased efforts to explore their potential in various healthcare applications. However, despite such a promising future, there are multiple challenges and obstacles that remain for their real-world uses in practical settings. This work discusses key challenges for LLMs in medical applications from four unique aspects: operational vulnerabilities, ethical and social considerations, performance and assessment difficulties, and legal and regulatory compliance. Addressing these challenges is crucial for leveraging LLMs to their full potential and ensuring their responsible integration into healthcare.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.1846

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How Well Do Multi-modal LLMs Interpret CT Scans? An Auto-Evaluation Framework for Analyses

Zhu, Qingqing, Hou, Benjamin, Mathai, Tejas S., Mukherjee, Pritam, Jin, Qiao, Chen, Xiuying, Wang, Zhizheng, Cheng, Ruida, Summers, Ronald M., Lu, Zhiyong

arXiv.org Artificial IntelligenceJun-18-2024

Automatically interpreting CT scans can ease the workload of radiologists. However, this is challenging mainly due to the scarcity of adequate datasets and reference standards for evaluation. This study aims to bridge this gap by introducing a novel evaluation framework, named ``GPTRadScore''. This framework assesses the capabilities of multi-modal LLMs, such as GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, in generating descriptions for prospectively-identified findings. By employing a decomposition technique based on GPT-4, GPTRadScore compares these generated descriptions with gold-standard report sentences, analyzing their accuracy in terms of body part, location, and type of finding. Evaluations demonstrated a high correlation with clinician assessments and highlighted its potential over traditional metrics, such as BLEU, METEOR, and ROUGE. Furthermore, to contribute to future studies, we plan to release a benchmark dataset annotated by clinicians. Using GPTRadScore, we found that while GPT-4V and Gemini Pro Vision fare better, their performance revealed significant areas for improvement, primarily due to limitations in the dataset used for training these models. To demonstrate this potential, RadFM was fine-tuned and it resulted in significant accuracy improvements: location accuracy rose from 3.41\% to 12.8\%, body part accuracy from 29.12\% to 53\%, and type accuracy from 9.24\% to 30\%, thereby validating our hypothesis.

large language model, lesion, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2403.0568

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization

Chen, Xiuying, Gao, Shen, Li, Mingzhe, Zhu, Qingqing, Gao, Xin, Zhang, Xiangliang

arXiv.org Artificial IntelligenceJun-8-2024

Nowadays, neural text generation has made tremendous progress in abstractive summarization tasks. However, most of the existing summarization models take in the whole document all at once, which sometimes cannot meet the needs in practice. Practically, social text streams such as news events and tweets keep growing from time to time, and can only be fed to the summarization system step by step. Hence, in this paper, we propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document is proposed. The appended summary should not only summarize the newly added content but also be coherent with the previous summary, to form an up-to-date complete summary. To tackle this challenge, we design an adversarial learning model, named Stepwise Summary Generator (SSG). First, SSG selectively processes the new document under the guidance of the previous summary, obtaining polished document representation. Next, SSG generates the summary considering both the previous summary and the document. Finally, a convolutional-based discriminator is employed to determine whether the newly generated summary is coherent with the previous summary. For the experiment, we extend the traditional two-step update summarization setting to a multi-step stepwise setting, and re-propose a large-scale stepwise summarization dataset based on a public story generation dataset. Extensive experiments on this dataset show that SSG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations. Ablation studies demonstrate the effectiveness of each module in our framework. We also discuss the benefits and limitations of recent large language models on this task.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.05361

Country: Asia (0.14)

Genre: Research Report (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

Add feedback

Flexible and Adaptable Summarization via Expertise Separation

Chen, Xiuying, Li, Mingzhe, Gao, Shen, Cheng, Xin, Zhu, Qingqing, Yan, Rui, Gao, Xin, Zhang, Xiangliang

arXiv.org Artificial IntelligenceJun-8-2024

A proficient summarization model should exhibit both flexibility -- the capacity to handle a range of in-domain summarization tasks, and adaptability -- the competence to acquire new knowledge and adjust to unseen out-of-domain tasks. Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests on the principle that the general summarization ability to capture salient information can be shared across different tasks, while the domain-specific summarization abilities need to be distinct and tailored. Concretely, we propose MoeSumm, a Mixture-of-Expert Summarization architecture, which utilizes a main expert for gaining the general summarization capability and deputy experts that selectively collaborate to meet specific summarization task requirements. We further propose a max-margin loss to stimulate the separation of these abilities. Our model's distinct separation of general and domain-specific summarization abilities grants it with notable flexibility and adaptability, all while maintaining parameter efficiency. MoeSumm achieves flexibility by managing summarization across multiple domains with a single model, utilizing a shared main expert and selected deputy experts. It exhibits adaptability by tailoring deputy experts to cater to out-of-domain few-shot and zero-shot scenarios. Experimental results on 11 datasets show the superiority of our model compared with recent baselines and LLMs. We also provide statistical and visual evidence of the distinct separation of the two abilities in MoeSumm (https://github.com/iriscxy/MoE_Summ).

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.0536

Country:

North America > United States (0.48)
Asia > Middle East > Saudi Arabia (0.14)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases

Wang, Zhizheng, Jin, Qiao, Wei, Chih-Hsuan, Tian, Shubo, Lai, Po-Ting, Zhu, Qingqing, Day, Chi-Ping, Ross, Christina, Lu, Zhiyong

arXiv.org Artificial IntelligenceMay-25-2024

Gene set knowledge discovery is essential for advancing human functional genomics. Recent studies have shown promising performance by harnessing the power of Large Language Models (LLMs) on this task. Nonetheless, their results are subject to several limitations common in LLMs such as hallucinations. In response, we present GeneAgent, a first-of-its-kind language agent featuring self-verification capability. It autonomously interacts with various biological databases and leverages relevant domain knowledge to improve accuracy and reduce hallucination occurrences. Benchmarking on 1,106 gene sets from different sources, GeneAgent consistently outperforms standard GPT-4 by a significant margin. Moreover, a detailed manual review confirms the effectiveness of the self-verification module in minimizing hallucinations and generating more reliable analytical narratives. To demonstrate its practical utility, we apply GeneAgent to seven novel gene sets derived from mouse B2905 melanoma cell lines, with expert evaluations showing that GeneAgent offers novel insights into gene functions and subsequently expedites knowledge discovery.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2405.16205

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark

Chen, Xiuying, Wang, Tairan, Zhu, Qingqing, Guo, Taicheng, Gao, Shen, Lu, Zhiyong, Gao, Xin, Zhang, Xiangliang

arXiv.org Artificial IntelligenceFeb-22-2024

The summarization capabilities of pretrained and large language models (LLMs) have been widely validated in general areas, but their use in scientific corpus, which involves complex sentences and specialized knowledge, has been less assessed. This paper presents conceptual and experimental analyses of scientific summarization, highlighting the inadequacies of traditional evaluation methods, such as $n$-gram, embedding comparison, and QA, particularly in providing explanations, grasping scientific concepts, or identifying key content. Subsequently, we introduce the Facet-aware Metric (FM), employing LLMs for advanced semantic matching to evaluate summaries based on different aspects. This facet-aware approach offers a thorough evaluation of abstracts by decomposing the evaluation task into simpler subtasks.Recognizing the absence of an evaluation benchmark in this domain, we curate a Facet-based scientific summarization Dataset (FD) with facet-level annotations. Our findings confirm that FM offers a more logical approach to evaluating scientific summaries. In addition, fine-tuned smaller models can compete with LLMs in scientific contexts, while LLMs have limitations in learning from in-context information in scientific domains. This suggests an area for future enhancement of LLMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.14359

Country:

North America > United States (0.28)
Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine (0.69)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning

Jin, Qiao, Wang, Zhizheng, Yang, Yifan, Zhu, Qingqing, Wright, Donald, Huang, Thomas, Wilbur, W John, He, Zhe, Taylor, Andrew, Chen, Qingyu, Lu, Zhiyong

arXiv.org Artificial IntelligenceFeb-20-2024

Clinical calculators play a vital role in healthcare by offering accurate evidence-based predictions for various purposes such as prognosis. Nevertheless, their widespread utilization is frequently hindered by usability challenges, poor dissemination, and restricted functionality. Augmenting large language models with extensive collections of clinical calculators presents an opportunity to overcome these obstacles and improve workflow efficiency, but the scalability of the manual curation process poses a significant challenge. In response, we introduce AgentMD, a novel language agent capable of curating and applying clinical calculators across various clinical contexts. Using the published literature, AgentMD has automatically curated a collection of 2,164 diverse clinical calculators with executable functions and structured documentation, collectively named RiskCalcs. Manual evaluations show that RiskCalcs tools achieve an accuracy of over 80% on three quality metrics. At inference time, AgentMD can automatically select and apply the relevant RiskCalcs tools given any patient description. On the newly established RiskQA benchmark, AgentMD significantly outperforms chain-of-thought prompting with GPT-4 (87.7% vs. 40.9% in accuracy). Additionally, we also applied AgentMD to real-world clinical notes for analyzing both population-level and risk-level patient characteristics. In summary, our study illustrates the utility of language agents augmented with clinical calculators for healthcare analytics and patient care.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.13225

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.94)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.88)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Leveraging Professional Radiologists' Expertise to Enhance LLMs' Evaluation for Radiology Reports

Zhu, Qingqing, Chen, Xiuying, Jin, Qiao, Hou, Benjamin, Mathai, Tejas Sudharshan, Mukherjee, Pritam, Gao, Xin, Summers, Ronald M, Lu, Zhiyong

arXiv.org Artificial IntelligenceFeb-2-2024

In radiology, Artificial Intelligence (AI) has significantly advanced report generation, but automatic evaluation of these AI-produced reports remains challenging. Current metrics, such as Conventional Natural Language Generation (NLG) and Clinical Efficacy (CE), often fall short in capturing the semantic intricacies of clinical contexts or overemphasize clinical details, undermining report clarity. To overcome these issues, our proposed method synergizes the expertise of professional radiologists with Large Language Models (LLMs), like GPT-3.5 and GPT-4 1. Utilizing In-Context Instruction Learning (ICIL) and Chain of Thought (CoT) reasoning, our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human and AI generated reports. This is further enhanced by a Regression model that aggregates sentence evaluation scores. Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a 0.48 score, outperforming the METEOR metric by 0.19, while our "Regressed GPT-4" model shows even greater alignment with expert evaluations, exceeding the best existing metric by a 0.35 margin. Moreover, the robustness of our explanations has been validated through a thorough iterative strategy. We plan to publicly release annotations from radiology experts, setting a new standard for accuracy in future assessments. This underscores the potential of our approach in enhancing the quality assessment of AI-driven medical reports.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.16578

Country: North America > United States > Minnesota (0.14)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A scoping review on multimodal deep learning in biomedical images and texts

Sun, Zhaoyi, Lin, Mingquan, Zhu, Qingqing, Xie, Qianqian, Wang, Fei, Lu, Zhiyong, Peng, Yifan

arXiv.org Artificial IntelligenceOct-18-2023

Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it only caught researchers' attention recently. To this end, there is a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions. In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps with a focus on biomedical images and texts joint learning, mainly because these two were the most commonly available data types in MDL research. This study reviewed the current uses of multimodal deep learning on five tasks: (1) Report generation, (2) Visual question answering, (3) Cross-modal retrieval, (4) Computer-aided diagnosis, and (5) Semantic segmentation. Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate the collaboration of natural language processing (NLP) and medical imaging communities and support the next generation of decision-making and computer-assisted diagnostic system development.

artificial intelligence, biomedical image and text, machine learning, (3 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.jbi.2023.104482

2307.07362

Genre:

Overview (0.73)
Research Report (0.69)

Industry: Health & Medicine > Diagnostic Medicine (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.53)

Add feedback