AITopics | Liu, Nan

Collaborating Authors

Liu, Nan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enabling Inclusive Systematic Reviews: Incorporating Preprint Articles with Large Language Model-Driven Evaluations

Yang, Rui, Tong, Jiayi, Wang, Haoyuan, Huang, Hui, Hu, Ziyang, Li, Peiyu, Liu, Nan, Lindsell, Christopher J., Pencina, Michael J., Chen, Yong, Hong, Chuan

arXiv.org Artificial IntelligenceMar-19-2025

Background. Systematic reviews in comparative effectiveness research require timely evidence synthesis. Preprints accelerate knowledge dissemination but vary in quality, posing challenges for systematic reviews. Methods. We propose AutoConfidence (automated confidence assessment), an advanced framework for predicting preprint publication, which reduces reliance on manual curation and expands the range of predictors, including three key advancements: (1) automated data extraction using natural language processing techniques, (2) semantic embeddings of titles and abstracts, and (3) large language model (LLM)-driven evaluation scores. Additionally, we employed two prediction models: a random forest classifier for binary outcome and a survival cure model that predicts both binary outcome and publication risk over time. Results. The random forest classifier achieved AUROC 0.692 with LLM-driven scores, improving to 0.733 with semantic embeddings and 0.747 with article usage metrics. The survival cure model reached AUROC 0.716 with LLM-driven scores, improving to 0.731 with semantic embeddings. For publication risk prediction, it achieved a concordance index of 0.658, increasing to 0.667 with semantic embeddings. Conclusion. Our study advances the framework for preprint publication prediction through automated data extraction and multiple feature integration. By combining semantic embeddings with LLM-driven evaluations, AutoConfidence enhances predictive performance while reducing manual annotation burden. The framework has the potential to facilitate systematic incorporation of preprint articles in evidence-based medicine, supporting researchers in more effective evaluation and utilization of preprint resources.

artificial intelligence, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2503.13857

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.94)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation

Xuan, Weihao, Yang, Rui, Qi, Heli, Zeng, Qingcheng, Xiao, Yunze, Xing, Yun, Wang, Junjue, Li, Huitao, Li, Xin, Yu, Kunyu, Liu, Nan, Chen, Qingyu, Teodoro, Douglas, Marrese-Taylor, Edison, Lu, Shijian, Iwasawa, Yusuke, Matsuo, Yutaka, Li, Irene

arXiv.org Artificial IntelligenceMar-13-2025

Traditional benchmarks struggle to evaluate increasingly sophisticated language models in multilingual and culturally diverse contexts. To address this gap, we introduce MMLU-ProX, a comprehensive multilingual benchmark covering 13 typologically diverse languages with approximately 11,829 questions per language. Building on the challenging reasoning-focused design of MMLU-Pro, our framework employs a semi-automatic translation process: translations generated by state-of-the-art large language models (LLMs) are rigorously evaluated by expert annotators to ensure conceptual accuracy, terminological consistency, and cultural relevance. We comprehensively evaluate 25 state-of-the-art LLMs using 5-shot chain-of-thought (CoT) and zero-shot prompting strategies, analyzing their performance across linguistic and cultural boundaries. Our experiments reveal consistent performance degradation from high-resource languages to lower-resource ones, with the best models achieving over 70% accuracy on English but dropping to around 40% for languages like Swahili, highlighting persistent gaps in multilingual capabilities despite recent advances. MMLU-ProX is an ongoing project; we are expanding our benchmark by incorporating additional languages and evaluating more language models to provide a more comprehensive assessment of multilingual capabilities.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.10497

Country:

Asia > Japan (0.14)
Asia > Thailand (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Regulatory Science Innovation for Generative AI and Large Language Models in Health and Medicine: A Global Call for Action

Ong, Jasmine Chiat Ling, Ning, Yilin, Liu, Mingxuan, Ma, Yian, Liang, Zhao, Singh, Kuldev, Chang, Robert T, Vogel, Silke, Lim, John CW, Tan, Iris Siu Kwan, Freyer, Oscar, Gilbert, Stephen, Bitterman, Danielle S, Liu, Xiaoxuan, Denniston, Alastair K, Liu, Nan

arXiv.org Artificial IntelligenceJan-27-2025

The integration of generative AI (GenAI) and large language models (LLMs) in healthcare presents both unprecedented opportunities and challenges, necessitating innovative regulatory approaches. GenAI and LLMs offer broad applications, from automating clinical workflows to personalizing diagnostics. However, the non-deterministic outputs, broad functionalities and complex integration of GenAI and LLMs challenge existing medical device regulatory frameworks, including the total product life cycle (TPLC) approach. Here we discuss the constraints of the TPLC approach to GenAI and LLM-based medical device regulation, and advocate for global collaboration in regulatory science research. This serves as the foundation for developing innovative approaches including adaptive policies and regulatory sandboxes, to test and refine governance in real-world settings. International harmonization, as seen with the International Medical Device Regulators Forum, is essential to manage implications of LLM on global health, including risks of widening health inequities driven by inherent model biases. By engaging multidisciplinary expertise, prioritizing iterative, data-driven approaches, and focusing on the needs of diverse populations, global regulatory science research enables the responsible and equitable advancement of LLM innovations in healthcare.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2502.07794

Country:

Europe (1.00)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.85)

Add feedback

Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine

Ke, Yu He, Jin, Liyuan, Elangovan, Kabilan, Ong, Bryan Wen Xi, Oh, Chin Yang, Sim, Jacqueline, Loh, Kenny Wei-Tsen, Soh, Chai Rick, Cheng, Jonathan Ming Hua, Lee, Aaron Kwang Yang, Ting, Daniel Shu Wei, Liu, Nan, Abdullah, Hairil Rizal

arXiv.org Artificial IntelligenceDec-23-2024

Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH was embedded with 35 institutional perioperative protocols in the secure Claude 3.5 Sonet LLM framework within Pair Chat (developed by Singapore Government) and tested in a silent deployment with real-world data. Accuracy, safety, and usability were assessed. Deviations and hallucinations were categorized based on potential harm, and user feedback was evaluated using the Technology Acceptance Model (TAM). Updates were made after the initial silent deployment to amend one protocol. In 240 real-world clinical iterations, PEACH achieved a first-generation accuracy of 97.5% (78/80) and an overall accuracy of 96.7% (232/240) across three iterations. The updated PEACH demonstrated improved accuracy of 97.9% (235/240), with a statistically significant difference from the null hypothesis of 95% accuracy (p = 0.018, 95% CI: 0.952-0.991). Minimal hallucinations and deviations were observed (both 1/240 and 2/240, respectively). Clinicians reported that PEACH expedited decisions in 95% of cases, and inter-rater reliability ranged from kappa 0.772-0.893 within PEACH and 0.610-0.784 among attendings. PEACH is an accurate, adaptable tool that enhances consistency and efficiency in perioperative decision-making. Future research should explore its scalability across specialties and its impact on clinical outcomes.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.18096

Country: Asia > Singapore (0.93)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Surgery (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

oRetrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness

Ke, Yu He, Jin, Liyuan, Elangovan, Kabilan, Abdullah, Hairil Rizal, Liu, Nan, Sia, Alex Tiong Heng, Soh, Chai Rick, Tung, Joshua Yi Min, Ong, Jasmine Chiat Ling, Kuo, Chang-Fu, Wu, Shao-Chun, Kovacheva, Vesela P., Ting, Daniel Shu Wei

arXiv.org Artificial IntelligenceOct-10-2024

Large Language Models (LLMs) show potential for medical applications but often lack specialized clinical knowledge. Retrieval Augmented Generation (RAG) allows customization with domain-specific information, making it suitable for healthcare. This study evaluates the accuracy, consistency, and safety of RAG models in determining fitness for surgery and providing preoperative instructions. We developed LLM-RAG models using 35 local and 23 international preoperative guidelines and tested them against human-generated responses. A total of 3,682 responses were evaluated. Clinical documents were processed using Llamaindex, and 10 LLMs, including GPT3.5, GPT4, and Claude-3, were assessed. Fourteen clinical scenarios were analyzed, focusing on seven aspects of preoperative instructions. Established guidelines and expert judgment were used to determine correct responses, with human-generated answers serving as comparisons. The LLM-RAG models generated responses within 20 seconds, significantly faster than clinicians (10 minutes). The GPT4 LLM-RAG model achieved the highest accuracy (96.4% vs. 86.6%, p=0.016), with no hallucinations and producing correct instructions comparable to clinicians. Results were consistent across both local and international guidelines. This study demonstrates the potential of LLM-RAG models for preoperative healthcare tasks, highlighting their efficiency, scalability, and reliability.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.08431

Country:

Asia > Singapore (0.16)
Asia > Taiwan (0.14)
Europe > United Kingdom (0.14)
North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Surgery (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FairFML: Fair Federated Machine Learning with a Case Study on Reducing Gender Disparities in Cardiac Arrest Outcome Prediction

Li, Siqi, Wu, Qiming, Li, Xin, Miao, Di, Hong, Chuan, Gu, Wenjun, Shang, Yuqing, Okada, Yohei, Chen, Michael Hao, Yan, Mengying, Ning, Yilin, Ong, Marcus Eng Hock, Liu, Nan

arXiv.org Artificial IntelligenceOct-7-2024

Objective: Mitigating algorithmic disparities is a critical challenge in healthcare research, where ensuring equity and fairness is paramount. While large-scale healthcare data exist across multiple institutions, cross-institutional collaborations often face privacy constraints, highlighting the need for privacy-preserving solutions that also promote fairness. Materials and Methods: In this study, we present Fair Federated Machine Learning (FairFML), a model-agnostic solution designed to reduce algorithmic bias in cross-institutional healthcare collaborations while preserving patient privacy. As a proof of concept, we validated FairFML using a real-world clinical case study focused on reducing gender disparities in cardiac arrest outcome prediction. Results: We demonstrate that the proposed FairFML framework enhances fairness in federated learning (FL) models without compromising predictive performance. Our findings show that FairFML improves model fairness by up to 65% compared to the centralized model, while maintaining performance comparable to both local and centralized models, as measured by receiver operating characteristic analysis. Discussion and Conclusion: FairFML offers a promising and flexible solution for FL collaborations, with its adaptability allowing seamless integration with various FL frameworks and models, from traditional statistical methods to deep learning techniques. This makes FairFML a robust approach for developing fairer FL models across diverse clinical and biomedical applications.

data mining, fairfml, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.17269

Country:

Asia (0.48)
North America > United States (0.28)
Europe > Austria (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability

Tan, Ting Fang, Elangovan, Kabilan, Ong, Jasmine, Shah, Nigam, Sung, Joseph, Wong, Tien Yin, Xue, Lan, Liu, Nan, Wang, Haibo, Kuo, Chang Fu, Chesterman, Simon, Yeong, Zee Kin, Ting, Daniel SW

arXiv.org Artificial IntelligenceJul-10-2024

A comprehensive qualitative evaluation framework for large language models (LLM) in healthcare that expands beyond traditional accuracy and quantitative metrics needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models that are safe, reliable, trustworthy, and ethical for healthcare and clinical applications.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2407.07666

Country:

Asia (0.74)
North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Health Care Providers & Services (0.48)
Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Biomedical Data Analysis

Li, Siqi, Li, Xin, Yu, Kunyu, Miao, Di, Zhu, Mingcheng, Yan, Mengying, Ke, Yuhe, D'Agostino, Danny, Ning, Yilin, Wu, Qiming, Wang, Ziwen, Shang, Yuqing, Liu, Molei, Hong, Chuan, Liu, Nan

arXiv.org Artificial IntelligenceJul-4-2024

Clinical and biomedical research in low-resource settings often faces significant challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine learning technique, emerges as a powerful solution by utilizing knowledge from pre-trained models to enhance the performance of new models, offering promise across various healthcare domains. Despite its conceptual origins in the 1990s, the application of TL in medical research has remained limited, especially beyond image analysis. In our review of TL applications in structured clinical and biomedical data, we screened 3,515 papers, with 55 meeting the inclusion criteria. Among these, only 2% (one out of 55) utilized external studies, and 7% (four out of 55) addressed scenarios involving multi-site collaborations with privacy constraints. To achieve actionable TL with structured medical data while addressing regional disparities, inequality, and privacy constraints in healthcare research, we advocate for the careful identification of appropriate source data and models, the selection of suitable TL frameworks, and the validation of TL models with proper baselines.

artificial intelligence, machine learning, source data, (15 more...)

arXiv.org Artificial Intelligence

2407.11034

Country:

Asia (1.00)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Lightweight Large Language Model for Medication Enquiry: Med-Pal

Elangovan, Kabilan, Ong, Jasmine Chiat Ling, Jin, Liyuan, Seng, Benjamin Jun Jie, Kwan, Yu Heng, Tan, Lit Soo, Zhong, Ryan Jian, Ma, Justina Koi Li, Ke, YuHe, Liu, Nan, Giacomini, Kathleen M, Ting, Daniel Shu Wei

arXiv.org Artificial IntelligenceJul-1-2024

Large Language Models (LLMs) have emerged as a potential solution to assist digital health development with patient education, commonly medication-related enquires. We trained and validated Med-Pal, a medication domain-specific LLM-chatbot fine-tuned with a fine-grained and expert curated dataset from a selection of five light-weighted open-source LLMs of smaller parameter size (7 billion or less) regarding computational constraints and prioritizing operational efficiency. A multi-disciplinary team performed a clinical evaluation of LLMs responses using the SCORE criteria, focusing on safety, accuracy, bias, reproducibility, and ease of understanding. Best performing light-weighted LLM was chosen as Med-Pal for further engineering with guard-railing using adversarial prompting. Med-Pal and existing light-weighted LLMs, including pretrained Biomistral and finetuned Meerkat, were validated on an independent dataset on a broad range of medication-related questions (231 in total), 12 different question types across 14 different medication classes. Mistral-7b emerged as the top performer among selected lightweight LLMs, achieving the highest median score of 14 and 71.9% high-quality responses in accuracy and safety domains, hence chosen as the backbone LLM for Med-Pal. When compared against Biomistral, Med-pal outperformed in generating responses appropriate for patient communication, with significant reductions bias and errors typical of general LLMs. Comparable performance was observed when comparing Med-Pal with Meerkat. Med-Pal showcases the feasibility of developing and employing fine-tuned light-weighted LLMs to enhance digital health communications.

large language model, med-pal, natural language, (16 more...)

arXiv.org Artificial Intelligence

2407.12822

Country: North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Materials > Chemicals (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
(10 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Retrieval-Augmented Generation for Generative Artificial Intelligence in Medicine

Yang, Rui, Ning, Yilin, Keppo, Emilia, Liu, Mingxuan, Hong, Chuan, Bitterman, Danielle S, Ong, Jasmine Chiat Ling, Ting, Daniel Shu Wei, Liu, Nan

arXiv.org Artificial IntelligenceJun-18-2024

Generative artificial intelligence (AI) has brought revolutionary innovations in various fields, including medicine. However, it also exhibits limitations. In response, retrieval-augmented generation (RAG) provides a potential solution, enabling models to generate more accurate contents by leveraging the retrieval of external knowledge. With the rapid advancement of generative AI, RAG can pave the way for connecting this transformative technology with medical applications and is expected to bring innovations in equity, reliability, and personalization to health care.

information, large language model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2406.12449

Country:

North America > United States (0.69)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback