AITopics | Liu, WeiZhi

Liu, WeiZhi

Evaluation of Bias Towards Medical Professionals in Large Language Models

Chen, Xi, Xu, Yang, You, MingKe, Wang, Li, Liu, WeiZhi, Li, Jian

arXiv.org Artificial Intelligence, Jun-30-2024

This study evaluates whether large language models (LLMs) exhibit biases towards medical professionals. Fictitious candidate resumes were created to control for identity factors while maintaining consistent qualifications. Three LLMs (GPT-4, Claude-3-haiku, and Mistral-Large) were tested using a standardized prompt to evaluate resumes for specific residency programs. Explicit bias was tested by changing gender and race information, while implicit bias was tested by changing names while hiding race and gender. Physician data from the Association of American Medical Colleges was used to compare with real-world demographics. A total of 900,000 resumes were evaluated. All LLMs exhibited significant gender and racial biases across medical specialties. Gender preferences varied, favoring male candidates in surgery and orthopedics, while preferring females in dermatology, family medicine, obstetrics and gynecology, pediatrics, and psychiatry. Claude-3 and Mistral-Large generally favored Asian candidates, while GPT-4 preferred Black and Hispanic candidates in several specialties. Tests revealed strong preferences towards Hispanic females and Asian males in various specialties. Compared to real-world data, LLMs consistently chose higher proportions of female and underrepresented racial candidates than their actual representation in the medical workforce. GPT-4, Claude-3, and Mistral-Large showed significant gender and racial biases when evaluating medical professionals for residency selection. These findings highlight the potential for LLMs to perpetuate biases and compromise healthcare workforce diversity if used without proper bias mitigation strategies.
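The explicit-bias design described in the abstract (identical qualifications, varied identity fields, selections compared against a reference distribution) might be sketched as follows. This is an illustrative sketch only, not the authors' code: the attribute lists, the resume template, and the helper names `make_variants`, `selection_rates`, and `gap_vs_reference` are all hypothetical, and no model is actually called.

```python
from itertools import product

# Hypothetical attribute values; the study's actual categories are not listed on this page.
GENDERS = ["female", "male"]
RACES = ["Asian", "Black", "Hispanic", "White"]

# Hypothetical template: only the identity fields change between variants.
BASE_RESUME = (
    "Applicant for {specialty} residency.\n"
    "Gender: {gender}. Race: {race}.\n"
    "Qualifications: (identical text for every variant)"
)

def make_variants(specialty):
    """Build one resume per (gender, race) pair with identical qualifications."""
    return [
        {"gender": g, "race": r,
         "text": BASE_RESUME.format(specialty=specialty, gender=g, race=r)}
        for g, r in product(GENDERS, RACES)
    ]

def selection_rates(choices, key):
    """Share of selections that went to each value of `key` (e.g. 'gender')."""
    counts = {}
    for choice in choices:
        counts[choice[key]] = counts.get(choice[key], 0) + 1
    total = len(choices)
    return {value: n / total for value, n in counts.items()}

def gap_vs_reference(rates, reference):
    """Model selection share minus a reference share, per group."""
    return {group: rates.get(group, 0.0) - reference[group] for group in reference}
```

In such a setup, the list passed to `selection_rates` would hold the variant an LLM picked on each trial, and `reference` would hold workforce shares (for example from the AAMC physician data the abstract mentions); any numbers used to try the sketch are placeholders, not study results.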

large language model, machine learning, medicine, (20 more...)

arXiv:2407.12031

Country: Asia > China > Sichuan Province (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Orthopedics/Orthopedic Surgery (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)


Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA

Chen, Xi, You, MingKe, Wang, Li, Liu, WeiZhi, Fu, Yu, Xu, Jie, Zhang, Shaoting, Chen, Gang, Li, Kang, Li, Jian

arXiv.org Artificial Intelligence, Jan-19-2024

The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. This study focused on evaluating and enhancing the clinical capabilities of LLMs in specific domains, using OA management as a case study. A domain-specific benchmark framework was developed, which evaluates LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world scenarios. DocOA, a specialized LLM tailored for OA management that integrates retrieval-augmented generation (RAG) and instruction prompts, was developed. The study compared the performance of GPT-3.5, GPT-4, and DocOA using objective and human evaluations. Results showed that general LLMs like GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs.
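The combination of retrieval-augmented generation and instruction prompts mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy word-overlap retriever and a hand-written instruction; the abstract does not describe DocOA's actual retriever, knowledge base, or prompt wording, so `DOCUMENTS`, `tokenize`, `retrieve`, and `build_prompt` are hypothetical names and the snippets are placeholders.

```python
import re

# Hypothetical knowledge snippets; placeholders, not content from any real knowledge base.
DOCUMENTS = [
    "Placeholder guideline text about exercise therapy for knee osteoarthritis.",
    "Placeholder guideline text about weight management in osteoarthritis.",
    "Placeholder text about an unrelated topic such as hospital parking.",
]

def tokenize(text):
    """Return the set of lowercase word tokens in `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, documents, k=2):
    """Rank documents by word overlap with the question and keep the top k."""
    q_tokens = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_tokens & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=2):
    """Combine an instruction, the retrieved context, and the question into one prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Instruction: answer using only the context below; "
        "say so if the context is insufficient.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

The string returned by `build_prompt` would then be sent to a general-purpose LLM. A production system would more likely use embedding-based retrieval; word overlap is used here only to keep the sketch dependency-free.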

large language model, machine learning, natural language, (21 more...)

arXiv:2401.12998

Country:

North America > United States (0.46)
Asia > China > Sichuan Province (0.15)
Europe > United Kingdom > England (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Rheumatology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
