
Collaborating Authors

 Do, Thanh Son


Efficient Standardization of Clinical Notes using Large Language Models

arXiv.org Artificial Intelligence

Clinician notes are a rich source of patient information but often contain inconsistencies due to varied writing styles, colloquialisms, abbreviations, medical jargon, grammatical errors, and non-standard formatting. These inconsistencies hinder the extraction of meaningful data from electronic health records (EHRs), posing challenges for quality improvement, population health, precision medicine, decision support, and research. We present a large language model approach to standardizing a corpus of 1,618 clinical notes. Per note, standardization corrected an average of 4.9 ± 1.8 grammatical errors and 3.3 ± 5.2 spelling errors, converted 3.1 ± 3.0 non-standard terms to standard terminology, and expanded 15.8 ± 9.1 abbreviations and acronyms. Additionally, notes were re-organized into canonical sections with standardized headings. This process prepared notes for key concept extraction, mapping to medical ontologies, and conversion to interoperable data formats such as FHIR. Expert review of randomly sampled notes found no significant data loss after standardization. This proof-of-concept study demonstrates that standardizing clinical notes can improve their readability, consistency, and usability, while also facilitating their conversion into interoperable data formats.
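
The abstract does not give the actual prompt or pipeline, but the described transformation maps naturally onto a single instructed LLM call. Below is a minimal sketch, assuming an OpenAI-style chat API; the model name, system prompt, and section headings are illustrative assumptions, not the authors' method.

```python
# Minimal sketch of LLM-based note standardization (illustrative, not the
# authors' exact pipeline). Assumes the OpenAI Python client and an API key
# in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# Hypothetical instruction covering the four correction types reported in
# the abstract plus re-sectioning; the study's actual prompt is not shown.
SYSTEM_PROMPT = (
    "You are a clinical documentation assistant. Rewrite the note so that: "
    "(1) grammatical and spelling errors are corrected, "
    "(2) colloquial or non-standard terms are replaced with standard "
    "medical terminology, "
    "(3) all abbreviations and acronyms are expanded, and "
    "(4) content is re-organized under canonical section headings "
    "(e.g., History of Present Illness, Medications, Assessment, Plan). "
    "Do not add, remove, or alter any clinical facts."
)

def standardize_note(note_text: str, model: str = "gpt-4o") -> str:
    """Return a standardized version of a raw clinical note."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output aids reproducibility
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    raw = "pt c/o SOB x3 days, hx of CHF, meds unchagned."
    print(standardize_note(raw))
```

The "do not alter clinical facts" constraint matters because, as the study notes, standardized output must be checked against the source for data loss.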


When Less Is Not More: Large Language Models Normalize Less-Frequent Terms with Lower Accuracy

arXiv.org Artificial Intelligence

Term normalization is the process of mapping a term from free text to a standardized concept and its machine-readable code in an ontology. Accurate normalization of terms that capture phenotypic differences between patients and diseases is critical to the success of precision medicine initiatives. A large language model (LLM), such as GPT-4o, can normalize terms to the Human Phenotype Ontology (HPO), but it may retrieve incorrect HPO IDs. Reported accuracy rates for LLMs on these tasks may be inflated due to imbalanced test datasets skewed towards high-frequency terms. In our study, using a comprehensive dataset of 268,776 phenotype annotations for 12,655 diseases from the HPO, GPT-4o achieved an accuracy of 13.1% in normalizing 11,225 unique terms. However, the accuracy was unevenly distributed, with higher-frequency and shorter terms normalized more accurately than lower-frequency and longer terms. Feature importance analysis, using SHAP and permutation methods, identified low term frequency as the most significant predictor of normalization errors. These findings suggest that training and evaluation datasets for LLM-based term normalization should balance low- and high-frequency terms to improve model performance, particularly for infrequent terms critical to precision medicine.
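
The feature importance analysis mentioned above can be reproduced in spirit with scikit-learn's permutation importance: fit a classifier that predicts whether a term was normalized incorrectly from term-level features, then measure how much shuffling each feature hurts accuracy. This is a sketch on synthetic data; the feature names and the simulated error pattern are assumptions standing in for the study's 268,776 real annotations.

```python
# Illustrative permutation-importance analysis: which term features best
# predict normalization errors? Synthetic data mirrors the reported trend
# (rarer, longer terms are more error-prone); it is not the study's data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Hypothetical term-level features.
freq = rng.lognormal(mean=2.0, sigma=1.5, size=n)   # corpus frequency
length = rng.integers(5, 60, size=n)                # characters
tokens = rng.integers(1, 8, size=n)                 # word count

# Simulated error probability: decreases with frequency, increases with length.
p_error = 1 / (1 + np.exp(0.8 * np.log1p(freq) - 0.03 * length - 0.2 * tokens))
error = rng.random(n) < p_error

X = pd.DataFrame(
    {"term_frequency": freq, "term_length": length, "token_count": tokens}
)
X_tr, X_te, y_tr, y_te = train_test_split(X, error, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance: mean drop in held-out accuracy when one feature
# is shuffled, breaking its relationship with the error label.
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With features constructed this way, term frequency dominates the ranking, which is the shape of result the abstract reports (SHAP values would give a complementary per-sample view).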


A Simplified Retriever to Improve Accuracy of Phenotype Normalizations by Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown improved accuracy in phenotype term normalization tasks when augmented with retrievers that suggest candidate normalizations based on term definitions. In this work, we introduce a simplified retriever that enhances LLM accuracy by searching the Human Phenotype Ontology (HPO) for candidate matches using contextual word embeddings from BioBERT without the need for explicit term definitions. Testing this method on terms derived from the clinical synopses of Online Mendelian Inheritance in Man (OMIM), we demonstrate that the normalization accuracy of a state-of-the-art LLM increases from a baseline of 62.3% without augmentation to 90.3% with retriever augmentation. This approach is potentially generalizable to other biomedical term normalization tasks and offers an efficient alternative to more complex retrieval methods.
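
The retriever idea is straightforward to sketch: embed the free-text term and the HPO term labels with BioBERT, rank labels by cosine similarity, and hand the top candidates to the LLM to choose from. The snippet below is a minimal sketch using the public `dmis-lab/biobert-base-cased-v1.1` checkpoint; the tiny in-memory HPO sample and the mean-pooling choice are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a BioBERT-based candidate retriever for HPO normalization.
# Requires: pip install torch transformers
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "dmis-lab/biobert-base-cased-v1.1"  # a public BioBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

@torch.no_grad()
def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pooled contextual embeddings for a batch of strings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)       # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)        # (B, H)

# Tiny stand-in for the full HPO label set (~17k terms in practice).
hpo = {
    "HP:0001250": "Seizure",
    "HP:0001631": "Atrial septal defect",
    "HP:0000252": "Microcephaly",
    "HP:0001263": "Global developmental delay",
}
ids = list(hpo.keys())
labels = list(hpo.values())
label_vecs = torch.nn.functional.normalize(embed(labels), dim=1)

def candidates(term: str, k: int = 3) -> list[tuple[str, str, float]]:
    """Top-k (HPO ID, label, cosine similarity) candidates for a term."""
    q = torch.nn.functional.normalize(embed([term]), dim=1)
    sims = (q @ label_vecs.T).squeeze(0)
    top = sims.topk(min(k, len(labels)))
    return [(ids[i], labels[i], float(sims[i])) for i in top.indices.tolist()]

# The returned candidates would then be inserted into the LLM prompt so it
# selects an ID from a short list rather than recalling one from memory.
print(candidates("fits and convulsions"))
```

Constraining the LLM to choose among retrieved candidates, rather than generating an HPO ID unaided, is what drives the reported jump from 62.3% to 90.3% accuracy, and it avoids the definition-generation step required by more complex retrievers.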