Collaborating Authors

Leroy, Gondy


ICD Codes are Insufficient to Create Datasets for Machine Learning: An Evaluation Using All of Us Data for Coccidioidomycosis and Myocardial Infarction

arXiv.org Artificial Intelligence

In medicine, machine learning (ML) datasets are often built using International Classification of Diseases (ICD) codes. As new models are developed, there is a need for larger datasets. However, ICD codes are intended for billing rather than research. We aim to determine how suitable ICD codes are for creating datasets to train ML models. We focused on one rare and one common disease using the All of Us database. First, we compared the patient cohort created using ICD codes for Valley fever (coccidioidomycosis, CM) with the cohort identified via serological confirmation. Second, we compared two similarly created patient cohorts for myocardial infarction (MI). We identified significant discrepancies between the groups, and the patient overlap was small. For CM, 811 patients were identified by ICD-10 codes, 619 by positive serology, and only 24 by both. For MI, 14,875 patients were identified by ICD-10 codes, 23,598 by laboratory confirmation, and 6,531 by both. Demographics, rates of disease symptoms, and other clinical data also varied across the case study cohorts.
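As a rough illustration of the cohort comparison, the sketch below computes the overlap between an ICD-defined cohort and a lab-confirmed cohort as set operations over patient IDs. The IDs and cohort contents are hypothetical; a real analysis would query the All of Us condition and measurement tables.

```python
# Minimal sketch of the cohort-overlap comparison; patient IDs are hypothetical.

def compare_cohorts(icd_cohort: set[str], lab_cohort: set[str]) -> dict[str, int]:
    """Report sizes of the ICD-only group, the lab-only group, and their overlap."""
    return {
        "icd_only": len(icd_cohort - lab_cohort),
        "lab_only": len(lab_cohort - icd_cohort),
        "both": len(icd_cohort & lab_cohort),
    }

# Toy example mirroring the Valley fever comparison:
icd = {"p1", "p2", "p3", "p4"}     # patients with a CM ICD-10 code
sero = {"p3", "p4", "p5"}          # patients with positive serology
print(compare_cohorts(icd, sero))  # {'icd_only': 2, 'lab_only': 1, 'both': 2}
```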


Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks

arXiv.org Artificial Intelligence

An important issue impacting healthcare is a lack of available experts. Machine learning (ML) models could help by aiding in diagnosing patients, but creating datasets large enough to train these models is expensive. We evaluated large language models (LLMs) for data creation. Using Autism Spectrum Disorder (ASD) as a case study, we prompted ChatGPT and GPT-Premium to generate 4,200 synthetic observations to augment existing medical data. Our goal is to label behaviors corresponding to autism criteria and to improve model accuracy with synthetic training data. We used a BERT classifier pre-trained on biomedical literature to assess differences in performance between models. A random sample (N=140) of the LLM-generated data was evaluated by a clinician and found to contain 83% correct example-label pairs. Augmenting the data increased recall by 13% but decreased precision by 16%, consistent with the synthetic pairs being plausible but imperfectly labeled. Future work will analyze how different traits of synthetic data affect ML outcomes.
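The sketch below shows one way such an augmentation loop could look, assuming the OpenAI Python client; the prompt wording, label set, and model name are illustrative, not the paper's exact configuration.

```python
# Hypothetical sketch of LLM-based synthetic data generation for augmentation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CRITERIA = ["restricted interests", "social communication deficit"]  # illustrative labels

def generate_synthetic_examples(criterion: str, n: int) -> list[str]:
    """Ask the LLM for n short behavior descriptions matching one ASD criterion."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Write {n} short clinical sentences, one per line, "
                       f"each describing a child behavior that matches: {criterion}.",
        }],
    )
    return response.choices[0].message.content.splitlines()

# Each generated sentence becomes a (text, label) pair appended to the
# training data for the BERT classifier.
synthetic = [(s, c) for c in CRITERIA for s in generate_synthetic_examples(c, 5)]
```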


Text and Audio Simplification: Human vs. ChatGPT

arXiv.org Artificial Intelligence

Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, an evaluation of its simplification performance is needed. We provide a systematic comparison of human- and ChatGPT-simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor, where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text corpora, one audio corpus, and five ChatGPT-simplified corpora. We then compared these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated these texts and five new ChatGPT-simplified versions. We found that the simple corpora show higher similarity to the human-simplified texts, and that ChatGPT simplification moves the metrics in the desired direction. The medical domain expert preferred the ChatGPT style, but rated the ChatGPT texts lower for content retention.
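This kind of difficulty scoring can be approximated with off-the-shelf readability measures; the sketch below uses the textstat package as a stand-in for the paper's fourteen metrics, with illustrative example sentences.

```python
# Scoring texts with standard difficulty metrics, as a stand-in for the
# paper's metric suite.
import textstat

def score(text: str) -> dict[str, float]:
    """Score a text with a few common readability metrics."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),    # higher = easier
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),  # lower = easier
        "avg_sentence_length": textstat.avg_sentence_length(text),
    }

original = "Myocardial infarction results from occlusion of the coronary arteries."
simplified = "A heart attack happens when blood flow to the heart is blocked."
print(score(original))
print(score(simplified))  # a good simplification should move the scores toward 'easier'
```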


Effects of Added Emphasis and Pause in Audio Delivery of Health Information

arXiv.org Artificial Intelligence

Health literacy is crucial to supporting good health and is a major national goal. Audio delivery is an increasingly popular way to consume health information. In this study, we evaluate the effect of audio enhancements, in the form of emphasis on key information and added pauses, on health texts of varying difficulty, measuring health information comprehension and retention. We produced audio snippets from difficult and easy texts and conducted the study on Amazon Mechanical Turk (AMT). Our findings suggest that emphasis matters for both comprehension and retention. When no pause is added, emphasizing significant information lowers the perceived difficulty of both difficult and easy texts. For difficult texts, comprehension is higher with correctly placed emphasis (54%) than without emphasis (50%). Adding a pause lowers perceived difficulty and can improve retention, but adversely affects comprehension.
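One common way to realize emphasis and pauses in synthesized audio is SSML, which most text-to-speech services support. The sketch below uses Amazon Polly as one possible backend; the paper does not specify this toolchain, and the text, emphasis placement, and pause length are illustrative.

```python
# Hypothetical sketch: producing an audio snippet with added emphasis and a
# pause via SSML, using Amazon Polly as an example TTS backend.
import boto3

polly = boto3.client("polly")

ssml = (
    "<speak>"
    "Take this medication <emphasis level='strong'>twice daily</emphasis>."
    "<break time='500ms'/>"  # added pause between sentences
    "Do not exceed the recommended dose."
    "</speak>"
)

audio = polly.synthesize_speech(
    Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId="Joanna"
)
with open("snippet.mp3", "wb") as f:
    f.write(audio["AudioStream"].read())
```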


APPLS: A Meta-evaluation Testbed for Plain Language Summarization

arXiv.org Artificial Intelligence

While there has been significant development of models for Plain Language Summarization (PLS), evaluation remains a challenge. This is in part because PLS involves multiple, interrelated language transformations (e.g., adding background explanations, removing specialized terminology). No metrics are explicitly engineered for PLS, and the suitability of other text generation evaluation metrics remains unclear. To address these concerns, we present APPLS, a granular meta-evaluation testbed designed to evaluate existing metrics for PLS. Drawing on insights from previous research, we define controlled perturbations along four criteria that a plain language metric should capture: informativeness, simplification, coherence, and faithfulness. Our analysis of metrics using this testbed reveals that current metrics fail to capture simplification, signaling a crucial gap. In response, we introduce POMME, a novel metric designed to assess text simplification in PLS. We demonstrate its correlation with simplification perturbations and validate it across a variety of datasets. Our research contributes the first meta-evaluation testbed for PLS and a comprehensive evaluation of existing metrics, offering insights relevant to other text generation tasks.
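To make the controlled-perturbation idea concrete, the sketch below implements one hypothetical simplification perturbation: swapping lay terms for jargon. A metric that truly captures simplification should score the perturbed text lower than the original; the word list and texts are illustrative, not drawn from APPLS.

```python
# Hypothetical simplification perturbation in the spirit of a meta-evaluation
# testbed: inject jargon and check whether a metric's score responds.

JARGON = {"heart attack": "myocardial infarction", "belly": "abdomen"}  # illustrative

def jargon_perturbation(text: str) -> str:
    """Replace lay terms with specialized terminology."""
    for lay, term in JARGON.items():
        text = text.replace(lay, term)
    return text

plain = "A heart attack can cause belly pain."
perturbed = jargon_perturbation(plain)
# A simplification-sensitive metric should score `plain` higher than
# `perturbed`; metrics whose scores do not move fail this criterion.
print(plain)
print(perturbed)
```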


CELLS: A Parallel Corpus for Biomedical Lay Language Generation

arXiv.org Artificial Intelligence

Recent lay language generation systems have used Transformer models trained on parallel corpora to increase health information accessibility. However, the applicability of these models is constrained by the limited size and topical breadth of available corpora. We introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. Each abstract and its corresponding lay language summary are written by domain experts, assuring the quality of the dataset. Furthermore, qualitative evaluation of expert-authored plain language summaries reveals background explanation as a key strategy for increasing accessibility. Such explanation is challenging for neural models to generate because it goes beyond simplification, adding content absent from the source. We derive two specialized paired corpora from CELLS to address key challenges in lay language generation: generating background explanations and simplifying the original abstract. We adopt retrieval-augmented models as an intuitive fit for background explanation generation, and show improvements in summary quality and simplicity while maintaining factual correctness. Taken together, this work presents the first comprehensive study of background explanation for lay language generation, paving the way for disseminating scientific knowledge to a broader audience. CELLS is publicly available at: https://github.com/LinguisticAnomalies/pls_retrieval.
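The sketch below illustrates the retrieval-augmented setup for background explanation under assumed components: a sentence-transformers retriever over a tiny hypothetical definition store, with the retrieved definition prepended to the generation input. The model name and definitions are illustrative, not the paper's exact configuration.

```python
# Hypothetical sketch of retrieval-augmented background explanation:
# retrieve a relevant definition, then prepend it to the generation input.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative retriever
definitions = [
    "Coccidioidomycosis is a fungal infection also known as Valley fever.",
    "A biomarker is a measurable indicator of a biological state.",
]
def_embeddings = retriever.encode(definitions, convert_to_tensor=True)

def retrieve_background(query: str) -> str:
    """Return the stored definition most similar to the query sentence."""
    q = retriever.encode(query, convert_to_tensor=True)
    best = util.cos_sim(q, def_embeddings).argmax().item()
    return definitions[best]

abstract_sentence = "We measured a novel biomarker in patient serum."
augmented_input = retrieve_background(abstract_sentence) + " " + abstract_sentence
# `augmented_input` is then fed to the summarization model so the generated
# lay summary can include the retrieved background explanation.
print(augmented_input)
```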