AITopics | Singh, Preeti

Singh, Preeti

A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models

Pfohl, Stephen R., Cole-Lewis, Heather, Sayres, Rory, Neal, Darlene, Asiedu, Mercy, Dieng, Awa, Tomasev, Nenad, Rashid, Qazi Mamunur, Azizi, Shekoofeh, Rostamzadeh, Negar, McCoy, Liam G., Celi, Leo Anthony, Liu, Yun, Schaekermann, Mike, Walton, Alanna, Parrish, Alicia, Nagpal, Chirag, Singh, Preeti, Dewitt, Akeiylah, Mansfield, Philip, Prakash, Sushant, Heller, Katherine, Karthikesalingam, Alan, Semturs, Christopher, Barral, Joelle, Corrado, Greg, Matias, Yossi, Smith-Loud, Jamila, Horn, Ivor, Singhal, Karan

arXiv.org Artificial Intelligence, Mar-18-2024

Large language models (LLMs) hold immense promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. In this work, we present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and then conduct an empirical case study with Med-PaLM 2, resulting in the largest human evaluation study in this area to date. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases, and EquityMedQA, a collection of seven newly released datasets comprising both manually curated and LLM-generated questions enriched for adversarial queries. Both our human assessment framework and dataset design process are grounded in an iterative participatory approach and review of possible biases in Med-PaLM 2 answers to adversarial queries. Through our empirical study, we find that the use of a collection of datasets curated through a variety of methodologies, coupled with a thorough evaluation protocol that leverages multiple assessment rubric designs and diverse rater groups, surfaces biases that may be missed via narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. We emphasize that while our framework can identify specific forms of bias, it is not sufficient to holistically assess whether the deployment of an AI system promotes equitable health outcomes. We hope the broader community leverages and builds on these tools and methods towards realizing a shared goal of LLMs that promote accessible and equitable healthcare for all.

large language model, machine learning, natural language

arXiv:2403.12025

Country:

North America > United States (1.00)
Asia (0.67)
Africa (0.67)
North America > Canada > Alberta (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (0.92)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Epidemiology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings

Rikhye, Rajeev V., Loh, Aaron, Hong, Grace Eunhae, Singh, Preeti, Smith, Margaret Ann, Muralidharan, Vijaytha, Wong, Doris, Sayres, Rory, Phung, Michelle, Betancourt, Nicolas, Fong, Bradley, Sahasrabudhe, Rachna, Nasim, Khoban, Eschholz, Alec, Mustafa, Basil, Freyberg, Jan, Spitz, Terry, Matias, Yossi, Corrado, Greg S., Chou, Katherine, Webster, Dale R., Bui, Peggy, Liu, Yuan, Liu, Yun, Ko, Justin, Lin, Steven

arXiv.org Artificial Intelligence, Feb-23-2024

Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generalizable AI that can aid in the diagnosis of skin conditions across a variety of clinical settings. In this retrospective study, we demonstrate that differences in skin condition distribution, rather than in demographics or image capture mode, are the main source of errors when an AI algorithm is evaluated on data from a previously unseen source. We demonstrate a series of steps to close this generalization gap, requiring progressively more information about the new source, ranging from the condition distribution to training data enriched for data less frequently seen during training. Our results also suggest comparable performance from end-to-end fine-tuning versus fine-tuning solely the classification layer on top of a frozen embedding model. Our approach can inform the adaptation of AI algorithms to new settings, based on the information and resources available.
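As an illustration of how a known condition distribution at a new site could be used, the sketch below applies a standard prior-shift correction: each predicted probability is re-weighted by the ratio of the new-site prior to the training-site prior and then renormalized. This is a generic technique and an assumption on our part; the abstract does not state that the authors used this exact procedure. The function name `adjust_for_label_shift` and all numbers are hypothetical.

```python
def adjust_for_label_shift(probs, source_prior, target_prior):
    """Re-weight class probabilities for a change in class priors.

    Generic prior-shift correction (not taken from the paper):
    p_target(y | x) is proportional to p_source(y | x) * target_prior[y] / source_prior[y].
    """
    if not (len(probs) == len(source_prior) == len(target_prior)):
        raise ValueError("all inputs must have one entry per class")
    if any(s <= 0 for s in source_prior):
        raise ValueError("source priors must be strictly positive")

    # Scale each class probability by how much more (or less) common
    # the class is at the new site than in the training data.
    reweighted = [p * t / s for p, s, t in zip(probs, source_prior, target_prior)]

    total = sum(reweighted)
    if total <= 0:
        raise ValueError("re-weighted probabilities sum to zero")

    # Renormalize so the adjusted probabilities sum to 1.
    return [r / total for r in reweighted]


# Hypothetical three-condition example: the third condition is rare in
# training data (prior 0.2) but common at the new site (prior 0.5).
model_probs = [0.6, 0.3, 0.1]
training_prior = [0.5, 0.3, 0.2]
new_site_prior = [0.2, 0.3, 0.5]
adjusted = adjust_for_label_shift(model_probs, training_prior, new_site_prior)
```

In this made-up example the top-ranked condition changes after adjustment, which is the kind of effect a condition distribution mismatch between sites can have on accuracy.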

accuracy, artificial intelligence, machine learning

arXiv:2402.15566

Country: North America > United States > California > Santa Clara County (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Dermatology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Discovering novel systemic biomarkers in photos of the external eye

Babenko, Boris, Traynis, Ilana, Chen, Christina, Singh, Preeti, Uddin, Akib, Cuadros, Jorge, Daskivich, Lauren P., Maa, April Y., Kim, Ramasamy, Kang, Eugene Yu-Chuan, Matias, Yossi, Corrado, Greg S., Peng, Lily, Webster, Dale R., Semturs, Christopher, Krause, Jonathan, Varadarajan, Avinash V., Hammel, Naama, Liu, Yun

arXiv.org Artificial Intelligence, Jul-18-2022

External eye photos were recently shown to reveal signs of diabetic retinal disease and elevated HbA1c. In this paper, we evaluate whether external eye photos contain information about additional systemic medical conditions. We developed a deep learning system (DLS) that takes external eye photos as input and predicts multiple systemic parameters, such as those related to the liver (albumin, AST); kidney (eGFR estimated using the race-free 2021 CKD-EPI creatinine equation, the urine ACR); bone & mineral (calcium); thyroid (TSH); and blood count (Hgb, WBC, platelets). Development leveraged 151,237 images from 49,015 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles County, CA. Evaluation focused on 9 pre-specified systemic parameters and leveraged 3 validation sets (A, B, C) spanning 28,869 patients with and without diabetes undergoing eye screening in 3 independent sites in Los Angeles County, CA, and the greater Atlanta area, GA. We compared against baseline models incorporating available clinicodemographic variables (e.g., age, sex, race/ethnicity, years with diabetes). Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST>36, calcium<8.6, eGFR<60, Hgb<11, platelets<150, ACR>=300, and WBC<4 on validation set A (a patient population similar to the development sets), where the AUC of DLS exceeded that of the baseline by 5.2-19.4%. On validation sets B and C, with substantial patient population differences compared to the development sets, the DLS outperformed the baseline for ACR>=300 and Hgb<11 by 7.3-13.2%. Our findings provide further evidence that external eye photos contain important biomarkers of systemic health spanning multiple organ systems. Further work is needed to investigate whether and how these biomarkers can be translated into clinical impact.
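For readers unfamiliar with the kidney target mentioned above, the sketch below computes eGFR with the published race-free 2021 CKD-EPI creatinine equation and applies the eGFR<60 threshold from the abstract. The function names and the example patient values are hypothetical, and this is not the authors' code; it only illustrates how such a binary label could be derived from serum creatinine (in mg/dL), age, and sex.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl, age_years, is_female):
    """Estimate GFR (mL/min/1.73 m^2) with the 2021 CKD-EPI creatinine equation.

    eGFR = 142 * min(Scr/k, 1)^a * max(Scr/k, 1)^-1.200 * 0.9938^age
           (* 1.012 if female)
    with k = 0.7, a = -0.241 for females and k = 0.9, a = -0.302 for males.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")

    kappa = 0.7 if is_female else 0.9
    alpha = -0.241 if is_female else -0.302
    ratio = serum_creatinine_mg_dl / kappa

    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if is_female:
        egfr *= 1.012
    return egfr


def has_reduced_egfr(serum_creatinine_mg_dl, age_years, is_female, threshold=60.0):
    """Binary label matching the eGFR<60 cutoff named in the abstract."""
    return egfr_ckd_epi_2021(serum_creatinine_mg_dl, age_years, is_female) < threshold


# Hypothetical patient: 60-year-old male with serum creatinine of 2.0 mg/dL.
label = has_reduced_egfr(2.0, 60, is_female=False)
```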

artificial intelligence, machine learning, validation

arXiv:2207.08998

Country: North America > United States > California > Los Angeles County (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
