Goto

Collaborating Authors

 Hwang, Esther


Developing Conversational Speech Systems for Robots to Detect Speech Biomarkers of Cognition in People Living with Dementia

arXiv.org Artificial Intelligence

This study presents the development and testing of a conversational speech system designed for robots to detect speech biomarkers indicative of cognitive impairments in people living with dementia (PLwD). The system integrates a backend Python WebSocket server and a central core module with a large language model (LLM) fine-tuned for dementia to process user input and generate robotic conversation responses in real-time in less than 1.5 seconds. The frontend user interface, a Progressive Web App (PWA), displays information and biomarker score graphs on a smartphone in real-time to human users (PLwD, caregivers, clinicians). Six speech biomarkers based on the existing literature - Altered Grammar, Pragmatic Impairments, Anomia, Disrupted Turn-Taking, Slurred Pronunciation, and Prosody Changes - were developed for the robot conversation system using two datasets, one that included conversations of PLwD with a human clinician (DementiaBank dataset) and one that included conversations of PLwD with a robot (Indiana dataset). We also created a composite speech biomarker that combined all six individual biomarkers into a single score. The speech system's performance was first evaluated on the DementiaBank dataset showing moderate correlation with MMSE scores, with the composite biomarker score outperforming individual biomarkers. Analysis of the Indiana dataset revealed higher and more variable biomarker scores, suggesting potential differences due to study populations (e.g. severity of dementia) and the conversational scenario (human-robot conversations are different from human-human). The findings underscore the need for further research on the impact of conversational scenarios on speech biomarkers and the potential clinical applications of robotic speech systems.


Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

arXiv.org Machine Learning

This paper makes several contributions. Among these, only 20-40% yield a diagnosis of cancer (5). The authors declare no conflict of interest. To whom correspondence should be addressed. Work done while visiting NYU. In the reader study, we compared the performance of our best model to that of radiologists and found our model to be as accurate as radiologists both in terms of area under ROC curve (AUC) and area under precision-recall curve (PRAUC). We also found that a hybrid model, taking the average of the probabilities of malignancy predicted by a radiologist and by our neural network, yields more accurate predictions than either of the two separately. This suggests that our network and radiologists learned different aspects of the task and that our model could be effective as a tool providing radiologists a second reader. With this contribution, research groups that are working on improving screening mammography, which may not have access to a large training dataset like ours, will be able to directly use our model in their research or to use our pretrained weights as an initialization to train models with less data. By making our models public, we invite other groups to validate our results and test their robustness to shifts in the data distribution. The dataset includes 229,426 digital screening mammography exams (1,001,093 images) from 141,473 patients. For each breast, we assign two binary labels: from biopsies. We have 5,832 exams with at least one biopsy the absence/presence of malignant findings in a breast, performed within 120 days of the screening mammogram. With Among these, biopsies confirmed malignant findings for 985 left and right breasts, each exam has a total of four binary (8.4%) breasts and benign findings for 5,556 (47.6%) breasts.