Collaborating Authors

 Ramanarayanan, Vikram


What Do Patients Say About Their Disease Symptoms? Deep Multilabel Text Classification With Human-in-the-Loop Curation for Automatic Labeling of Patient Self Reports of Problems

arXiv.org Artificial Intelligence

The US Food and Drug Administration has accorded increasing importance to patient-reported problems in clinical and research settings. In this paper, we explore one of the largest online datasets, comprising 170,141 open-ended self-reported responses (called "verbatims") from patients with Parkinson's (PwPs) to questions about what bothers them about their Parkinson's Disease and how it affects their daily functioning, also known as the Parkinson's Disease Patient Report of Problems. Classifying such verbatims into multiple clinically relevant symptom categories is an important problem that requires multiple steps: expert curation, a multi-label text classification (MLTC) approach, and large amounts of labeled training data. Moreover, human annotation of such large datasets is tedious and expensive. We present a novel solution to this problem in which we build a baseline dataset using 2,341 (of the 170,141) verbatims annotated by nine curators, including clinical experts and PwPs. We develop a rule-based linguistic dictionary using NLP techniques together with a graph-database-backed expert phrase-query system to scale the annotation to the remaining cohort, generating the machine-annotated dataset, and finally build a Keras/TensorFlow-based MLTC model for both datasets. The machine-annotated model significantly outperforms the baseline model, achieving an F1-score of 95% across 65 symptom categories on a held-out test set.
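The machine-annotation step described in the abstract can be sketched as a multi-label dictionary lookup over expert-curated phrases. The categories and phrases below are invented for illustration and are not the paper's actual dictionary, which the authors build with NLP techniques and a graph-database phrase-query system.

```python
# Hypothetical sketch of a rule-based linguistic dictionary for machine-
# annotating free-text verbatims with multiple symptom labels. The category
# names and trigger phrases are illustrative placeholders only.
import re

SYMPTOM_DICTIONARY = {
    "tremor": ["tremor", "shaking", "trembling"],
    "gait": ["walking", "shuffling", "balance"],
    "sleep": ["insomnia", "sleep", "restless nights"],
}

def annotate(verbatim):
    """Return every symptom category whose phrases match the verbatim."""
    text = verbatim.lower()
    labels = set()
    for category, phrases in SYMPTOM_DICTIONARY.items():
        for phrase in phrases:
            # Whole-word match so e.g. "sleep" does not fire on "asleep".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                labels.add(category)
                break  # one matching phrase is enough for this category
    return labels
```

Because each verbatim can match several categories at once, the output is a label set rather than a single class, which is what makes the downstream classifier a multi-label rather than multi-class problem.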


Crowdsourcing Multimodal Dialog Interactions: Lessons Learned from the HALEF Case

AAAI Conferences

The advent of multiple crowdsourcing vendors and software infrastructure has greatly helped this effort. Several providers also offer integrated filtering tools that allow users to customize different aspects of their data collection, including target population, geographical location, demographics, and sometimes even education level and expertise. Managed crowdsourcing providers extend these options by offering further customization and end-to-end management of the entire data collection operation. A study on crowdsourcing for speech applications concluded that "although the crowd sometimes approached the level of the experts, it never surpassed it" (Parent and Eskenazi 2011). This is exacerbated during multimodal dialog data collections, where it becomes harder to quality-control for usable audio-video data, due to a variety of factors including poor visual quality caused by variable lighting, position, or occlusions, participant or administrator error, or technical issues with the system or network (McDuff, Kaliouby, and