Performance Analysis
An Empirical Evaluation of the Effect of Adversarial Labels on Classifier Accuracy Estimation
Clifford, Alexandra (MIT Lincoln Laboratory) | Corey, Cassian (MIT Lincoln Laboratory) | Holodnak, John T. (MIT Lincoln Laboratory)
This paper examines the effect of providing adversarial labels to several algorithms that use noisy labels from multiple experts to estimate classifier accuracy, referred to hereafter as "estimators." We propose four adversary labeling strategies and use experiments on synthetic data to gauge their impact on the estimators. Our results show that even a single adversary can considerably impact the effectiveness of an estimator. In addition, we find that estimators that weight the input of all experts equally tend to be much more affected by the inclusion of adversaries than those that can model each expert separately, and that the impact of adversaries is lessened when the experts have higher accuracy.
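To make the equal-weighting effect concrete, the sketch below uses a plain majority vote as a stand-in for an estimator that weights all experts equally; it is not one of the estimators evaluated in the paper. The function names, the toy labels, the "contrarian" adversary (who always labels the opposite of the classifier's prediction), and the rule that a tied vote earns half credit are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def majority_vote_accuracy(preds: Sequence[int], expert_labels: List[Sequence[int]]) -> float:
    """Estimate classifier accuracy as agreement with the unweighted expert majority.

    Every expert counts equally. A tied vote gives the classifier half credit
    for that item (an assumption made only for this illustration).
    """
    credit = 0.0
    for i, pred in enumerate(preds):
        ones = sum(labels[i] for labels in expert_labels)
        zeros = len(expert_labels) - ones
        if ones == zeros:
            credit += 0.5
        elif pred == (1 if ones > zeros else 0):
            credit += 1.0
    return credit / len(preds)


def contrarian_adversary(preds: Sequence[int]) -> List[int]:
    """Hypothetical adversary strategy: always label the opposite of the classifier."""
    return [1 - p for p in preds]


truth = [1, 1, 1, 0, 0, 0]
preds = [1, 1, 0, 0, 0, 1]            # classifier is correct on 4 of 6 items
honest = [
    [1, 1, 1, 0, 0, 1],               # expert 1 errs on item 5
    [1, 0, 1, 0, 1, 0],               # expert 2 errs on items 1 and 4
    [0, 1, 1, 0, 0, 0],               # expert 3 errs on item 0
]

true_acc = sum(p == t for p, t in zip(preds, truth)) / len(truth)
est_honest = majority_vote_accuracy(preds, honest)
est_attacked = majority_vote_accuracy(preds, honest + [contrarian_adversary(preds)])
print(true_acc, est_honest, est_attacked)
```

With the three honest experts the majority recovers the true labels, so the estimate equals the true accuracy of 4/6. Adding one contrarian labeler turns three items into ties and pulls the estimate down to 2.5/6. An estimator that models each expert separately could, in principle, learn to discount such a labeler.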
A Comparison of Three Recommender Strategies for Facilitating Person-Centered Care in Nursing Homes
Martindale, Nathan (Tennessee Technological University) | Gannod, Gerald C. (Tennessee Technological University) | Abbott, Katherine M. (Miami University) | Van Haitsma, Kimberly (Pennsylvania State University)
The Preferences for Everyday Living Inventory (PELI) is a 72-question instrument used to help nursing homes assess person-centered care. In particular, the approach allows residents to express their preferences for both care and activities, giving direct care workers insight into how best to provide a high-quality living experience. Among the challenges of using the PELI is its length: 72 questions give rise to survey fatigue while also creating a workflow bottleneck for those providing care. In this paper we explore and evaluate three different recommender strategies that we have applied to the PELI. Specifically, we present the use of rule-based and neighborhood-based collaborative filtering to recommend which preference questions to present to a resident. We illustrate the approaches with a domain-specific example and then compare them across a number of performance and quality metrics.
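The abstract does not describe the neighborhood-based recommender in detail, so the following is only a generic user-based collaborative-filtering sketch, not the authors' method. It assumes each resident's answers are stored as importance ratings (on an assumed 1 to 4 scale), finds the k most similar residents by cosine similarity over the questions both have answered, predicts ratings for questions the target resident has not answered yet, and suggests the highest-scoring ones first. The resident names, question IDs, and the `recommend_questions` helper are hypothetical.

```python
import math
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # resident -> {question_id: importance rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity restricted to the questions both residents answered."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[q] * b[q] for q in common)
    norm_a = math.sqrt(sum(a[q] ** 2 for q in common))
    norm_b = math.sqrt(sum(b[q] ** 2 for q in common))
    return dot / (norm_a * norm_b)


def recommend_questions(ratings: Ratings, target: str, k: int = 2, top_n: int = 2) -> List[str]:
    """Rank the target's unanswered questions by similarity-weighted neighbor ratings."""
    mine = ratings[target]
    neighbors = sorted(
        ((cosine(mine, other), name) for name, other in ratings.items() if name != target),
        reverse=True,
    )[:k]
    sums: Dict[str, List[float]] = {}  # question -> [weighted rating sum, weight sum]
    for sim, name in neighbors:
        if sim <= 0:
            continue
        for q, r in ratings[name].items():
            if q not in mine:
                acc = sums.setdefault(q, [0.0, 0.0])
                acc[0] += sim * r
                acc[1] += sim
    predicted = {q: num / den for q, (num, den) in sums.items()}
    return sorted(predicted, key=lambda q: (-predicted[q], q))[:top_n]


ratings: Ratings = {
    "A": {"q1": 4, "q2": 1, "q3": 4},
    "B": {"q1": 4, "q2": 1, "q3": 4, "q4": 4, "q5": 1},
    "C": {"q1": 4, "q2": 2, "q3": 3, "q4": 3, "q5": 2, "q6": 4},
    "D": {"q1": 1, "q2": 4, "q3": 1, "q4": 1, "q5": 4, "q6": 1},
}
print(recommend_questions(ratings, "A"))
```

Ranking by highest predicted importance is just one plausible selection rule; a system aiming to shorten the survey might instead ask the questions whose predicted answers are least certain.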
Classification of Spontaneous Speech of Individuals with Dementia Based on Automatic Prosody Analysis Using Support Vector Machines (SVM)
Ossewaarde, Roelant (Rijksuniversiteit Groningen) | Jonkers, Roel (Rijksuniversiteit Groningen) | Jalvingh, Fedor (Rijksuniversiteit Groningen) | Bastiaanse, Roelien (Rijksuniversiteit Groningen)
Analysis of spontaneous speech is an important tool for clinical linguists diagnosing dementia types that affect the language processing areas. Prosody is affected by some dementia types, most notably Parkinson's disease (PD: degraded voice quality, unstable pitch), Alzheimer's disease (AD: monotonous pitch), and the non-fluent type of Primary Progressive Aphasia (PPA-NF: hesitant, non-fluent speech). Prosodic features can be computed efficiently by software. In this study, we evaluate the performance of an SVM classifier trained on prosodic features only. Restricting the input to prosody yields baseline results that can later be used to evaluate the added effect of (morpho)syntactic variables. The goal is to distinguish different dementia types based on recorded speech. Results show that the classifier can distinguish some dementia types (PPA-NF, AD), but not others (PD and the semantic dementia variant of PPA, PPA-SD).
Vehicle Shape and Color Classification Using Convolutional Neural Network
Nafzi, Mohamed, Brauckmann, Michael, Glasmachers, Tobias
This paper presents a vehicle re-identification module based on make/model and color classification. It could be used for Automated Vehicular Surveillance (AVS) or for fast analysis of video data. Many problems related to this topic had to be addressed. To facilitate and accelerate progress in this area, we present our approach to collecting and labeling a large-scale data set. We trained deeper neural networks, which achieved good classification accuracy. We report make/model and color classification results on controlled and video data sets. Using an application we developed, we demonstrate the re-identification of vehicles in video images based on make/model and color classification. This work was partially funded under the grant.
Evaluation of Machine Learning Algorithms for Intrusion Detection System
To gauge the accuracy of the machine learning models we use several metrics: Average Accuracy, False Positive Rate and False Negative Rate. K-Means is excluded from these metrics because it is an unsupervised algorithm. Average Accuracy is defined as the ratio of correctly classified data points to the total number of data points. False positives are benign cases that are incorrectly flagged as threats. False negatives are the opposite: actual threats that the model fails to flag.
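The three metrics can be computed from the confusion counts of a binary threat detector. The sketch below treats label 1 as "threat" and uses the conventional rate definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP); these denominators are assumed, since the text does not spell them out. "Average Accuracy" would then be the mean of `accuracy` over whatever runs or folds are averaged, which is also an assumption. The function name and labels are made up for illustration.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, false positive rate and false negative rate; label 1 means 'threat'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign traffic flagged as a threat
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # real threat that was missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
print(detection_metrics(y_true, y_pred))
# {'accuracy': 0.625, 'fpr': 0.25, 'fnr': 0.5}
```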
Approximating the Ideal Observer and Hotelling Observer for binary signal detection tasks by use of supervised learning methods
Zhou, Weimin, Li, Hua, Anastasio, Mark A.
It is widely accepted that optimization of medical imaging system performance should be guided by task-based measures of image quality (IQ). Task-based measures of IQ quantify the ability of an observer to perform a specific task such as detection or estimation of a signal (e.g., a tumor). For binary signal detection tasks, the Bayesian Ideal Observer (IO) sets an upper limit on observer performance and has been advocated for use in optimizing medical imaging systems and data-acquisition designs. Except in special cases, determination of the IO test statistic is analytically intractable. Markov-chain Monte Carlo (MCMC) techniques can be employed to approximate IO detection performance, but their reported applications have been limited to relatively simple object models. In cases where the IO test statistic is difficult to compute, the Hotelling Observer (HO) can be employed. To compute the HO test statistic, potentially large covariance matrices must be accurately estimated and subsequently inverted, which can present computational challenges. This work investigates supervised learning-based methodologies for approximating the IO and HO test statistics. Convolutional neural networks (CNNs) and single-layer neural networks (SLNNs) are employed to approximate the IO and HO test statistics, respectively. Numerical simulations were conducted for both signal-known-exactly (SKE) and signal-known-statistically (SKS) signal detection tasks. The performance of the supervised learning methods is assessed via receiver operating characteristic (ROC) analysis, and the results are compared to those produced by use of traditional numerical methods or analytical calculations when feasible. The potential advantages of the proposed supervised learning approaches for approximating the IO and HO test statistics are discussed.
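For reference, the quantity the SLNN is trained to approximate has a closed form. With class means g0_mean and g1_mean and K the average of the two class covariance matrices, the Hotelling template is w = K^-1 (g1_mean - g0_mean), the test statistic is t(g) = w^T g, and the detectability satisfies SNR^2 = (g1_mean - g0_mean)^T K^-1 (g1_mean - g0_mean). The pure-Python sketch below solves K w = delta by Gaussian elimination instead of forming K^-1 explicitly; for realistic image sizes this solve is where the cost of large covariance matrices appears. It assumes the means and covariance are already known (in practice they would be estimated from samples), and the function names and two-pixel example are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_template(mean0: Vector, mean1: Vector, cov: Matrix) -> Vector:
    """w = K^-1 (mean1 - mean0), where cov is the average class covariance K."""
    delta = [b - a for a, b in zip(mean0, mean1)]
    return solve(cov, delta)


def hotelling_statistic(w: Vector, g: Vector) -> float:
    """Linear test statistic t(g) = w^T g; larger values favor 'signal present'."""
    return sum(wi * gi for wi, gi in zip(w, g))


def hotelling_snr(mean0: Vector, mean1: Vector, cov: Matrix) -> float:
    """Detectability SNR = sqrt(delta^T K^-1 delta)."""
    w = hotelling_template(mean0, mean1, cov)
    delta = [b - a for a, b in zip(mean0, mean1)]
    return math.sqrt(hotelling_statistic(w, delta))


mean0 = [0.0, 0.0]
mean1 = [1.0, 0.0]               # the signal raises only the first pixel
cov = [[2.0, 1.0], [1.0, 2.0]]   # correlated noise between the two pixels
w = hotelling_template(mean0, mean1, cov)
print(w, hotelling_snr(mean0, mean1, cov))
```

The negative weight on the second pixel shows the template exploiting the noise correlation: subtracting part of a pixel that carries no signal cancels some of the noise shared with the signal pixel.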
Revisiting Precision and Recall Definition for Generative Model Evaluation
Simon, Loïc, Webster, Ryan, Rabin, Julien
In this article we revisit the definition of Precision-Recall (PR) curves for generative models proposed by Sajjadi et al. (arXiv:1806.00035). Rather than providing a single scalar for generative quality, PR curves distinguish mode collapse (poor recall) from bad quality (poor precision). We first generalize their formulation to arbitrary measures, hence removing any restriction to finite support. We also expose a bridge between PR curves and the type I and type II error rates of likelihood ratio classifiers on the task of discriminating between samples of the two distributions. Building upon this new perspective, we propose a novel algorithm to approximate precision-recall curves, which shares some interesting methodological properties with the hypothesis testing technique of Lopez-Paz et al. (arXiv:1610.06545). We demonstrate the advantages of the proposed formulation over the original approach on controlled multi-modal datasets.
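For the finite-support case that this paper generalizes, Sajjadi et al. define the precision and recall of a model distribution Q with respect to a reference distribution P, for each λ > 0, as α(λ) = Σ_ω min(λ P(ω), Q(ω)) and β(λ) = Σ_ω min(P(ω), Q(ω)/λ) = α(λ)/λ. The sketch below evaluates these sums on histograms over a shared set of bins. It illustrates only the original discrete definition, not the new approximation algorithm proposed here; the function name, bin probabilities, and λ grid are made up for illustration.

```python
from typing import List, Sequence, Tuple


def pr_curve(p: Sequence[float], q: Sequence[float],
             lambdas: Sequence[float]) -> List[Tuple[float, float]]:
    """(precision, recall) pairs of model histogram q against reference histogram p.

    precision alpha(lam) = sum_i min(lam * p_i, q_i)
    recall    beta(lam)  = sum_i min(p_i, q_i / lam) = alpha(lam) / lam
    """
    curve = []
    for lam in lambdas:
        alpha = sum(min(lam * pi, qi) for pi, qi in zip(p, q))
        curve.append((alpha, alpha / lam))
    return curve


reference = [0.5, 0.5]   # two equally likely modes
collapsed = [1.0, 0.0]   # the model covers only the first mode
curve = pr_curve(reference, collapsed, [0.5, 1.0, 2.0])
print(curve)
# [(0.25, 0.5), (0.5, 0.5), (1.0, 0.5)]
```

Precision can reach 1, but recall never exceeds 0.5 because half of the reference mass is never generated: the mode-collapse signature (poor recall) that PR curves are meant to expose.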
What Clinicians Want: Contextualizing Explainable Machine Learning for Clinical End Use
Tonekaboni, Sana, Joshi, Shalmali, McCradden, Melissa D, Goldenberg, Anna
Translating machine learning (ML) models effectively to clinical practice requires establishing clinicians' trust. Explainability, or the ability of an ML model to justify its outcomes and assist clinicians in rationalizing the model prediction, has been generally understood to be critical to establishing trust. However, the field suffers from a lack of concrete definitions for usable explanations in different settings. To identify specific aspects of explainability that may catalyze building trust in ML models, we surveyed clinicians from two distinct acute care specialties (Intensive Care Unit and Emergency Department). We use their feedback to characterize when explainability helps to improve clinicians' trust in ML models. We further identify the classes of explanations that clinicians considered most relevant and crucial for effective translation to clinical practice. Finally, we discern concrete metrics for rigorous evaluation of clinical explainability methods. By integrating perceptions of explainability between clinicians and ML researchers we hope to facilitate the endorsement, broader adoption, and sustained use of ML systems in healthcare.
Classification of Perceived Human Stress using Physiological Signals
Arsalan, Aamir, Majid, Muhammad, Anwar, Syed Muhammad, Bagci, Ulas
In this paper, we present an experimental study on the classification of perceived human stress using non-invasive physiological signals: electroencephalography (EEG), galvanic skin response (GSR), and photoplethysmography (PPG). Our experiments consist of data acquisition, feature extraction, and perceived stress classification. Physiological data from 28 participants were acquired with eyes open for a duration of three minutes. Four different time-domain features are extracted from the EEG, GSR and PPG signals, and classification is performed using multiple classifiers, including a support vector machine, Naive Bayes, and a multi-layer perceptron (MLP). The best classification accuracy of 75% is achieved with the MLP classifier. Our experimental results show that the proposed scheme outperforms existing perceived stress classification methods that do not use stress inducers.
Physically-interpretable classification of network dynamics for complex collective motions
Fujii, Keisuke, Takeishi, Naoya, Hojo, Motokazu, Inaba, Yuki, Kawahara, Yoshinobu
Understanding complex network dynamics is a fundamental issue in various scientific and engineering fields. Network theory can reveal the relationships between elements and their propagation; however, for complex collective motions, the network properties often change transiently and in complex ways. The fundamental question addressed here is how to classify collective motion networks based on physically-interpretable dynamical properties. We apply a data-driven spectral analysis called graph dynamic mode decomposition, which obtains the dynamical properties used for collective motion classification. Using a ballgame as an example, we classified the strategic collective motions in different global behaviours and discovered that, in addition to the physical properties, the contextual node information was critical for classification. Furthermore, we found stronger label-specific spectra in the relationships among the nearest agents, providing physical and semantic interpretations. Our approach contributes to the understanding of complex networks involving collective motions from the perspective of nonlinear dynamical systems.