Accuracy
Artificial Intelligence, speech and language processing approaches to monitoring Alzheimer's Disease: a systematic review
Garcia, Sofia de la Fuente, Ritchie, Craig, Luz, Saturnino
Language is a valuable source of clinical information in Alzheimer's Disease, as it declines concurrently with neurodegeneration. Consequently, speech and language data have been extensively studied in connection with its diagnosis. This paper summarises current findings on the use of artificial intelligence, speech and language processing to predict cognitive decline in the context of Alzheimer's Disease, detailing current research procedures, highlighting their limitations and suggesting strategies to address them. We conducted a systematic review of original research between 2000 and 2019, registered in PROSPERO (reference CRD42018116606). An interdisciplinary search covered six databases on engineering (ACM and IEEE), psychology (PsycINFO), medicine (PubMed and Embase) and Web of Science. Bibliographies of relevant papers were screened until December 2019. From 3,654 search results 51 articles were selected against the eligibility criteria. Four tables summarise their findings: study details (aim, population, interventions, comparisons, methods and outcomes), data details (size, type, modalities, annotation, balance, availability and language of study), methodology (pre-processing, feature generation, machine learning, evaluation and results) and clinical applicability (research implications, clinical potential, risk of bias and strengths/limitations). While promising results are reported across nearly all 51 studies, very few have been implemented in clinical research or practice. We concluded that the main limitations of the field are poor standardisation, limited comparability of results, and a degree of disconnect between study aims and clinical applications. Attempts to close these gaps should support translation of future research into clinical practice.
Class-Weighted Evaluation Metrics for Imbalanced Data Classification
Gupta, Akhilesh, Tatbul, Nesime, Marcus, Ryan, Zhou, Shengtian, Lee, Insup, Gottschlich, Justin
Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes, making fair assessment of classifiers a challenging task. Balanced Accuracy is a popular metric used to evaluate a classifier's prediction performance under such scenarios. However, this metric falls short when classes vary in importance, especially when class importance is skewed differently from class cardinality distributions. In this paper, we propose a simple and general-purpose evaluation framework for imbalanced data classification that is sensitive to arbitrary skews in class cardinalities and importances. Experiments with several state-of-the-art classifiers tested on real-world datasets and benchmarks from two different domains show that our new framework is more effective than Balanced Accuracy - not only in evaluating and ranking model predictions, but also in training the models themselves. For a broad range of machine learning (ML) tasks, predictive modeling in the presence of imbalanced datasets - those with severe distribution skews - has been a longstanding problem (He & Garcia, 2009; Sun et al., 2009; He & Ma, 2013; Branco et al., 2016; Hilario et al., 2018; Johnson & Khoshgoftaar, 2019). Imbalanced training datasets lead to models with prediction bias towards majority classes, which in turn results in misclassification of the underrepresented ones.
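As a rough illustration of the gap this abstract describes, the sketch below contrasts plain accuracy, Balanced Accuracy (the unweighted mean of per-class recall) and a simple importance-weighted mean of per-class recall. The function `importance_weighted_accuracy` and the weights passed to it are hypothetical and are not the framework proposed in the paper; they only show how a class-importance skew can change the score.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def importance_weighted_accuracy(y_true, y_pred, importance):
    """Hypothetical variant: mean of per-class recall weighted by
    user-supplied class importance (weights are normalised here)."""
    recalls = per_class_recall(y_true, y_pred)
    weight_sum = sum(importance[label] for label in recalls)
    return sum(importance[label] * r for label, r in recalls.items()) / weight_sum


# Toy data: 90 majority examples, 10 minority examples,
# and a classifier that always predicts the majority class.
y_true = ["neg"] * 90 + ["pos"] * 10
y_pred = ["neg"] * 100

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # majority bias is hidden
print(balanced_accuracy(y_true, y_pred))   # both classes count equally
print(importance_weighted_accuracy(y_true, y_pred, {"neg": 1.0, "pos": 3.0}))
```

With the assumed weights the minority class counts three times as much as the majority class, so missing it entirely is penalised more heavily than under Balanced Accuracy.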
Extracting Angina Symptoms from Clinical Notes Using Pre-Trained Transformer Architectures
Eisman, Aaron S., Shah, Nishant R., Eickhoff, Carsten, Zerveas, George, Chen, Elizabeth S., Wu, Wen-Chih, Sarkar, Indra Neil
Anginal symptoms can connote increased cardiac risk and a need for change in cardiovascular management. This study evaluated the potential to extract these symptoms from physician notes using the Bidirectional Encoder Representations from Transformers (BERT) language model fine-tuned on a domain-specific corpus. The history of present illness section of 459 expert annotated primary care physician notes from consecutive patients referred for cardiac testing without known atherosclerotic cardiovascular disease were included. Notes were annotated for positive and negative mentions of chest pain and shortness of breath characterization. The results demonstrate high sensitivity and specificity for the detection of chest pain or discomfort, substernal chest pain, shortness of breath, and dyspnea on exertion. Small sample size limited extracting factors related to provocation and palliation of chest pain. This study provides a promising starting point for the natural language processing of physician notes to characterize clinically actionable anginal symptoms. Introduction: Angina pectoris is a constellation of symptoms that portends inadequate oxygenation of cardiac muscle due to either a decrease in coronary blood supply, an increase in myocardial oxygen demand, or both.
FILM: A Fast, Interpretable, and Low-rank Metric Learning Approach for Sentence Matching
Detection of semantic similarity plays a vital role in sentence matching. It requires learning discriminative representations of natural language. Recently, owing to increasingly sophisticated model architectures, impressive progress has been made, but at the cost of a time-consuming training process and non-interpretable inference. To alleviate this problem, we explore a metric learning approach, named FILM (Fast, Interpretable, and Low-rank Metric learning), to efficiently find a highly discriminative projection of the high-dimensional data. We construct this metric learning problem as a manifold optimization problem and solve it with the Cayley transformation method with the Barzilai-Borwein step size. In experiments, we apply FILM with a triplet loss minimization objective to the Quora Challenge and the Semantic Textual Similarity (STS) Task. The results demonstrate that the FILM method achieves superior performance as well as the fastest computation speed, which is consistent with our theoretical analysis of time complexity.
CurbScan: Curb Detection and Tracking Using Multi-Sensor Fusion
Baek, Iljoo, Tai, Tzu-Chieh, Bhat, Manoj, Ellango, Karun, Shah, Tarang, Fuseini, Kamal, Rajkumar, Ragunathan
Reliable curb detection is critical for safe autonomous driving in urban contexts. Curb detection and tracking are also useful in vehicle localization and path planning. Past work utilized a 3D LiDAR sensor to determine accurate distance information and the geometric attributes of curbs. However, such an approach requires dense point cloud data and is also vulnerable to false positives from obstacles present on both road and off-road areas. In this paper, we propose an approach to detect and track curbs by fusing together data from multiple sensors: sparse LiDAR data, a mono camera and low-cost ultrasonic sensors. The detection algorithm is based on a single 3D LiDAR and a mono camera sensor used to detect candidate curb features and it effectively removes false positives arising from surrounding static and moving obstacles. The detection accuracy of the tracking algorithm is boosted by using Kalman filter-based prediction and fusion with lateral distance information from low-cost ultrasonic sensors. We next propose a line-fitting algorithm that yields robust results for curb locations. Finally, we demonstrate the practical feasibility of our solution by testing in different road environments and evaluating our implementation in a real vehicle (demo video clips demonstrating our algorithm have been uploaded to YouTube: https://www.youtube.com/watch?v=w5MwsdWhcy4, https://www.youtube.com/watch?v=Gd506RklfG8). Our algorithm maintains over 90% accuracy within 4.5-22 meters and 0-14 meters for the KITTI dataset and our dataset respectively, and its average processing time per frame is approximately 10 ms on Intel i7 x86 and 100 ms on NVIDIA Xavier board.
Data Mining and Machine Learning -- Credibility of the trained model
When we train a machine learning algorithm, we need credible evidence that it performs better than another model. This article covers the techniques for obtaining that evidence. How should I split my data set in order to make correct estimates of the performance of my ML algorithm? We will answer this question and relate it to confidence limits. We will also see how to compare two classifiers. Since no technique is better than all others in general, the different approaches have to be compared. Depending on the problem under examination, some approaches are preferable to others, but without statistical tests we have no guarantee that one technique works better than another. For example, if we have a binary problem, our choice will probably fall on a binary classifier. Another important aspect is that, in many contexts, not all errors weigh the same: in the medical field, for example, declaring a sick patient healthy has a higher cost than the opposite error. We initially assume the same constant cost for each type of error, which amounts to ignoring costs. We will then look at cost-sensitive performance evaluation of our scheme (model). It is also possible to evaluate numeric predictions, for example when predicting the performance of a processor: while classification works only with labels or probabilities, a predicted numerical value can be evaluated as well. Although in theory I could evaluate performance using the errors on the training set, doing so is a mistake: the training set error is not a good indicator of performance on future data.
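Two of the ideas above, confidence limits on an accuracy estimate and cost-sensitive evaluation, can be sketched in a few lines. This is a minimal illustration, not the article's own procedure: the Wilson score interval is one common choice of confidence limits, and the confusion counts and the cost values (a missed sick patient costing ten times a false alarm) are invented for the example.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for an accuracy estimated from n held-out
    test cases; z=1.96 corresponds to roughly 95% confidence."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def total_cost(confusion, cost):
    """Sum of count * cost over (actual, predicted) pairs.
    Pairs missing from `cost` (correct predictions here) cost nothing."""
    return sum(count * cost.get(pair, 0.0) for pair, count in confusion.items())


# Hypothetical test-set results for two classifiers, keyed by (actual, predicted).
classifier_a = {("sick", "sick"): 40, ("sick", "well"): 10,
                ("well", "sick"): 5, ("well", "well"): 45}
classifier_b = {("sick", "sick"): 45, ("sick", "well"): 5,
                ("well", "sick"): 10, ("well", "well"): 40}

# Assumed costs: calling a sick patient well is ten times worse.
cost = {("sick", "well"): 10.0, ("well", "sick"): 1.0}

# Both classifiers get 85 of 100 test cases right ...
low, high = wilson_interval(85, 100)
print(f"accuracy 0.85, approx. 95% limits: [{low:.3f}, {high:.3f}]")

# ... but their costs differ once errors are weighted.
print(total_cost(classifier_a, cost), total_cost(classifier_b, cost))
```

The two classifiers are indistinguishable by accuracy, yet under the assumed costs the second one is clearly preferable, which is the point of evaluating with a cost matrix.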
Causal learning with sufficient statistics: an information bottleneck approach
Chicharro, Daniel, Besserve, Michel, Panzeri, Stefano
The inference of causal relationships using observational data from partially observed multivariate systems with hidden variables is a fundamental question in many scientific domains. Methods extracting causal information from conditional independencies between variables of a system are common tools for this purpose, but are limited when such independencies are lacking. To surmount this limitation, we capitalize on the fact that the laws governing the generative mechanisms of a system often result in substructures embodied in the generative functional equation of a variable, which act as sufficient statistics for the influence that other variables have on it. These functional sufficient statistics constitute intermediate hidden variables providing new conditional independencies to be tested. We propose to use the Information Bottleneck method, a technique commonly applied for dimensionality reduction, to find underlying sufficient sets of statistics. Using these statistics we formulate new additional rules of causal orientation that provide causal information not obtainable from standard structure learning algorithms, which exploit only conditional independencies between observable variables. We validate the use of sufficient statistics for structure learning both with simulated systems built to contain specific sufficient statistics and with benchmark data from regulatory rules previously and independently proposed to model biological signal transduction networks.
Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization
Xie, Jiyang, Ma, Zhanyu, Zhang, Guoqiang, Xue, Jing-Hao, Tan, Zheng-Hua, Guo, Jun
Due to lack of data, overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs). In this paper, we propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts dropout rate. Specifically, the distribution parameters are optimized by stochastic gradient variational Bayes (SGVB) inference in order to carry out an end-to-end training of DNNs. We evaluate the effectiveness of the advanced dropout against nine dropout techniques on five widely used datasets in computer vision. The advanced dropout outperforms all the referred techniques by 0.83% on average for all the datasets. An ablation study is conducted to analyze the effectiveness of each component. Meanwhile, convergence of dropout rate and ability to prevent overfitting are discussed in terms of classification performance. Moreover, we extend the application of the advanced dropout to uncertainty inference and network pruning, and we find that the advanced dropout is superior to the corresponding referred methods. The advanced dropout improves classification accuracies by 4% in uncertainty inference and by 0.2% and 0.5% when pruning more than 90% of nodes and 99.8% of parameters, respectively.
Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without a clear understanding of the biases, and corresponding identification of chance or base case levels of the statistic. Using these measures, a system that performs worse in the objective sense of Informedness can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance, namely Informedness, and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
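For the dichotomous case, the measures named in this abstract can be computed directly from the four confusion-matrix counts: Informedness is Recall plus Inverse Recall minus 1, Markedness is Precision plus Inverse Precision minus 1, and the (Matthews) correlation is the signed geometric mean of the two. The sketch below is illustrative only; the two sets of counts are invented, and returning NaN for an undefined ratio is a convention chosen here.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else float("nan")


def binary_measures(tp, fp, fn, tn):
    """Common and chance-corrected measures for a 2x2 confusion matrix."""
    recall = _ratio(tp, tp + fn)              # true positive rate
    inverse_recall = _ratio(tn, tn + fp)      # true negative rate
    precision = _ratio(tp, tp + fp)
    inverse_precision = _ratio(tn, tn + fn)
    informedness = recall + inverse_recall - 1
    markedness = precision + inverse_precision - 1
    # In the 2x2 case informedness and markedness share the same sign,
    # so the correlation is their signed geometric mean.
    correlation = math.copysign(
        math.sqrt(max(0.0, informedness * markedness)), informedness)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "recall": recall,
        "precision": precision,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "informedness": informedness,
        "markedness": markedness,
        "correlation": correlation,
    }


# Invented counts on 90 positives and 10 negatives.
# "guesser" says positive 90% of the time regardless of the true class.
guesser = binary_measures(tp=81, fp=9, fn=9, tn=1)
# "informed" actually separates the classes, at the price of more misses.
informed = binary_measures(tp=70, fp=2, fn=20, tn=8)

for name, m in (("guesser", guesser), ("informed", informed)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

On these counts the guesser scores higher on accuracy, Recall and F-Measure even though its Informedness is zero, which is the kind of bias the abstract warns about.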
Investigating the Robustness of Artificial Intelligent Algorithms with Mixture Experiments
Lian, Jiayi, Freeman, Laura, Hong, Yili, Deng, Xinwei
Artificial intelligence (AI) algorithms, such as deep learning and XGBoost, are used in numerous applications including computer vision, autonomous driving, and medical diagnostics. The robustness of these AI algorithms is of great interest, as inaccurate predictions could result in safety concerns and limit the adoption of AI systems. In this paper, we propose a framework based on design of experiments to systematically investigate the robustness of AI classification algorithms. A robust classification algorithm is expected to have high accuracy and low variability under different application scenarios. The robustness can be affected by a wide range of factors such as the imbalance of class labels in the training dataset, the chosen prediction algorithm, the chosen dataset of the application, and a change of distribution between the training and test datasets. To investigate the robustness of AI classification algorithms, we conduct a comprehensive set of mixture experiments to collect prediction performance results. Then statistical analyses are conducted to understand how various factors affect the robustness of AI classification algorithms. We summarize our findings and provide suggestions to practitioners in AI applications.