Goto

Collaborating Authors

 Accuracy


Novel Techniques to Assess Predictive Systems and Reduce Their Alarm Burden

arXiv.org Artificial Intelligence

Machine prediction algorithms (e.g., binary classifiers) often are adopted on the basis of claimed performance using classic metrics such as sensitivity and predictive value. However, classifier performance depends heavily upon the context (workflow) in which the classifier operates. Classic metrics do not reflect the realized utility of a predictor unless certain implicit assumptions are met, and these assumptions cannot be met in many common clinical scenarios. This often results in suboptimal implementations and in disappointment when expected outcomes are not achieved. One common failure mode for classic metrics arises when multiple predictions can be made for the same event, particularly when redundant true positive predictions produce little additional value. This describes many clinical alerting systems. We explain why classic metrics cannot correctly represent predictor performance in such contexts, and introduce an improved performance assessment technique using utility functions to score predictions based on their utility in a specific workflow context. The resulting utility metrics (u-metrics) explicitly account for the effects of temporal relationships on prediction utility. Compared to traditional measures, u-metrics more accurately reflect the real world costs and benefits of a predictor operating in a live clinical context. The improvement can be significant. We also describe a formal approach to snoozing, a mitigation strategy in which some predictions are suppressed to improve predictor performance by reducing false positives while retaining event capture. Snoozing is especially useful for predictors that generate interruptive alarms. U-metrics correctly measure and predict the performance benefits of snoozing, whereas traditional metrics do not.


Detecting and Diagnosing Terrestrial Gravitational-Wave Mimics Through Feature Learning

arXiv.org Artificial Intelligence

As engineered systems grow in complexity, there is an increasing need for automatic methods that can detect, diagnose, and even correct transient anomalies that inevitably arise and can be difficult or impossible to diagnose and fix manually. Among the most sensitive and complex systems of our civilization are the detectors that search for incredibly small variations in distance caused by gravitational waves -- phenomena originally predicted by Albert Einstein to emerge and propagate through the universe as the result of collisions between black holes and other massive objects in deep space. The extreme complexity and precision of such detectors causes them to be subject to transient noise issues that can significantly limit their sensitivity and effectiveness. In this work, we present a demonstration of a method that can detect and characterize emergent transient anomalies of such massively complex systems. We illustrate the performance, precision, and adaptability of the automated solution via one of the prevalent issues limiting gravitational-wave discoveries: noise artifacts of terrestrial origin that contaminate gravitational wave observatories' highly sensitive measurements and can obscure or even mimic the faint astrophysical signals for which they are listening. Specifically, we demonstrate how a highly interpretable convolutional classifier can automatically learn to detect transient anomalies from auxiliary detector data without needing to observe the anomalies themselves. We also illustrate several other useful features of the model, including how it performs automatic variable selection to reduce tens of thousands of auxiliary data channels to only a few relevant ones; how it identifies behavioral signatures predictive of anomalies in those channels; and how it can be used to investigate individual anomalies and the channels associated with them.


Architectural Optimization and Feature Learning for High-Dimensional Time Series Datasets

arXiv.org Artificial Intelligence

As our ability to sense increases, we are experiencing a transition from data-poor problems, in which the central issue is a lack of relevant data, to data-rich problems, in which the central issue is to identify a few relevant features in a sea of observations. Motivated by applications in gravitational-wave astrophysics, we study the problem of predicting the presence of transient noise artifacts in a gravitational wave detector from a rich collection of measurements from the detector and its environment. We argue that feature learning--in which relevant features are optimized from data--is critical to achieving high accuracy. We introduce models that reduce the error rate by over 60% compared to the previous state of the art, which used fixed, hand-crafted features. Feature learning is useful not only because it improves performance on prediction tasks; the results provide valuable information about patterns associated with phenomena of interest that would otherwise be undiscoverable. In our application, features found to be associated with transient noise provide diagnostic information about its origin and suggest mitigation strategies. Learning in high-dimensional settings is challenging. Through experiments with a variety of architectures, we identify two key factors in successful models: sparsity, for selecting relevant variables within the high-dimensional observations; and depth, which confers flexibility for handling complex interactions and robustness with respect to temporal variations. We illustrate their significance through systematic experiments on real detector data. Our results provide experimental corroboration of common assumptions in the machine-learning community and have direct applicability to improving our ability to sense gravitational waves, as well as to many other problem settings with similarly high-dimensional, noisy, or partly irrelevant data.


Drawbacks of using AUC-ROC as measure

#artificialintelligence

AUC measures the power of discrimination and it ignores the predicted probability values and the goodness-of-fit of the model. It is possible that a well fitted model has poor discrimination, if probabilities for positives are only moderately higher than those for negatives. It summarizes the test performance over regions of the ROC space in which one would rarely operate. For example, general ml tasks would rarely operate in the extreme left hand side (high false negative) of the curve and extreme right hand side (high false positive). For example in medical diagnostics, false negatives are more expensive than false positives. It does not give information about the spatial distribution of model errors.


Recommendation Systems with Distribution-Free Reliability Guarantees

arXiv.org Machine Learning

The digitization of all manner of services has introduced recommendation systems into many aspects of our day-to-day lives. In particular, recommendation systems are now being applied to safety-critical domains such as making lifestyle recommendations to patients in healthcare [Hammer et al., 2015, Tran et al., 2021]. It is therefore increasingly important that deployed recommender systems do not output recommendations devoid of uncertainty annotations. Meaningful recommendations should come with transparent and reliable statistical assessments. To date, the majority of deployed systems have fallen far short of this desideratum [Covington et al., 2016, Liu et al., 2017, Geyik et al., 2018]. Augmenting recommendation systems with internal tracking of statistical error rates would unlock new capabilities and applications. One such capability is the ability to enforce auxiliary constraints while still guaranteeing a baseline number of high-quality items in each slate of recommendations. For example, we could diversify slates whose quality we are confident in, while leaving lower-confidence slates untouched. Furthermore, the strong guarantees provided by uncertainty quantification are a prerequisite for applying recommendation systems to safety-critical tasks such as medical diagnosis, where a misdiagnosis due to uncertain predictions can be fatal.


Intersection over Union (IoU) in Object Detection and Segmentation

#artificialintelligence

Intersection Over Union (IoU) is a number that quantifies the degree of overlap between two boxes. In the case of object detection and segmentation, IoU evaluates the overlap of Ground Truth and Prediction region. If you are a computer vision practitioner or even an enthusiast, you must have come across the term very often. It is the first checkpoint for evaluating the accuracy of a model. In simple terms, it's a metric that helps us measure the correctness of a prediction. In this blog post, you will get a detailed and intuitive explanation of the following.


Hop-Count Based Self-Supervised Anomaly Detection on Attributed Networks

arXiv.org Artificial Intelligence

Recent years have witnessed an upsurge of interest in the problem of anomaly detection on attributed networks due to its importance in both research and practice. Although various approaches have been proposed to solve this problem, two major limitations exist: (1) unsupervised approaches usually work much less efficiently due to the lack of supervisory signal, and (2) existing anomaly detection methods only use local contextual information to detect anomalous nodes, e.g., one- or two-hop information, but ignore the global contextual information. Since anomalous nodes differ from normal nodes in structures and attributes, it is intuitive that the distance between anomalous nodes and their neighbors should be larger than that between normal nodes and their neighbors if we remove the edges connecting anomalous and normal nodes. Thus, hop counts based on both global and local contextual information can be served as the indicators of anomaly. Motivated by this intuition, we propose a hop-count based model (HCM) to detect anomalies by modeling both local and global contextual information. To make better use of hop counts for anomaly identification, we propose to use hop counts prediction as a self-supervised task. We design two anomaly scores based on the hop counts prediction via HCM model to identify anomalies. Besides, we employ Bayesian learning to train HCM model for capturing uncertainty in learned parameters and avoiding overfitting. Extensive experiments on real-world attributed networks demonstrate that our proposed model is effective in anomaly detection.


PhilaeX: Explaining the Failure and Success of AI Models in Malware Detection

arXiv.org Artificial Intelligence

The explanation to an AI model's prediction used to support decision making in cyber security, is of critical importance. It is especially so when the model's incorrect prediction can lead to severe damages or even losses to lives and critical assets. However, most existing AI models lack the ability to provide explanations on their prediction results, despite their strong performance in most scenarios. In this work, we propose a novel explainable AI method, called PhilaeX, that provides the heuristic means to identify the optimized subset of features to form the complete explanations of AI models' predictions. It identifies the features that lead to the model's borderline prediction, and those with positive individual contributions are extracted. The feature attributions are then quantified through the optimization of a Ridge regression model. We verify the explanation fidelity through two experiments. First, we assess our method's capability in correctly identifying the activated features in the adversarial samples of Android malwares, through the features attribution values from PhilaeX. Second, the deduction and augmentation tests, are used to assess the fidelity of the explanations. The results show that PhilaeX is able to explain different types of classifiers correctly, with higher fidelity explanations, compared to the state-of-the-arts methods such as LIME and SHAP.


Depth image conversion model based on CycleGAN for growing tomato truss identification - Plant Methods

#artificialintelligence

On tomato plants, the flowering truss is a group or cluster of smaller stems where flowers and fruit develop, while the growing truss is the most extended part of the stem. Because the state of the growing truss reacts sensitively to the surrounding environment, it is essential to control its growth in the early stages. With the recent development of information and artificial intelligence technology in agriculture, a previous study developed a real-time acquisition and evaluation method for images using robots. Furthermore, we used image processing to locate the growing truss to extract growth information. Among the different vision algorithms, the CycleGAN algorithm was used to generate and transform unpaired images using generated learning images. In this study, we developed a robot-based system for simultaneously acquiring RGB and depth images of the growing truss of the tomato plant. The segmentation performance for approximately 35 samples was compared via false negative (FN) and false positive (FP) indicators. For the depth camera image, we obtained FN and FP values of 17.55 ± 3.01% and 17.76 ± 3.55%, respectively. For the CycleGAN algorithm, we obtained FN and FP values of 19.24 ± 1.45% and 18.24 ± 1.54%, respectively. When segmentation was performed via image processing through depth image and CycleGAN, the mean intersection over union (mIoU) was 63.56 ± 8.44% and 69.25 ± 4.42%, respectively, indicating that the CycleGAN algorithm can identify the desired growing truss of the tomato plant with high precision. The on-site possibility of the image extraction technique using CycleGAN was confirmed when the image scanning robot drove in a straight line through a tomato greenhouse. In the future, the proposed approach is expected to be used in vision technology to scan tomato growth indicators in greenhouses using an unmanned robot platform.


Prediction of Dilatory Behavior in eLearning: A Comparison of Multiple Machine Learning Models

arXiv.org Machine Learning

Procrastination, the irrational delay of tasks, is a common occurrence in online learning. Potential negative consequences include higher risk of drop-outs, increased stress, and reduced mood. Due to the rise of learning management systems and learning analytics, indicators of such behavior can be detected, enabling predictions of future procrastination and other dilatory behavior. However, research focusing on such predictions is scarce. Moreover, studies involving different types of predictors and comparisons between the predictive performance of various methods are virtually non-existent. In this study, we aim to fill these research gaps by analyzing the performance of multiple machine learning algorithms when predicting the delayed or timely submission of online assignments in a higher education setting with two categories of predictors: subjective, questionnaire-based variables and objective, log-data based indicators extracted from a learning management system. The results show that models with objective predictors consistently outperform models with subjective predictors, and a combination of both variable types perform slightly better. For each of these three options, a different approach prevailed (Gradient Boosting Machines for the subjective, Bayesian multilevel models for the objective, and Random Forest for the combined predictors). We conclude that careful attention should be paid to the selection of predictors and algorithms before implementing such models in learning management systems.