Accuracy
Human readable network troubleshooting based on anomaly detection and feature scoring
Navarro, Jose M., Huet, Alexis, Rossi, Dario
Network troubleshooting is still a heavily human-intensive process. To reduce the time spent by human operators in the diagnosis process, we present a system based on (i) unsupervised learning methods for detecting anomalies in the time domain, (ii) an attention mechanism to rank features in the feature space and finally (iii) an expert knowledge module able to seamlessly incorporate previously collected domain-knowledge. In this paper, we thoroughly evaluate the performance of the full system and of its individual building blocks: particularly, we consider (i) 10 anomaly detection algorithms as well as (ii) 10 attention mechanisms, that comprehensively represent the current state of the art in the respective fields. Leveraging a unique collection of expert-labeled datasets worth several months of real router telemetry data, we perform a thorough performance evaluation contrasting practical results in constrained stream-mode settings, with the results achievable by an ideal oracle in academic settings. Our experimental evaluation shows that (i) the proposed system is effective in achieving high levels of agreement with the expert, and (ii) that even a simple statistical approach is able to extract useful information from expert knowledge gained in past cases, significantly improving troubleshooting performance.
Multiple Sclerosis Lesions Identification/Segmentation in Magnetic Resonance Imaging using Ensemble CNN and Uncertainty Classification
Placidi, Giuseppe, Cinque, Luigi, Mignosi, Filippo, Polsinelli, Matteo
To date, several automated strategies for identification/segmentation of Multiple Sclerosis (MS) lesions by Magnetic Resonance Imaging (MRI) have been presented which are either outperformed by human experts or, at least, whose results are well distinguishable from those of humans. This is due to the ambiguity arising from MRI instabilities, the peculiar heterogeneity of MS, and the nonspecific nature of MRI with respect to MS. Physicians partially handle the uncertainty generated by this ambiguity by relying on their personal radiological/clinical/anatomical background and experience. We present an automated framework for MS lesion identification/segmentation based on three pivotal concepts to better emulate human reasoning: the modeling of uncertainty; the proposal of two separately trained CNNs, one optimized with respect to the lesions themselves and the other to the environment surrounding lesions, each repeated for the axial, coronal and sagittal directions; and the ensembling of the CNN outputs. The proposed framework is trained, validated and tested on the 2016 MSSEG benchmark public data set from a single imaging modality, FLuid-Attenuated Inversion Recovery (FLAIR). The comparison, performed on the segmented lesions by means of most of the metrics normally used with respect to the ground truth and the 7 human raters in MSSEG, proves that there is no significant difference between the proposed framework and the other raters. Results are also shown for the uncertainty, though a comparison with the other raters is impossible.
Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?
Richards, Dominic, Dobriban, Edgar, Rebeschini, Patrick
Modern methods for learning from data depend on many tuning parameters, such as the stepsize for optimization methods, and the regularization strength for regularized learning methods. Since performance can depend strongly on these parameters, it is important to develop comparisons between classes of methods, not just for particularly tuned ones. Here, we aim to compare classes of estimators via the relative performance of the best method in the class. This allows us to rigorously quantify the tuning sensitivity of learning algorithms. As an illustration, we investigate the statistical estimation performance of ridge regression with a uniform grid of regularization parameters, and of gradient descent iterates with a fixed stepsize, in the standard linear model with a random isotropic ground truth parameter. (1) For orthogonal designs, we find the exact minimax optimal classes of estimators, showing they are equal to gradient descent with a polynomially decaying learning rate. We find the exact suboptimalities of ridge regression and gradient descent with a fixed stepsize, showing that they decay as either $1/k$ or $1/k^2$ for specific ranges of $k$ estimators. (2) For general designs with a large number of non-zero eigenvalues, we find that gradient descent outperforms ridge regression when the eigenvalues decay slowly, as a power law with exponent less than unity. If instead the eigenvalues decay quickly, as a power law with exponent greater than unity or exponentially, we find that ridge regression outperforms gradient descent. Our results highlight the importance of tuning parameters. In particular, while optimally tuned ridge regression is the best estimator in our case, it can be outperformed by gradient descent when both are restricted to being tuned over a finite regularization grid.
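The idea of comparing classes by their best member can be illustrated with a toy calculation. The sketch below is not the paper's analysis: it assumes a normalized orthogonal design in which each coordinate is observed as z = theta + eps, so that ridge regression and gradient descent from zero both reduce to shrinkage estimators c * z. The grid bound `lam_max`, the stepsize `step`, and all function names are hypothetical choices made for illustration.

```python
def shrinkage_risk(c, signal=1.0, noise=1.0):
    """Per-coordinate risk of the estimator c * z, where z = theta + eps,
    E[theta^2] = signal and E[eps^2] = noise (bias^2 term + variance term)."""
    return (1.0 - c) ** 2 * signal + c ** 2 * noise


def ridge_factors(k, lam_max=4.0):
    """Shrinkage factors 1 / (1 + lam) for k penalties on a uniform grid
    lam_max / k, 2 * lam_max / k, ..., lam_max (hypothetical grid)."""
    return [1.0 / (1.0 + lam_max * (i + 1) / k) for i in range(k)]


def gd_factors(k, step=0.1):
    """Shrinkage factors 1 - (1 - step)^t for the first k gradient descent
    iterates on 0.5 * (z - theta)^2, started from theta = 0."""
    return [1.0 - (1.0 - step) ** t for t in range(1, k + 1)]


def best_in_class(factors, signal=1.0, noise=1.0):
    """Risk of the best estimator among a finite class of shrinkage factors."""
    return min(shrinkage_risk(c, signal, noise) for c in factors)


def optimal_risk(signal=1.0, noise=1.0):
    """Risk of the optimally tuned shrinkage c* = signal / (signal + noise)."""
    return signal * noise / (signal + noise)


if __name__ == "__main__":
    for k in (2, 4, 8, 16):
        ridge_gap = best_in_class(ridge_factors(k)) - optimal_risk()
        gd_gap = best_in_class(gd_factors(k)) - optimal_risk()
        print(k, ridge_gap, gd_gap)
```

The printed gaps are the suboptimality of each class of k estimators relative to the optimally tuned shrinkage. How they behave as k grows depends entirely on the assumed grid and stepsize, so the numbers should not be read as reproducing the rates stated in the abstract.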
Predicting risk of sudden cardiac death in patients with cardiac sarcoidosis using multimodality imaging and personalized heart modeling in a multivariable classifier
Cardiac sarcoidosis (CS), an inflammatory disease characterized by formation of granulomas in the heart, is associated with high risk of sudden cardiac death (SCD) from ventricular arrhythmias. Current “one-size-fits-all” guidelines for SCD risk assessment in CS result in insufficient appropriate primary prevention. Here, we present a two-step precision risk prediction technology for patients with CS. First, a patient’s arrhythmogenic propensity arising from heterogeneous CS-induced ventricular remodeling is assessed using a novel personalized magnetic-resonance imaging and positron-emission tomography fusion mechanistic model. The resulting simulations of arrhythmogenesis are fed, together with a set of imaging and clinical biomarkers, into a supervised classifier. In a retrospective study of 45 patients, the technology achieved testing results of 60% sensitivity [95% confidence interval (CI): 57-63%], 72% specificity [95% CI: 70-74%], and 0.754 area under the receiver operating characteristic curve [95% CI: 0.710-0.797]. It outperformed clinical metrics, highlighting its potential to transform CS risk stratification.
30 Most Asked Machine Learning Questions Answered - KDnuggets
Machine Learning is the path to a better and more advanced future. Machine Learning Developer is the most in-demand job in 2021, and demand is expected to increase by 20–30% in the upcoming 3–5 years. Machine Learning at its core is all statistics and programming concepts. The language most used by Machine Learning developers for coding is Python because of its simplicity. In this blog, you will find some of the most asked machine learning questions that every machine learning enthusiast has to answer one day. Ans: Machine learning is the science of getting computers to act in a real-time situation without being explicitly programmed.
Cascading Neural Network Methodology for Artificial Intelligence-Assisted Radiographic Detection and Classification of Lead-Less Implanted Electronic Devices within the Chest
Demirer, Mutlu, White, Richard D., Gupta, Vikash, Sebro, Ronnie A., Erdal, Barbaros S.
Background & Purpose: Chest X-Ray (CXR) use in pre-MRI safety screening for Lead-Less Implanted Electronic Devices (LLIEDs), easily overlooked or misidentified on a frontal view (often only acquired), is common. Although most LLIED types are "MRI conditional": 1. Some are stringently conditional; 2. Different conditional types have specific patient- or device- management requirements; and 3. Particular types are "MRI unsafe". This work focused on developing CXR interpretation-assisting Artificial Intelligence (AI) methodology with: 1. 100% detection for LLIED presence/location; and 2. High classification in LLIED typing. Materials & Methods: Data-mining (03/1993-02/2021) produced an AI Model Development Population (1,100 patients/4,871 images) creating 4,924 LLIED Region-Of-Interests (ROIs) (with image-quality grading) used in Training, Validation, and Testing. For developing the cascading neural network (detection via Faster R-CNN and classification via Inception V3), "ground-truth" CXR annotation (ROI labeling per LLIED), as well as inference display (as Generated Bounding Boxes (GBBs)), relied on a GPU-based graphical user interface. Results: To achieve 100% LLIED detection, probability threshold reduction to 0.00002 was required by Model 1, resulting in increasing GBBs per LLIED-related ROI. Targeting LLIED-type classification following detection of all LLIEDs, Model 2 multi-classified to reach high-performance while decreasing falsely positive GBBs. Despite 24% suboptimal ROI image quality, classification was correct in 98.9% and AUCs for the 9 LLIED-types were 1.00 for 8 and 0.92 for 1. For all misclassification cases: 1. None involved stringently conditional or unsafe LLIEDs; and 2. Most were attributable to suboptimal images. Conclusion: This project successfully developed a LLIED-related AI methodology supporting: 1. 100% detection; and 2. Typically 100% type classification.
PlanAlyzer
We did not expect to see any real causal sufficiency errors due to the expert nature of the authors of PLANOUT-A. Rather, we expect to see some false positives due to the fact that PLANALYZER is aggressive about flagging potential causal sufficiency errors. We made this design choice because the cost of unrecorded confounders can be very high. PLANOUT scripts in deployment at Facebook represent a range of experimental designs. We observed factorial designs, conditional assignment, within-subjects experiments, cluster random assignment, and bandits experiments in the scripts we examined. Real-world PLANOUT scripts unsurprisingly contained few errors, because they were primarily written and overseen by experts in experimental design. Therefore, to test how well PLANALYZER finds errors, we selected a subset of fifty scripts from PLANOUT-A and mutated them. We then validated a subset of the contrasts PLANALYZER produced against a corpus of hand-selected contrasts monitored and compared by an automated tool used at Facebook. Finally, we reported on PLANALYZER'S performance, because its effectiveness requires accurately identifying meaningful contrasts within a reasonable amount of time.
sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Bénédict, Gabriel, Koops, Vincent, Odijk, Daan, de Rijke, Maarten
Multiclass multilabel classification refers to the task of attributing multiple labels to examples via predictions. Current models formulate a reduction of that multilabel setting into either multiple binary classifications or multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.). Empirically, these methods have been reported to achieve good performance on different metrics (F1 score, Recall, Precision, etc.). Theoretically though, the multilabel classification reductions do not accommodate the prediction of varying numbers of labels per example, and the underlying losses are distant estimates of the performance metrics. We propose a loss function, sigmoidF1. It is an approximation of the F1 score that (I) is smooth and tractable for stochastic gradient descent, (II) naturally approximates a multilabel metric, (III) estimates label propensities and label counts. More generally, we show that any confusion matrix metric can be formulated with a smooth surrogate. We evaluate the proposed loss function on different text and image datasets, and with a variety of metrics, to account for the complexity of multilabel classification evaluation. In our experiments, we embed the sigmoidF1 loss in a classification head that is attached to state-of-the-art efficient pretrained neural networks MobileNetV2 and DistilBERT. Our experiments show that sigmoidF1 outperforms other loss functions on four datasets and several metrics. These results show the effectiveness of using inference-time metrics as loss functions at training time in general, and their potential on non-trivial classification problems like multilabel classification.
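A smooth F1 surrogate of the kind described above can be sketched as follows. The abstract does not give the formula, so this is an assumption-laden illustration rather than the authors' implementation: thresholded predictions are replaced by sigmoid outputs, which turns the confusion-matrix counts into differentiable "soft" counts. The slope `beta`, offset `eta`, and the function names are hypothetical.

```python
import math


def sigmoid(u, beta=1.0, eta=0.0):
    """Numerically stable sigmoid of beta * (u + eta).
    beta (slope) and eta (offset) are hypothetical tuning parameters."""
    z = beta * (u + eta)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_f1_loss(logits, targets, beta=1.0, eta=0.0, eps=1e-16):
    """Return 1 - soft F1 over a batch of multilabel predictions.

    logits:  list of rows of raw scores, one score per label
    targets: list of rows of 0/1 labels with the same shape
    """
    tp = fp = fn = 0.0
    for row_logits, row_targets in zip(logits, targets):
        for u, y in zip(row_logits, row_targets):
            s = sigmoid(u, beta, eta)
            tp += s * y                # soft true positives
            fp += s * (1 - y)          # soft false positives
            fn += (1 - s) * y          # soft false negatives
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - soft_f1


if __name__ == "__main__":
    targets = [[1, 0], [0, 1]]
    print(sigmoid_f1_loss([[10, -10], [-10, 10]], targets))  # confident and correct
    print(sigmoid_f1_loss([[-10, 10], [10, -10]], targets))  # confident and wrong
```

Because every term is a smooth function of the logits, the same expression could be written with an automatic-differentiation framework and minimized by stochastic gradient descent; the pure-Python version here only shows the arithmetic.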
Task-Sensitive Concept Drift Detector with Constraint Embedding
Castellani, Andrea, Schmitt, Sebastian, Hammer, Barbara
Detecting drifts in data is essential for machine learning applications, as changes in the statistics of processed data typically have a profound influence on the performance of trained models. Most of the available drift detection methods are either supervised and require access to the true labels during inference time, or they are completely unsupervised and aim for changes in distributions without taking label information into account. We propose a novel task-sensitive semi-supervised drift detection scheme, which utilizes label information while training the initial model, but takes into account that supervised label information is no longer available when using the model during inference. It utilizes a constrained low-dimensional embedding representation of the input data. This way, it is best suited for the classification task. It is able to detect real drift, where the drift affects the classification performance, while it properly ignores virtual drift, where the classification performance is not affected by the drift. In the proposed framework, the actual method to detect a change in the statistics of incoming data samples can be chosen freely. Experimental evaluation on nine benchmark datasets, with different types of drift, demonstrates that the proposed framework can reliably detect drifts, and outperforms state-of-the-art unsupervised drift detection approaches.
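The separation between a task-relevant embedding and a freely chosen change detector can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the paper learns a constrained embedding from labeled data, whereas this example uses a hand-written one-dimensional projection, and `mean_shift_detector` is just one simple stand-in for the pluggable detector.

```python
import statistics


def mean_shift_detector(reference, window, threshold=3.0):
    """Flag drift when the window mean deviates from the reference mean by
    more than `threshold` standard errors (a simple stand-in detector)."""
    mu = statistics.mean(reference)
    sd = statistics.stdev(reference)
    se = sd / len(window) ** 0.5
    return abs(statistics.mean(window) - mu) > threshold * se


def detect_drift(embed, reference_inputs, window_inputs,
                 detector=mean_shift_detector):
    """Embed both batches, then run any detector on the embedded values."""
    ref = [embed(x) for x in reference_inputs]
    win = [embed(x) for x in window_inputs]
    return detector(ref, win)


if __name__ == "__main__":
    # Hypothetical embedding: only the first feature matters for the task.
    embed = lambda x: x[0]
    reference = [(i % 5 - 2, 0.0) for i in range(50)]
    # Virtual drift: the task-irrelevant second feature shifts.
    virtual = [(i % 5 - 2, 10.0) for i in range(20)]
    # Real drift: the task-relevant first feature shifts.
    real = [(i % 5 - 2 + 3, 0.0) for i in range(20)]
    print(detect_drift(embed, reference, virtual))
    print(detect_drift(embed, reference, real))
```

Because the detector only ever sees embedded values, a shift confined to directions the embedding discards goes unflagged, while a shift along the retained direction is reported. Swapping in a different detector only requires passing another callable as `detector`.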
CGEMs: A Metric Model for Automatic Code Generation using GPT-3
Narasimhan, Aishwarya, Rao, Krishna Prasad Agara Venkatesha, B, Veena M
Today, AI technology is showing its strengths in almost every industry and walk of life. From text generation and text summarization to chatbots, NLP is being used widely. One such paradigm is automatic code generation. An AI could be generating anything; hence the output space is unconstrained. A self-driving car is driven for 100 million miles to validate its safety, but tests cannot be written to monitor and cover an unconstrained space. One of the solutions to validate AI-generated content is to constrain the problem and convert it from abstract to realistic, and this can be accomplished either by validating the unconstrained algorithm using theoretical proofs or by using Monte-Carlo simulation methods. In this case, we use the latter approach to test/validate a statistically significant number of samples. Validating AI-generated code in this way is the main motive of this work, and to determine whether AI-generated code is reliable, a metric model, CGEMs, is proposed. This is an extremely challenging task, as programs can have different logic with different naming conventions, but the metrics must capture the structure and logic of the program. This is similar to the importance grammar carries in AI-based text generation, Q&A, translations, etc. The various metrics that are garnered in this work to support the evaluation of generated code are as follows: Compilation, NL description to logic conversion, number of edits needed, some of the commonly used static-code metrics and NLP metrics. These metrics are applied to 80 code samples generated using OpenAI's GPT-3. A neural network is then designed for binary classification (acceptable/not acceptable quality of the generated code). The inputs to this network are the values of the features obtained from the metrics. The model achieves a classification accuracy of 76.92% and an F1 score of 55.56%. XAI is added for model interpretability.