Accuracy
Electrocardiogram screening for aortic valve stenosis using artificial intelligence
Between 1989 and 2019, 258 607 adults [mean age 63 ± 16.3 years; women 122 790 (48%)] with an echocardiography and an ECG performed within 180 days were identified from the Mayo Clinic database. Moderate to severe AS by echocardiography was present in 9723 (3.7%) patients. Artificial intelligence training was performed in 129 788 (50%), validation in 25 893 (10%), and testing in 102 926 (40%) randomly selected subjects. The sensitivity, specificity, and accuracy were 78%, 74%, and 74%, respectively. The sensitivity increased and the specificity decreased as age increased.
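As a rough illustration of how the three reported figures relate, overall accuracy is a prevalence-weighted average of sensitivity and specificity. The sketch below plugs in the numbers quoted above. It assumes the test split has the same prevalence of moderate to severe AS as the whole cohort, which the abstract does not state, and the function name is illustrative only.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Overall accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Figures quoted in the abstract. Applying the whole-cohort prevalence
# to the test split is an assumption made for this sketch.
prevalence = 9723 / 258607
acc = accuracy_from_rates(sensitivity=0.78, specificity=0.74, prevalence=prevalence)

print(f"prevalence = {prevalence:.3f}, implied accuracy = {acc:.3f}")
```

Because the condition is rare, accuracy is dominated by specificity, which is why the reported accuracy sits so close to the 74% specificity.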
A Comparison of Similarity Based Instance Selection Methods for Cross Project Defect Prediction
Hosseini, Seyedrebvar, Turhan, Burak
Context: Previous studies have shown that training data instance selection based on nearest neighborhood (NN) information can lead to better performance in cross project defect prediction (CPDP) by reducing heterogeneity in training datasets. However, neighborhood calculation is computationally expensive and approximate methods such as Locality Sensitive Hashing (LSH) can be as effective as exact methods. Aim: We aim at comparing instance selection methods for CPDP, namely LSH, NN-Filter, and Genetic Instance Selection (GIS). Method: We conduct experiments with five base learners, optimizing their hyperparameters, on 13 datasets from the PROMISE repository in order to compare the performance of LSH with benchmark instance selection methods NN-Filter and GIS. Results: The statistical tests show six distinct groups for F-measure performance. The top two groups contain only LSH and GIS benchmarks whereas the bottom two groups contain only NN-Filter variants. LSH and GIS favor recall more than precision. In fact, for precision performance only three significantly distinct groups are detected by the tests, where the top group is comprised of NN-Filter variants only. Recall-wise, 16 different groups are identified, where the top three groups contain only LSH methods, four of the next six are GIS only and the bottom five contain only NN-Filter. Finally, NN-Filter benchmarks never outperform the LSH counterparts with the same base learner, tuned or non-tuned. Further, they never even belong to the same rank group, meaning that LSH is always significantly better than NN-Filter with the same learner and settings. Conclusions: The increase in performance and the decrease in computational overhead and runtime make LSH a promising approach. However, the performance of LSH is based on high recall, and in environments where precision is considered more important, NN-Filter should be considered.
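The recall versus precision trade-off described above can be made concrete with a small sketch of how precision, recall, and F-measure are computed from confusion counts. The counts below are hypothetical and chosen only for illustration. They are not results from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a project with 100 defective modules.
# A recall-oriented predictor flags many modules, most of them clean.
recall_heavy = precision_recall_f1(tp=80, fp=120, fn=20)
# A precision-oriented predictor flags few modules, most of them defective.
precision_heavy = precision_recall_f1(tp=30, fp=10, fn=70)

for name, (p, r, f) in [("recall-heavy", recall_heavy),
                        ("precision-heavy", precision_heavy)]:
    print(f"{name}: precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

In this made-up case the recall-oriented predictor has the higher F-measure despite its lower precision, which is the kind of pattern that makes it worth reporting precision and recall alongside F-measure.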
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles
Salatino, Angelo A., Osborne, Francesco, Thanapalasingam, Thiviyan, Motta, Enrico
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles yielding a significant improvement over alternative methods.
The Trouble with Brain Scans - Issue 98: Mind
One autumn afternoon in the bowels of UC Berkeley's Li Ka Shing Center, I was looking at my brain. I had just spent 10 minutes inside the 3 Tesla MRI scanner, the technical name for a very expensive, very high maintenance, very magnetic brain camera. Lying on my back inside the narrow tube, I had swallowed my claustrophobia and let myself be enveloped in darkness and a cacophony of foghorn-like bleats. At the time I was a research intern at UC Berkeley's Neuroeconomics Lab. That was the first time I saw my own brain from an MRI scan. It was a grayscale, 3-D reconstruction floating on the black background of a computer screen. As an undergraduate who studied neuroscience, I was enraptured. There is nothing quite like a young scientist's first encounter with an imaging technology that renders the hitherto invisible visible--magnetic resonance imaging took my breath away. I felt that I was looking not just inside my body, but into the biological recesses of my mind. It was a strange self-image, if indeed it was one.
Out of a hundred trials, how many errors does your speaker verifier make?
Brümmer, Niko, Ferrer, Luciana, Swart, Albert
Out of a hundred trials, how many errors does your speaker verifier make? For the user this is an important, practical question, but researchers and vendors typically sidestep it and supply instead the conditional error-rates that are given by the ROC/DET curve. We posit that the user's question is answered by the Bayes error-rate. We present a tutorial to show how to compute the error-rate that results when making Bayes decisions with calibrated likelihood ratios, supplied by the verifier, and an hypothesis prior, supplied by the user. For perfect calibration, the Bayes error-rate is upper bounded by min(EER,P,1-P), where EER is the equal-error-rate and P, 1-P are the prior probabilities of the competing hypotheses. The EER represents the accuracy of the verifier, while min(P,1-P) represents the hardness of the classification problem. We further show how the Bayes error-rate can be computed also for non-perfect calibration and how to generalize from error-rate to expected cost. We offer some criticism of decisions made by direct score thresholding. Finally, we demonstrate by analyzing error-rates of the recently published DCA-PLDA speaker verifier.
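The bound stated above can be checked numerically on a toy score model. The sketch below is not the authors' tutorial code. It assumes log-likelihood ratios (LLRs) that are Gaussian and perfectly calibrated, which forces target scores to follow N(mu, 2*mu) and non-target scores N(-mu, 2*mu). The parameter `mu` and the function names are hypothetical choices for illustration. A Bayes decision accepts a trial when its LLR exceeds minus the log prior odds.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def equal_error_rate(mu):
    """EER of the toy model: miss and false-alarm rates are equal at threshold 0."""
    return phi(-math.sqrt(mu / 2.0))


def bayes_error_rate(mu, prior):
    """Error rate of Bayes decisions for target prior `prior` (0 < prior < 1)."""
    sigma = math.sqrt(2.0 * mu)                      # calibration ties variance to mean
    threshold = -math.log(prior / (1.0 - prior))     # minus the log prior odds
    p_miss = phi((threshold - mu) / sigma)           # targets scoring below threshold
    p_fa = 1.0 - phi((threshold + mu) / sigma)       # non-targets scoring above it
    return prior * p_miss + (1.0 - prior) * p_fa


mu = 4.0  # hypothetical separation parameter
eer = equal_error_rate(mu)
for prior in (0.01, 0.1, 0.5, 0.9):
    err = bayes_error_rate(mu, prior)
    bound = min(eer, prior, 1.0 - prior)
    print(f"P={prior:.2f}: about {100 * err:.1f} errors per 100 trials "
          f"(bound {100 * bound:.1f})")
```

In this model the error rate equals the EER at P = 0.5 and falls below both the EER and min(P, 1-P) as the prior moves away from 0.5.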
fairmodels: A Flexible Tool For Bias Detection, Visualization, And Mitigation
Wiśniewski, Jakub, Biecek, Przemysław
Machine learning decision systems are getting omnipresent in our lives. From dating apps to rating loan seekers, algorithms affect both our well-being and future. Typically, however, these systems are not infallible. Moreover, complex predictive models are really eager to learn social biases present in historical data that can lead to increasing discrimination. If we want to create models responsibly then we need tools for in-depth validation of models also from the perspective of potential discrimination. This article introduces an R package fairmodels that helps to validate fairness and eliminate bias in classification models in an easy and flexible fashion. The fairmodels package offers a model-agnostic approach to bias detection, visualization and mitigation. The implemented set of functions and fairness metrics enables model fairness validation from different perspectives. The package includes a series of methods for bias mitigation that aim to diminish the discrimination in the model. The package is designed not only to examine a single model, but also to facilitate comparisons between multiple models.
Classifying The Neighbourhood
Here we're going to look at an application of the k-nearest neighbours (kNN) algorithm to predict whether a telescope signal is gamma or hadron radiation, using a Kaggle dataset. kNN is one of the older machine learning algorithms. I've just looked it up and the internet assures me that it was developed in the 1950s. It still works well today. I'll be using the scikit-learn kNN classification model for the example.
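Before reaching for the library, it helps to see how little the algorithm involves. Here is a minimal pure-Python sketch of kNN classification: find the k training points closest to the query and take a majority vote over their labels. The two-feature toy data and the `knn_predict` helper are made up for illustration. They are not the telescope dataset or the scikit-learn API.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature signals: 'g' for gamma, 'h' for hadron.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["g", "g", "g", "h", "h", "h"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (3.0, 3.0)))
```

In scikit-learn the same idea is available as `KNeighborsClassifier(n_neighbors=k)` with the usual `fit` and `predict` methods, which also handles efficient neighbour search on larger datasets.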
Trusted Artificial Intelligence: Towards Certification of Machine Learning Applications
Winter, Philip Matthias, Eder, Sebastian, Weissenböck, Johannes, Schwald, Christoph, Doms, Thomas, Vogt, Tom, Hochreiter, Sepp, Nessler, Bernhard
Artificial Intelligence is one of the fastest growing technologies of the 21st century and accompanies us in our daily lives when interacting with technical applications. However, reliance on such technical systems is crucial for their widespread applicability and acceptance. The societal tools to express reliance are usually formalized by lawful regulations, i.e., standards, norms, accreditations, and certificates. Therefore, the TÜV AUSTRIA Group in cooperation with the Institute for Machine Learning at the Johannes Kepler University Linz, proposes a certification process and an audit catalog for Machine Learning applications. We are convinced that our approach can serve as the foundation for the certification of applications that use Machine Learning and Deep Learning, the techniques that drive the current revolution in Artificial Intelligence. While certain high-risk areas, such as fully autonomous robots in workspaces shared with humans, are still some time away from certification, we aim to cover low-risk applications with our certification procedure. Our holistic approach attempts to analyze Machine Learning applications from multiple perspectives to evaluate and verify the aspects of secure software development, functional requirements, data quality, data protection, and ethics. Inspired by existing work, we introduce four criticality levels to map the criticality of a Machine Learning application regarding the impact of its decisions on people, environment, and organizations. Currently, the audit catalog can be applied to low-risk applications within the scope of supervised learning as commonly encountered in industry. Guided by field experience, scientific developments, and market demands, the audit catalog will be extended and modified accordingly.
Online Learning Probabilistic Event Calculus Theories in Answer Set Programming
Katzouris, Nikos, Artikis, Alexander, Paliouras, Georgios
Complex Event Recognition (CER) systems detect event occurrences in streaming time-stamped input using predefined event patterns. Logic-based approaches are of special interest in CER, since, via Statistical Relational AI, they combine uncertainty-resilient reasoning with time and change, with machine learning, thus alleviating the cost of manual event pattern authoring. We present a system based on Answer Set Programming (ASP), capable of probabilistic reasoning with complex event patterns in the form of weighted rules in the Event Calculus, whose structure and weights are learnt online. We compare our ASP-based implementation with a Markov Logic-based one and with a number of state-of-the-art batch learning algorithms on CER datasets for activity recognition, maritime surveillance and fleet management. Our results demonstrate the superiority of our novel approach, both in terms of efficiency and predictive performance. This paper is under consideration for publication in Theory and Practice of Logic Programming (TPLP).
DropBlock: A New Regularization Technique
Regularization is a strategy used in deep neural networks to reduce the generalization error, though not the training error, so that the model performs well not just on the training data but also on new, unseen inputs. An effective regularizer reduces the variance significantly while not overly increasing the bias, thus preventing overfitting. To reduce overfitting, we use regularization techniques like L1 and L2, which add a penalty to the loss function, or techniques like Dropout and Spatial Dropout, which discourage model complexity. The principle behind dropout-style regularization methods in a neural network is to inject noise into the network to avoid overfitting the training data. L2 regularization is also commonly known as weight decay, ridge regression, or Tikhonov regularization.
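To see why L2 regularization is also called weight decay, here is a minimal sketch of one plain gradient-descent step. It assumes the penalty is written as (lam / 2) * ||w||^2 and added to the loss. The function names, the learning rate `lr`, and the coefficient `lam` are illustrative choices, not taken from any particular framework.

```python
def sgd_step_l2(w, grad, lr, lam):
    """One gradient step on loss + (lam / 2) * ||w||^2.

    The penalty contributes lam * w_i to the gradient of each weight.
    """
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grad)]


def sgd_step_weight_decay(w, grad, lr, lam):
    """The same update, rewritten: shrink each weight, then apply the data gradient."""
    return [(1.0 - lr * lam) * wi - lr * gi for wi, gi in zip(w, grad)]


w = [0.5, -1.0, 2.0]      # current weights (made-up values)
grad = [0.1, -0.2, 0.3]   # gradient of the unpenalized loss (made-up values)

print(sgd_step_l2(w, grad, lr=0.1, lam=0.01))
print(sgd_step_weight_decay(w, grad, lr=0.1, lam=0.01))
```

Both functions produce the same weights up to floating-point rounding, with each weight multiplied by (1 - lr * lam) on every step. This equivalence holds for plain gradient descent. With adaptive optimizers the two formulations generally differ.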