Goto

Collaborating Authors

 Performance Analysis


Error Metrics in Machine learning

#artificialintelligence

If you are reading this blog, you will probably be familiar with machine learning or will be interested in learning the same. Machine learning is a subfield of artificial intelligence, where it makes the systems to learn from data and make them capable of taking decisions with minimal human intervention. Now generally, we use the word "model" to indicate this intelligent system. Now, suppose we have a model which is designed to perform a particular task. This task can be anything like, for example, classifying the emails as not spam and spam, or an image classification problem.


SSSE: Efficiently Erasing Samples from Trained Machine Learning Models

arXiv.org Machine Learning

The availability of large amounts of user-provided data has been key to the success of machine learning for many real-world tasks. Recently, an increasing awareness has emerged that users should be given more control about how their data is used. In particular, users should have the right to prohibit the use of their data for training machine learning systems, and to have it erased from already trained systems. While several sample erasure methods have been proposed, all of them have drawbacks which have prevented them from gaining widespread adoption. Most methods are either only applicable to very specific families of models, sacrifice too much of the original model's accuracy, or they have prohibitive memory or computational requirements. In this paper, we propose an efficient and effective algorithm, SSSE, for samples erasure, that is applicable to a wide class of machine learning models. From a second-order analysis of the model's loss landscape we derive a closed-form update step of the model parameters that only requires access to the data to be erased, not to the original training set. Experiments on three datasets, CelebFaces attributes (CelebA), Animals with Attributes 2 (AwA2) and CIFAR10, show that in certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.


Bootstrapping Generalization of Process Models Discovered From Event Data

arXiv.org Artificial Intelligence

Process mining studies ways to derive value from process executions recorded in event logs of IT-systems, with process discovery the task of inferring a process model for an event log emitted by some unknown system. One quality criterion for discovered process models is generalization. Generalization seeks to quantify how well the discovered model describes future executions of the system, and is perhaps the least understood quality criterion in process mining. The lack of understanding is primarily a consequence of generalization seeking to measure properties over the entire future behavior of the system, when the only available sample of behavior is that provided by the event log itself. In this paper, we draw inspiration from computational statistics, and employ a bootstrap approach to estimate properties of a population based on a sample. Specifically, we define an estimator of the model's generalization based on the event log it was discovered from, and then use bootstrapping to measure the generalization of the model with respect to the system, and its statistical significance. Experiments demonstrate the feasibility of the approach in industrial settings.


ROC Curve Explained - KDnuggets

#artificialintelligence

Area under the ROC curve is one of the most useful metrics to evaluate a supervised classification model. This metric is commonly referred to as ROC-AUC. Here, the ROC stands for Receiver Operating Characteristic and AUC stands for Area Under the Curve. In my opinion, AUROCC is a more accurate abbreviation but perhaps doesn't sound as nice. In the right context, AUC can also imply ROC-AUC even though it can refer to area under any curve.


Impossibility results for fair representations

arXiv.org Machine Learning

With the growing awareness to fairness in machine learning and the realization of the central role that data representation has in data processing tasks, there is an obvious interest in notions of fair data representations. The goal of such representations is that a model trained on data under the representation (e.g., a classifier) will be guaranteed to respect some fairness constraints. Such representations are useful when they can be fixed for training models on various different tasks and also when they serve as data filtering between the raw data (known to the representation designer) and potentially malicious agents that use the data under the representation to learn predictive models and make decisions. A long list of recent research papers strive to provide tools for achieving these goals. However, we prove that this is basically a futile effort. Roughly stated, we prove that no representation can guarantee the fairness of classifiers for different tasks trained using it; even the basic goal of achieving label-independent Demographic Parity fairness fails once the marginal data distribution shifts. More refined notions of fairness, like Odds Equality, cannot be guaranteed by a representation that does not take into account the task specific labeling rule with respect to which such fairness will be evaluated (even if the marginal data distribution is known a priory). Furthermore, except for trivial cases, no representation can guarantee Odds Equality fairness for any two different tasks, while allowing accurate label predictions for both. While some of our conclusions are intuitive, we formulate (and prove) crisp statements of such impossibilities, often contrasting impressions conveyed by many recent works on fair representations.


A Survey of Uncertainty in Deep Neural Networks

arXiv.org Machine Learning

Due to their increasing spread, confidence in neural network predictions became more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over or under confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and not reducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensemble of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty, approaches for the calibration of neural networks and give an overview of existing baselines and implementations. Different examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainties in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given.


EchoEA: Echo Information between Entities and Relations for Entity Alignment

arXiv.org Artificial Intelligence

Entity alignment (EA) is to discover entities referring to the same object in the real world from different knowledge graphs (KGs). It plays an important role in automatically integrating KGs from multiple sources. Existing knowledge graph embedding (KGE) methods based on Graph Neural Networks (GNNs) have achieved promising results, which enhance entity representation with relation information unidirectionally. Besides, more and more methods introduce semi-supervision to ask for more labeled training data. However, two challenges still exist in these methods: (1) Insufficient interaction: The interaction between entities and relations is insufficiently utilized. (2) Low-quality bootstrapping: The generated semi-supervised data is of low quality. In this paper, we propose a novel framework, Echo Entity Alignment (EchoEA), which leverages self-attention mechanism to spread entity information to relations and echo back to entities. The relation representation is dynamically computed from entity representation. Symmetrically, the next entity representation is dynamically calculated from relation representation, which shows sufficient interaction. Furthermore, we propose attribute-combined bi-directional global-filtered strategy (ABGS) to improve bootstrapping, reduce false samples and generate high-quality training data. The experimental results on three real-world cross-lingual datasets are stable at around 96\% at hits@1 on average, showing that our approach not only significantly outperforms the state-of-the-art methods, but also is universal and transferable for existing KGE methods.


A convolutional neural network for teeth margin detection on 3-dimensional dental meshes

arXiv.org Artificial Intelligence

We proposed a convolutional neural network for vertex classification on 3-dimensional dental meshes, and used it to detect teeth margins. An expanding layer was constructed to collect statistic values of neighbor vertex features and compute new features for each vertex with convolutional neural networks. An end-to-end neural network was proposed to take vertex features, including coordinates, curvatures and distance, as input and output each vertex classification label. Several network structures with different parameters of expanding layers and a base line network without expanding layers were designed and trained by 1156 dental meshes. The accuracy, recall and precision were validated on 145 dental meshes to rate the best network structures, which were finally tested on another 144 dental meshes. All networks with our expanding layers performed better than baseline, and the best one achieved an accuracy of 0.877 both on validation dataset and test dataset.


Exact Learning Augmented Naive Bayes Classifier

arXiv.org Artificial Intelligence

Earlier studies have shown that classification accuracies of Bayesian networks (BNs) obtained by maximizing the conditional log likelihood (CLL) of a class variable, given the feature variables, were higher than those obtained by maximizing the marginal likelihood (ML). However, differences between the performances of the two scores in the earlier studies may be attributed to the fact that they used approximate learning algorithms, not exact ones. This paper compares the classification accuracies of BNs with approximate learning using CLL to those with exact learning using ML. The results demonstrate that the classification accuracies of BNs obtained by maximizing the ML are higher than those obtained by maximizing the CLL for large data. However, the results also demonstrate that the classification accuracies of exact learning BNs using the ML are much worse than those of other methods when the sample size is small and the class variable has numerous parents. To resolve the problem, we propose an exact learning augmented naive Bayes classifier (ANB), which ensures a class variable with no parents. The proposed method is guaranteed to asymptotically estimate the identical class posterior to that of the exactly learned BN. Comparison experiments demonstrated the superior performance of the proposed method.


Test for non-negligible adverse shifts

arXiv.org Machine Learning

Statistical tests for dataset shift are susceptible to false alarms: they are sensitive to minor differences where there is in fact adequate sample coverage and predictive performance. We propose instead a robust framework for tests of dataset shift based on outlier scores, D-SOS for short. D-SOS detects adverse shifts and can identify false alarms caused by benign ones. It posits that a new (test) sample is not substantively worse than an old (training) sample, and not that the two are equal. The key idea is to reduce observations to outlier scores and compare contamination rates. Beyond comparing distributions, users can define what worse means in terms of predictive performance and other relevant notions. We show how versatile and practical D-SOS is for a wide range of real and simulated datasets. Unlike tests of equal distribution and of goodness-of-fit, the D-SOS tests are uniquely tailored to serve as robust performance metrics to monitor model drift and dataset shift.