Performance Analysis
Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning
The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning. Common methods such as the holdout method for model evaluation and selection are covered, which are not recommended when working with small datasets. Different flavors of the bootstrap technique are introduced for estimating the uncertainty of performance estimates, as an alternative to confidence intervals via normal approximation if bootstrapping is computationally feasible. Common cross-validation techniques such as leave-one-out cross-validation and k-fold cross-validation are reviewed, the bias-variance trade-off for choosing k is discussed, and practical tips for the optimal choice of k are given based on empirical evidence. Different statistical tests for algorithm comparisons are presented, and strategies for dealing with multiple comparisons such as omnibus tests and multiple-comparison corrections are discussed. Finally, alternative methods for algorithm selection, such as the combined F-test 5x2 cross-validation and nested cross-validation, are recommended for comparing machine learning algorithms when datasets are small.
XNet: A convolutional neural network (CNN) implementation for medical X-Ray image segmentation suitable for small datasets
Bullock, Joseph, Cuesta-Lazaro, Carolina, Quera-Bofarull, Arnau
X-Ray image enhancement, along with many other medical image processing applications, requires the segmentation of images into bone, soft tissue, and open beam regions. We apply a machine learning approach to this problem, presenting an end-to-end solution which results in robust and efficient inference. Since medical institutions frequently do not have the resources to process and label the large quantity of X-Ray images usually needed for neural network training, we design an end-to-end solution for small datasets, while achieving state-of-the-art results. Our implementation produces an overall accuracy of 92%, F1 score of 0.92, and an AUC of 0.98, surpassing classical image processing techniques, such as clustering and entropy based methods, while improving upon the output of existing neural networks used for segmentation in non-medical contexts. The code used for this project is available online.
AnyThreat: An Opportunistic Knowledge Discovery Approach to Insider Threat Detection
Haidar, Diana, Gaber, Mohamed Medhat, Kovalchuk, Yevgeniya
Insider threat detection is getting an increased concern from academia, industry, and governments due to the growing number of malicious insider incidents. The existing approaches proposed for detecting insider threats still have a common shortcoming, which is the high number of false alarms (false positives). The challenge in these approaches is that it is essential to detect all anomalous behaviours which belong to a particular threat. To address this shortcoming, we propose an opportunistic knowledge discovery system, namely AnyThreat, with the aim to detect any anomalous behaviour in all malicious insider threats. We design the AnyThreat system with four components. (1) A feature engineering component, which constructs community data sets from the activity logs of a group of users having the same role. (2) An oversampling component, where we propose a novel oversampling technique named Artificial Minority Oversampling and Trapper REmoval (AMOTRE). AMOTRE first removes the minority (anomalous) instances that have a high resemblance with normal (majority) instances to reduce the number of false alarms, then it synthetically oversamples the minority class by shielding the border of the majority class. (3) A class decomposition component, which is introduced to cluster the instances of the majority class into subclasses to weaken the effect of the majority class without information loss. (4) A classification component, which applies a classification method on the subclasses to achieve a better separation between the majority class(es) and the minority class(es). AnyThreat is evaluated on synthetic data sets generated by Carnegie Mellon University. It detects approximately 87.5% of malicious insider threats, and achieves the minimum of false positives=3.36%.
Tuplemax Loss for Language Identification
Wan, Li, Sridhar, Prashant, Yu, Yang, Wang, Quan, Moreno, Ignacio Lopez
In many scenarios of a language identification task, the user will specify a small set of languages which he/she can speak instead of a large set of all possible languages. We want to model such prior knowledge into the way we train our neural networks, by replacing the commonly used softmax loss function with a novel loss function named tuplemax loss. As a matter of fact, a typical language identification system launched in North America has about 95% users who could speak no more than two languages. Using the tuplemax loss, our system achieved a 2.33% error rate, which is a relative 39.4% improvement over the 3.85% error rate of standard softmax loss method.
Anomaly Detection with Isolation Forests using H2O - Open Source Leader in AI and ML
Anomaly detection is a common data science problem where the goal is to identify odd or suspicious observations, events, or items in our data that might be indicative of some issues in our data collection process (such as broken sensors, typos in collected forms, etc.) or unexpected events like security breaches, server failures, and so on. Anomaly detection can be performed in a supervised, semi-supervised, and unsupervised manner. For a supervised approach, we need to know whether each observation, event or item is anomalous or genuine, and we use this information during training. Obtaining labels for each observation might often be unrealistic. A semi-supervised approach uses the assumption that we only know which observations are genuine, non-anomalous, and we do not have any information on the anomalous observations.
General-to-Detailed GAN for Infrequent Class Medical Images
Koga, Tatsuki, Nonaka, Naoki, Sakuma, Jun, Seita, Jun
Deep learning has significant potential for medical imaging. However, since the incident rate of each disease varies widely, the frequency of classes in a medical image dataset is imbalanced, leading to poor accuracy for such infrequent classes. One possible solution is data augmentation of infrequent classes using synthesized images created by Generative Adversarial Networks (GANs), but conventional GANs also require certain amount of images to learn. To overcome this limitation, here we propose General-to-detailed GAN (GDGAN), serially connected two GANs, one for general labels and the other for detailed labels. GDGAN produced diverse medical images, and the network trained with an augmented dataset outperformed other networks using existing methods with respect to Area-Under-Curve (AUC) of Receiver Operating Characteristic (ROC) curve.
Deep Ensemble Tensor Factorization for Longitudinal Patient Trajectories Classification
De Brouwer, Edward, Simm, Jaak, Arany, Adam, Moreau, Yves
We present a generative approach to classify scarcely observed longitudinal patient trajectories. The available time series are represented as tensors and factorized using generative deep recurrent neural networks. The learned factors represent the patient data in a compact way and can then be used in a downstream classification task. For more robustness and accuracy in the predictions, we used an ensemble of those deep generative models to mimic Bayesian posterior sampling. We illustrate the performance of our architecture on an intensive-care case study of in-hospital mortality prediction with 96 longitudinal measurement types measured across the first 48-hour from admission. Our combination of generative and ensemble strategies achieves an AUC of over 0.85, and outperforms the SAPS-II mortality score and GRU baselines.
Beta Distribution Drift Detection for Adaptive Classifiers
Fleckenstein, Lukas, Kauschke, Sebastian, Fรผrnkranz, Johannes
With today's abundant streams of data, the only constant we can rely on is change. For stream classification algorithms, it is necessary to adapt to concept drift. This can be achieved by monitoring the model error, and triggering counter measures as changes occur. In this paper, we propose a drift detection mechanism that fits a beta distribution to the model error, and treats abnormal behavior as drift. It works with any given model, leverages prior knowledge about this model, and allows to set application-specific confidence thresholds. Experiments confirm that it performs well, in particular when drift occurs abruptly.
Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data
The typical pipeline for a supervised machine learning project involves firstly the collection of a significant sample of labelled examples typically referred to as training data. Depending on whether the labels are continuous or categorical, the supervised learning task is known as regression or classification respectively. Next, once the training data is sufficiently clean and complete, it is used to directly build a predictive model using the machine learning algorithm of choice. The predictive model is then used to label new unlabelled examples, and if the labels of the new examples are known a priori by the user (but not used by the learning algorithm) then the predictive accuracy of the model can be evaluated. Different models can therefore be directly compared. In the usual case, the training data is "real", i.e. the model is learned directly from labelled examples that were collected specifically for that purpose. However, quite frequently, modifications are made to the training data after it is collected. For example, it is standard practice to remove outlier examples and normalise numeric values. Moreover, the machine learning algorithm itself may specify modifications to the training data.
How Can We Use Artificial Intelligence To Prevent Crime?
To many, the use of Artificial Intelligence to prevent crimes and aid in sentencing criminals may seem like a scene from a science fiction movie. However, police departments in the United Kingdom - including Durham, Kent, and South Wales - are already using facial recognition and behavioural software to prevent a crime before it occurs. Computer-driven evaluation frameworks are being used to inform custodial and sentencing decisions. The technology offers both huge promise and the prospect of dark dystopia in seemingly equal measure. This raises several ethical questions.