Accuracy
r/MachineLearning - [D] Tips on improving random forest predictive accuracy when # of features is really low?
Normally when I do RF projects I use some sort of feature selection method to choose which features to use. Then I fit the RF model onto those features. Then to test accuracy / related metrics I use cross validation, confusion matrices, etc. However in this case I only have two given features. I don't want to just literally run a RF model on those two features as my whole entire project. I'm thinking gradient boosting is what I should learn?
Deep Learning for Automated Classification and Characterization of Amorphous Materials
Swanson, Kirk, Trivedi, Shubhendu, Lequieu, Joshua, Swanson, Kyle, Kondor, Risi
The characterization of amorphous materials is especially challenging because their lack of long-range order makes it difficult to define structural metrics. In this work, we apply deep learning algorithms to accurately classify amorphous materials and characterize their structural features. Specifically, we show that convolutional neural networks and message passing neural networks can classify two-dimensional liquids and liquid-cooled glasses from molecular dynamics simulations with greater than 0.98 AUC, with no a priori assumptions about local particle relationships, even when the liquids and glasses are prepared at the same inherent structure energy. Furthermore, we demonstrate that message passing neural networks surpass convolutional neural networks in this context in both accuracy and interpretability. We extract a clear interpretation of how message passing neural networks evaluate liquid and glass structures by using a self-attention mechanism. Using this interpretation, we derive three novel structural metrics that accurately characterize glass formation. The methods presented here provide us with a procedure to identify important structural features in materials that could be missed by standard techniques and give us a unique insight into how these neural networks process data.
I. INTRODUCTION
Classifying material structures and predicting their properties are important tasks in materials science. The behavior of materials often depends strongly on their underlying structure, and understanding these structure-property relationships relies on accurately describing the structural features of a material. However, quantifying structure-property relationships and identifying structural features in complex materials are difficult tasks. A variety of standard techniques have been developed to analyze material structures.
Some of the most common techniques include the Steinhardt bond order parameters [1], Bond Angle Analysis (BAA) [2], and Common Neighbor Analysis (CNA) [3], which are useful for detecting order-disorder transitions and differentiating between crystal structures in ordered samples. As discussed in Reinhardt et al. [4], the Steinhardt bond order parameters can be stymied by thermal fluctuations or am-... BAA relies on a small set of crystalline reference structures that may not be present in amorphous samples. CNA is more flexible than BAA, but it cannot provide accurate information about particles that do not exhibit known symmetries, making analysis of irregular structures challenging.
a) Electronic mail: swansonk1@uchicago.edu
Effectiveness of Adversarial Examples and Defenses for Malware Classification
Podschwadt, Robert, Takabi, Hassan
Artificial neural networks have been successfully used for many different classification tasks, including malware detection and distinguishing between malicious and non-malicious programs. Although artificial neural networks perform very well on these tasks, they are also vulnerable to adversarial examples. An adversarial example is a sample that has had minor modifications made to it so that the neural network misclassifies it. Many techniques have been proposed, both for crafting adversarial examples and for hardening neural networks against them. Most previous work has been done in the image domain. Some of the attacks have been adapted to work in the malware domain, which typically deals with binary feature vectors. In order to better understand the space of adversarial examples in malware classification, we study different approaches to crafting adversarial examples and defense techniques in the malware domain and compare their effectiveness on multiple datasets.
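To make the idea of an adversarial example on binary feature vectors concrete, here is a minimal sketch. It is not one of the attacks studied in the paper: it swaps the neural network for a toy linear classifier, and the weights, bias, and sample are hypothetical values chosen for illustration. It also assumes the attacker may only add features (flip 0 to 1), a common constraint when removing a feature could break the program.

```python
def score(weights, bias, x):
    """Linear decision score; a sample is flagged as malware when score > 0."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def greedy_add_features(weights, bias, x, max_flips):
    """Flip absent features (0 -> 1), most negative weight first,
    until the score is no longer positive or the flip budget is spent."""
    adv = list(x)
    # Only features that are currently absent and that lower the score help.
    candidates = sorted(
        (i for i, xi in enumerate(adv) if xi == 0 and weights[i] < 0),
        key=lambda i: weights[i],
    )
    flipped = []
    for i in candidates:
        if score(weights, bias, adv) <= 0 or len(flipped) >= max_flips:
            break
        adv[i] = 1
        flipped.append(i)
    return adv, flipped


# Hypothetical toy model and sample (illustration only).
weights = [2.0, 1.5, -1.0, -2.5, -0.5]
bias = -1.5
sample = [1, 1, 0, 0, 0]

adv, flipped = greedy_add_features(weights, bias, sample, max_flips=2)
print("original score:", score(weights, bias, sample))
print("adversarial sample:", adv, "flipped indices:", flipped)
print("adversarial score:", score(weights, bias, adv))
```

In this toy case a single added feature is enough to move the sample across the decision boundary, which is the sense in which the modification is "minor".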
AFP-CKSAAP: Prediction of Antifreeze Proteins Using Composition of k-Spaced Amino Acid Pairs with Deep Neural Network
Antifreeze proteins (AFPs) are the subset of ice-binding proteins indispensable for species living in extreme cold weather. These proteins bind to ice crystals, hindering their growth into a large ice lattice that could cause physical damage. A variety of AFPs are found in numerous organisms, and due to their heterogeneous sequence characteristics, AFPs demonstrate a high degree of diversity, which makes their prediction a challenging task. Herein, we propose a machine learning framework to deal with this vigorous and diverse prediction problem using manifold learning through the composition of k-spaced amino acid pairs. We propose to use a deep neural network with skip connections and ReLU non-linearity to learn the non-linear mapping between protein sequence descriptor and class label. The proposed antifreeze protein prediction method, called AFP-CKSAAP, is shown to outperform contemporary methods, achieving excellent prediction scores on a standard dataset. The main evaluation metric for the performance of the proposed method in this study is Youden's index, whose value is high only when both sensitivity and specificity are high. In particular, AFP-CKSAAP yields a Youden's index value of 0.82 on the independent dataset, which is better than previous methods.
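For readers unfamiliar with Youden's index, the short sketch below computes it from confusion-matrix counts using the standard definition J = sensitivity + specificity - 1. The counts are hypothetical and are not taken from the paper.

```python
def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


# Hypothetical counts, for illustration only.
j = youden_index(tp=90, fn=10, tn=80, fp=20)
print(round(j, 2))
```

Because J is the sum of the two rates minus one, a classifier that sacrifices either sensitivity or specificity is penalized, and a classifier that guesses at random scores about 0.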
Anomaly Detection with Inexact Labels
Iwata, Tomoharu, Toyoda, Machiko, Tora, Shotaro, Ueda, Naonori
Tomoharu Iwata (1), Machiko Toyoda (2), Shotaro Tora (2), Naonori Ueda (1); (1) NTT Communication Science Laboratories, (2) NTT Software Innovation Center
Abstract
We propose a supervised anomaly detection method for data with inexact anomaly labels, where each label, which is assigned to a set of instances, indicates that at least one instance in the set is anomalous. Although many anomaly detection methods have been proposed, they cannot handle inexact anomaly labels. To measure the performance with inexact anomaly labels, we define the inexact AUC, which is our extension of the area under the ROC curve (AUC) for inexact labels. The proposed method trains an anomaly score function so that the smooth approximation of the inexact AUC increases while anomaly scores for non-anomalous instances become low. The proposed method performs well even when only a small number of inexact labels are available by incorporating an unsupervised anomaly detection mechanism with inexact AUC maximization. Using various datasets, we experimentally demonstrate that our proposed method improves the anomaly detection performance with inexact anomaly labels, and outperforms existing unsupervised and supervised anomaly detection and multiple instance learning methods.
1 Introduction
Anomaly detection is an important machine learning task: the task of finding the anomalous instances in a dataset. Many unsupervised anomaly detection methods have been proposed (Breunig et al., 2000; Schölkopf et al., 2001; Liu et al., 2008; Sakurada and Yairi, 2014).
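The excerpt does not give the formula for the inexact AUC, so the sketch below is only one plausible reading of the description, not the authors' definition. It assumes a labeled set counts as correctly ranked against a non-anomalous instance when at least one member of the set (that is, its highest-scoring member) has a larger anomaly score. The function name and the scores are hypothetical.

```python
def inexact_auc(anomalous_sets, normal_scores):
    """Fraction of (labeled set, normal instance) pairs in which the set's
    highest anomaly score exceeds the normal instance's score.

    Assumed reading of the abstract, not the paper's exact formula.
    """
    if not anomalous_sets or not normal_scores:
        raise ValueError("need at least one labeled set and one normal score")
    correct = 0
    for scores in anomalous_sets:
        best = max(scores)  # "at least one instance in the set is anomalous"
        correct += sum(1 for s in normal_scores if best > s)
    return correct / (len(anomalous_sets) * len(normal_scores))


# Hypothetical anomaly scores, for illustration only.
sets = [[0.1, 0.9], [0.2, 0.4]]   # each inner list is one inexactly labeled set
normals = [0.3, 0.5]              # scores of non-anomalous instances
value = inexact_auc(sets, normals)
print(value)
```

Under this reading, when every labeled set contains exactly one instance the quantity reduces to the ordinary pairwise AUC (ignoring ties), which matches the abstract's description of it as an extension of the AUC.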
Classifying the Valence of Autobiographical Memories from fMRI Data
Frid, Alex, Manevitz, Larry M., Nawa, Norberto Eiji
We show that fMRI analysis using machine learning tools is sufficient to distinguish the valence (i.e., positive or negative) of freely retrieved autobiographical memories in a cross-participant setting. Our methodology uses feature selection (ReliefF) in combination with boosting methods, both applied directly to data represented in voxel space. In previous work using the same data set, Nawa and Ando showed that whole-brain based classification could achieve above-chance classification accuracy only when both training and testing data came from the same individual. In a cross-participant setting, classification results were not statistically significant. Additionally, on average the classification accuracy obtained when using ReliefF is substantially higher than previous results: 81% for the within-participant classification, and 62% for the cross-participant classification. Furthermore, since features are defined in voxel space, it is possible to show brain maps indicating the regions that are most relevant in determining the results of the classification. Interestingly, the voxels that were selected using the proposed computational pipeline seem to be consistent with current neurophysiological theories regarding the brain regions actively involved in autobiographical memory processes.
The Prevalence of Errors in Machine Learning Experiments
Shepperd, Martin, Guo, Yuchen, Li, Ning, Arzoky, Mahir, Capiluppi, Andrea, Counsell, Steve, Destefanis, Giuseppe, Swift, Stephen, Tucker, Allan, Yousefi, Leila
Context: Conducting experiments is central to machine learning research, in order to benchmark, evaluate and compare learning algorithms. Consequently, it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these, 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors, their presence does not engender confidence. We strongly urge researchers to follow open science principles so that errors can be more easily detected and corrected, and so that as a community we reduce this worryingly high error rate in our computational experiments.
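The kind of constraint check described in the Method can be illustrated with a small sketch. This is not the authors' tooling. It assumes a 2x2 confusion matrix reported as proportions and an arbitrary tolerance for rounding, and it checks that the cells lie in [0, 1], that they sum to one, and that any reported precision, recall, or F1 agrees with the matrix. All names and numbers are hypothetical.

```python
import math


def check_confusion_matrix(tp, fp, fn, tn, reported=None, tol=0.005):
    """Return a list of problems found (empty if the results are consistent).

    tp, fp, fn, tn: cell proportions of a 2x2 confusion matrix.
    reported: optional dict with any of 'precision', 'recall', 'f1'.
    tol: assumed allowance for rounding in the published figures.
    """
    problems = []
    for name, value in {"tp": tp, "fp": fp, "fn": fn, "tn": tn}.items():
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is outside [0, 1]")
    total = tp + fp + fn + tn
    if not math.isclose(total, 1.0, abs_tol=tol):
        problems.append(f"cells sum to {total:.3f}, expected 1")

    # Recompute the metrics that the matrix determines.
    recomputed = {}
    if tp + fp > 0:
        recomputed["precision"] = tp / (tp + fp)
    if tp + fn > 0:
        recomputed["recall"] = tp / (tp + fn)
    if 2 * tp + fp + fn > 0:
        recomputed["f1"] = 2 * tp / (2 * tp + fp + fn)
    for name, value in (reported or {}).items():
        if name in recomputed and abs(recomputed[name] - value) > tol:
            problems.append(
                f"reported {name}={value} but matrix implies {recomputed[name]:.3f}"
            )
    return problems


# Hypothetical reported results, for illustration only.
consistent = check_confusion_matrix(
    0.30, 0.10, 0.20, 0.40, reported={"precision": 0.75, "recall": 0.60}
)
inconsistent = check_confusion_matrix(
    0.30, 0.10, 0.20, 0.45, reported={"recall": 0.80}
)
print(consistent)
print(inconsistent)
```

The second call reports two problems: the cells sum to 1.05 rather than 1, and the reported recall of 0.80 disagrees with the 0.60 implied by the matrix.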
Self-paced Ensemble for Highly Imbalanced Massive Data Classification
Liu, Zhining, Cao, Wei, Gao, Zhifeng, Bian, Jiang, Chen, Hechang, Chang, Yi, Liu, Tie-Yan
Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The rising big data era has been witnessing more classification tasks with large-scale but extremely imbalanced and low-quality datasets. Most existing learning methods suffer from poor performance or low computational efficiency under such a scenario. To tackle this problem, we conduct deep investigations into the nature of class imbalance, which reveal that not only the disproportion between classes, but also other difficulties embedded in the nature of data, especially noise and class overlapping, prevent us from learning effective classifiers. Taking those factors into consideration, we propose a novel framework for imbalance classification that aims to generate a strong ensemble by self-paced harmonizing data hardness via under-sampling. Extensive experiments have shown that this new framework, while being very computationally efficient, can lead to robust performance even under highly overlapping classes and extremely skewed distribution. Note that our methods can be easily adapted to most existing learning methods (e.g., C4.5, SVM, GBDT and Neural Network) to boost their performance on imbalanced data.
INTRODUCTION
The development of information technology brings the explosion of massive data in our daily life. However, many real applications usually generate very imbalanced datasets for corresponding key classification tasks. For instance, online advertising services can give rise to large amounts of data, consisting of user views or clicks on ads, for the task of click-through rate prediction [1]. Commonly, user clicks constitute only a small fraction of user behaviors. For another example, credit fraud detection [2] relies on datasets containing massive numbers of real credit card transactions, of which only a small proportion are frauds. Similar situations also exist in the tasks of medical diagnosis, record linkage and network intrusion detection, etc. [3]-[5].
In addition, real-world datasets are likely to contain other difficulty factors, including noise and missing values. Such highly imbalanced, large-scale and noisy data brings serious challenges to downstream classification tasks.
Evaluating machine learning models: How to tackle metrics - codecentric AG Blog
Once a model has been trained, it can be evaluated in different ways and with more or less complex and meaningful procedures and metrics. However, the number of possible criteria for evaluating machine learning models can initially be quite confusing to someone who is just starting to deal with the field of machine learning. For example, it depends on whether the learning is unsupervised or supervised. In the case of supervised learning it also depends on whether we are dealing with regression or classification, the underlying use case, and so on – to name just a few criteria. I would like to start with supervised learning and classification.
Swapped Face Detection using Deep Learning and Subjective Assessment
Ding, Xinyi, Raziei, Zohreh, Larson, Eric C., Olinick, Eli V., Krueger, Paul, Hahsler, Michael
The tremendous success of deep learning for imaging applications has resulted in numerous beneficial advances. Unfortunately, this success has also been a catalyst for malicious uses such as photo-realistic face swapping of parties without consent. Transferring one person's face from a source image to a target image of another person, while keeping the image photo-realistic overall has become increasingly easy and automatic, even for individuals without much knowledge of image processing. In this study, we use deep transfer learning for face swapping detection, showing true positive rates 96% with very few false alarms. Distinguished from existing methods that only provide detection accuracy, we also provide uncertainty for each prediction, which is critical for trust in the deployment of such detection systems. Moreover, we provide a comparison to human subjects. To capture human recognition performance, we build a website to collect pairwise comparisons of images from human subjects. Based on these comparisons, images are ranked from most real to most fake. We compare this ranking to the outputs from our automatic model, showing good, but imperfect, correspondence with linear correlations 0 . Overall, the results show the effectiveness of our method. As part of this study, we create a novel, publicly available dataset that is, to the best of our knowledge, the largest public swapped face dataset created using still images. Our goal of this study is to inspire more research in the field of image forensics through the creation of a public dataset and initial analysis. Face swapping refers to the process of transferring one person's face from a source image to another person in a target image, while maintaining photo-realism.