Accuracy
Differentially Private Fair Learning
Jagielski, Matthew, Kearns, Michael, Mao, Jieming, Oprea, Alina, Roth, Aaron, Sharifi-Malvajerdi, Saeed, Ullman, Jonathan
We design two learning algorithms that simultaneously promise differential privacy and equalized odds, a 'fairness' condition that corresponds to equalizing false positive and negative rates across protected groups. Our first algorithm is a simple private implementation of the post-processing approach of [Hardt et al. 2016]. This algorithm has the merit of being exceedingly simple, but must be able to use protected group membership explicitly at test time, which can be viewed as 'disparate treatment'. The second algorithm is a differentially private version of the algorithm of [Agarwal et al. 2018], an oracle-efficient algorithm that can be used to find the optimal fair classifier, given access to a subroutine that can solve the original (not necessarily fair) learning problem. This algorithm need not have access to protected group membership at test time. We identify new tradeoffs between fairness, accuracy, and privacy that emerge only when requiring all three properties, and show that these tradeoffs can be milder if group membership may be used at test time.
Cyber Anomaly Detection Using Graph-node Role-dynamics
Palladino, Anthony, Thissen, Christopher J.
Intrusion detection systems (IDSs) generate valuable knowledge about network security, but an abundance of false alarms and a lack of methods to capture the interdependence among alerts hampers their utility for network defense. Here, we explore a graph-based approach for fusing alerts generated by multiple IDSs (e.g., Snort, OSSEC, and Bro). Our approach generates a weighted graph of alert fields (not network topology) that makes explicit the connections between multiple alerts, IDS systems, and other cyber artifacts. We use this multi-modal graph to identify anomalous changes in the alert patterns of a network. To detect the anomalies, we apply the role-dynamics approach, which has successfully identified anomalies in social media, email, and IP communication graphs. In the cyber domain, each node (alert field) in the fused IDS alert graph is assigned a probability distribution across a small set of roles based on that node's features. A cyber attack should trigger IDS alerts and cause changes in the node features, but rather than track every feature for every alert-field node individually, roles provide a succinct, integrated summary of those feature changes. We measure changes in each node's probabilistic role assignment over time, and identify anomalies as deviations from expected roles. We test our approach using simulations including three weeks of normal background traffic, as well as cyber attacks that occur near the end of the simulations. This paper presents a novel approach to multi-modal data fusion and a novel application of role dynamics within the cyber-security domain. Our results show a drastic decrease in the false-positive rate when considering our anomaly indicator instead of the IDS alerts themselves, thereby reducing alarm fatigue and providing a promising avenue for threat intelligence in network defense.
Progressive Sampling-Based Bayesian Optimization for Efficient and Automatic Machine Learning Model Selection
Purpose: Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. Methods: To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. Results: We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. Conclusions: This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
Improving Reconstruction Autoencoder Out-of-distribution Detection with Mahalanobis Distance
Denouden, Taylor, Salay, Rick, Czarnecki, Krzysztof, Abdelzad, Vahdat, Phan, Buu, Vernekar, Sachin
There is an increasingly apparent need for validating the classifications made by deep learning systems in safety-critical applications like autonomous vehicle systems. A number of recent papers have proposed methods for detecting anomalous image data that appear different from known inlier data samples, including reconstruction-based autoencoders. Autoencoders optimize the compression of input data to a latent space of a dimensionality smaller than the original input and attempt to accurately reconstruct the input using that compressed representation. Since the latent vector is optimized to capture the salient features from the inlier class only, it is commonly assumed that images of objects from outside of the training class cannot effectively be compressed and reconstructed. Some thus consider reconstruction error as a kind of novelty measure. Here we suggest that reconstruction-based approaches fail to capture particular anomalies that lie far from known inlier samples in latent space but near the latent dimension manifold defined by the parameters of the model. We propose incorporating the Mahalanobis distance in latent space to better capture these out-of-distribution samples and our results show that this method often improves performance over the baseline approach.
Prior Networks for Detection of Adversarial Attacks
Adversarial examples are considered a serious issue for safety critical applications of AI, such as finance, autonomous vehicle control and medicinal applications. Though significant work has resulted in increased robustness of systems to these attacks, systems are still vulnerable to well-crafted attacks. To address this problem, several adversarial attack detection methods have been proposed. However, a system can still be vulnerable to adversarial samples that are designed to specifically evade these detection methods. One recent detection scheme that has shown good performance is based on uncertainty estimates derived from Monte-Carlo dropout ensembles. Prior Networks, a new method of estimating predictive uncertainty, has been shown to outperform Monte-Carlo dropout on a range of tasks. One of the advantages of this approach is that the behaviour of a Prior Network can be explicitly tuned to, for example, predict high uncertainty in regions where there are no training data samples. In this work, Prior Networks are applied to adversarial attack detection using measures of uncertainty in a similar fashion to Monte-Carlo Dropout. Detection based on measures of uncertainty derived from DNNs and Monte-Carlo dropout ensembles are used as a baseline. Prior Networks are shown to significantly out-perform these baseline approaches over a range of adversarial attacks in both detection of whitebox and blackbox configurations. Even when the adversarial attacks are constructed with full knowledge of the detection mechanism, it is shown to be highly challenging to successfully generate an adversarial sample.
A two-stage hybrid model by using artificial neural networks as feature construction algorithms
Wang, Yan, Ni, Xuelei Sherry, Stone, Brian
We propose a two-stage hybrid approach with neural networks as the new feature construction algorithms for bankcard response classifications. The hybrid model uses a very simple neural network structure as the new feature construction tool in the first stage, then the newly created features are used as the additional input variables in logistic regression in the second stage. The model is compared with the traditional one-stage model in credit customer response classification. It is observed that the proposed two-stage model outperforms the one-stage model in terms of accuracy, the area under ROC curve, and KS statistic. By creating new features with the neural network technique, the underlying nonlinear relationships between variables are identified. Furthermore, by using a very simple neural network structure, the model could overcome the drawbacks of neural networks in terms of its long training time, complex topology, and limited interpretability.
An empirical study on hyperparameter tuning of decision trees
Mantovani, Rafael Gomes, Horvรกth, Tomรกลก, Cerri, Ricardo, Junior, Sylvio Barbon, Vanschoren, Joaquin, de Carvalho, Andrรฉ Carlos Ponce de Leon Ferreira
Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations, and their complex interactions, it is common to use optimization techniques to find settings that lead to high predictive accuracy. However, we lack insight into how to efficiently explore this vast space of configurations: which are the best optimization techniques, how should we use them, and how significant is their effect on predictive or runtime performance? This paper provides a comprehensive approach for investigating the effects of hyperparameter tuning on three Decision Tree induction algorithms, CART, C4.5 and CTree. These algorithms were selected because they are based on similar principles, have presented a high predictive performance in several previous works and induce interpretable classification models. Additionally, they contain many interacting hyperparameters to be adjusted. Experiments were carried out with different tuning strategies to induce models and evaluate the relevance of hyperparameters using 94 classification datasets from OpenML. Experimental results indicate that hyperparameter tuning provides statistically significant improvements for C4.5 and CTree in only one-third of the datasets, and in most of the datasets for CART. Different tree algorithms may present different tuning scenarios, but in general, the tuning techniques required relatively few iterations to find accurate solutions. Furthermore, the best technique for all the algorithms was the Irace. Finally, we find that tuning a specific small subset of hyperparameters contributes most of the achievable optimal predictive performance.
Utilizing Imbalanced Data and Classification Cost Matrix to Predict Movie Preferences
In this paper, we propose a movie genre recommendation system based on imbalanced survey data and unequal classification costs for small and medium-sized enterprises (SMEs) who need a data-based and analytical approach to stock favored movies and target marketing to young people. The dataset maintains a detailed personal profile as predictors including demographic, behavioral and preferences information for each user as well as imbalanced genre preferences. These predictors do not include movies' information such as actors or directors. The paper applies Gentle boost, Adaboost and Bagged tree ensembles as well as SVM machine learning algorithms to learn classification from one thousand observations and predict movie genre preferences with adjusted classification costs. The proposed recommendation system also selects important predictors to avoid overfitting and to shorten training time. This paper compares the test error among the above-mentioned algorithms that are used to recommend different movie genres. The prediction power is also indicated in a comparison of precision and recall with other state-of-the-art recommendation systems. The proposed movie genre recommendation system solves problems such as small dataset, imbalanced response, and unequal classification costs.
Using Multitask Learning to Improve 12-Lead Electrocardiogram Classification
Hughes, J. Weston, Sittler, Taylor, Joseph, Anthony D., Olgin, Jeffrey E., Gonzalez, Joseph E., Tison, Geoffrey H.
We develop a multi-task convolutional neural network (CNN) to classify multiple diagnoses from 12-lead electrocardiograms (ECGs) using a dataset comprised of over 40,000 ECGs, with labels derived from cardiologist clinical interpretations. Since many clinically important classes can occur in low frequencies, approaches are needed to improve performance on rare classes. We compare the performance of several single-class classifiers on rare classes to a multi-headed classifier across all available classes. We demonstrate that the addition of common classes can significantly improve CNN performance on rarer classes when compared to a model trained on the rarer class in isolation. Using this method, we develop a model with high performance as measured by F1 score on multiple clinically relevant classes compared against the gold-standard cardiologist interpretation.
LSCP: Locally Selective Combination in Parallel Outlier Ensembles
Zhao, Yue, Hryniewicki, Maciej K., Nasrullah, Zain, Li, Zheng
In unsupervised outlier ensembles, the absence of ground truth makes the combination of base detectors a challenging task. Specifically, existing parallel outlier ensembles lack a reliable way of selecting competent base detectors, affecting accuracy and stability, during model combination. In this paper, we propose a framework---called Locally Selective Combination in Parallel Outlier Ensembles (LSCP)---which addresses this issue by defining a local region around a test instance using the consensus of its nearest neighbors in randomly generated feature spaces. The top-performing base detectors in this local region are selected and combined as the model's final output. Four variants of the LSCP framework are compared with six widely used combination algorithms for parallel ensembles. Experimental results demonstrate that one of these LSCP variants consistently outperforms baseline algorithms on the majority of eighteen real-world datasets.