Accuracy
Towards security defect prediction with AI
Sestili, Carson D., Snavely, William S., VanHoudnos, Nathan M.
Abstract--In this study, we investigate the limits of the current state of the art AI system for detecting buffer overflows and compare it with current static analysis tools. To do so, we developed a code generator, sbAbI, capable of producing an arbitrarily large number of code samples of controlled complexity. We found that the static analysis engines we examined have good precision, but poor recall on this dataset, except for a sound static analyzer that has good precision and recall. We found that the state of the art AI system, a memory network modeled after Choi et al. [1], can achieve similar performance to the static analysis engines, but requires an exhaustive amount of training data in order to do so. Our work points towards future approaches that may solve these problems; namely, using representations of code that can capture appropriate scope information and using deep learning methods that are able to perform arithmetic operations. Predicting security defects in source code is of significant national security interest. It is ideal to detect security defects during development, before the code is ever run to expose those defects. The current best methods to find security defects before running code are static analysis tools, a variety of which exist and model software in different ways that are all useful for different kinds of flaws. Developers of static analyzers carefully equip them with rules about program behavior, which are used to reason about the safety of the program if it were to run. However, static analyzers are known to be insufficient at finding flaws. The Juliet Test Suite [2]-[4] is a collection of synthetic code containing intentional security defects across hundreds of vulnerabilities in the Common Weakness Enumeration standard, labeled at the line-of-code level. Even state-ofthe art static analyzers perform poorly at finding the defects in Juliet, issuing too many false positives and also too many false negatives [5]-[8].
Trash or treasure -- how to tell if a classification algorithm is any good
This is the fourth in a series of articles intended to make Machine Learning more approachable to those without technical training. Prior articles introduce the concept of Machine Learning, show how the process of learning works in general, and describe commonly used algorithms. You can start the series here. In this installment of the series, we review some common measures and considerations when assessing the effectiveness and value of a classification algorithm. To get started, let's assume that an algorithm has been developed on a training set and has generalized well.
Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals
Cotter, Andrew, Jiang, Heinrich, Wang, Serena, Narayan, Taman, Gupta, Maya, You, Seungil, Sridharan, Karthik
We show that many machine learning goals, such as improved fairness metrics, can be expressed as constraints on the model's predictions, which we call rate constraints. We study the problem of training non-convex models subject to these rate constraints (or any non-convex and non-differentiable constraints). In the non-convex setting, the standard approach of Lagrange multipliers may fail. Furthermore, if the constraints are non-differentiable, then one cannot optimize the Lagrangian with gradient-based methods. To solve these issues, we introduce the proxy-Lagrangian formulation. This new formulation leads to an algorithm that produces a stochastic classifier by playing a two-player non-zero-sum game solving for what we call a semi-coarse correlated equilibrium, which in turn corresponds to an approximately optimal and feasible solution to the constrained optimization problem. We then give a procedure which shrinks the randomized solution down to one that is a mixture of at most $m+1$ deterministic solutions, given $m$ constraints. This culminates in algorithms that can solve non-convex constrained optimization problems with possibly non-differentiable and non-convex constraints with theoretical guarantees. We provide extensive experimental results enforcing a wide range of policy goals including different fairness metrics, and other goals on accuracy, coverage, recall, and churn.
Ensemble learning with 3D convolutional neural networks for connectome-based prediction
Khosla, Meenakshi, Jamison, Keith, Kuceyeski, Amy, Sabuncu, Mert R.
The specificty and sensitivity of resting state functional MRI (rs-fMRI) measurements depend on pre-processing choices, such as the parcellation scheme used to define regions of interest (ROIs). In this study, we critically evaluate the effect of brain parcellations on machine learning models applied to rs-fMRI data. Our experiments reveal a remarkable trend: On average, models with stochastic parcellations consistently perform as well as models with widely used atlases at the same spatial scale. We thus propose an ensemble learning strategy to combine the predictions from models trained on connectivity data extracted using different (e.g., stochastic) parcellations. We further present an implementation of our ensemble learning strategy with a novel 3D Convolutional Neural Network (CNN) approach. The proposed CNN approach takes advantage of the full-resolution 3D spatial structure of rs-fMRI data and fits non-linear predictive models. Our ensemble CNN framework overcomes the limitations of traditional machine learning models for connectomes that often rely on region-based summary statistics and/or linear models. We showcase our approach on a classification (autism patients versus healthy controls) and a regression problem (prediction of subject's age), and report promising results.
Capsule Deep Neural Network for Recognition of Historical Graffiti Handwriting
Gordienko, Nikita, Kochura, Yuriy, Taran, Vlad, Peng, Gang, Gordienko, Yuri, Stirenko, Sergii
Automatic recognition of the historical letters (XI-XVIII centuries) carved on the stoned walls of St.Sophia cathedral in Kyiv (Ukraine) was demonstrated by means of capsule deep learning neural network. It was applied to the image dataset of the carved Glagolitic and Cyrillic letters (CGCL), which was assembled and pre-processed recently for recognition and prediction by machine learning methods (https://www.kaggle.com/yoctoman/graffiti-st-sophia-cathedral-kyiv). CGCL dataset contains >4000 images for glyphs of 34 letters which are hardly recognized by experts even in contrast to notMNIST dataset with the better images of 10 letters taken from different fonts. Despite the much worse quality of CGCL dataset and extremely low number of samples (in comparison to notMNIST dataset) the capsule network model demonstrated much better results than the previously used convolutional neural network (CNN). The validation accuracy (and validation loss) was higher (lower) for capsule network model than for CNN without data augmentation even. The area under curve (AUC) values for receiver operating characteristic (ROC) were also higher for the capsule network model than for CNN model: 0.88-0.93 (capsule network) and 0.50 (CNN) without data augmentation, 0.91-0.95 (capsule network) and 0.51 (CNN) with lossless data augmentation, and similar results of 0.91-0.93 (capsule network) and 0.9 (CNN) in the regime of lossless data augmentation only. The confusion matrixes were much better for capsule network than for CNN model and gave the much lower type I (false positive) and type II (false negative) values in all three regimes of data augmentation. These results supports the previous claims that capsule-like networks allow to reduce error rates not only on MNIST digit dataset, but on the other notMNIST letter dataset and the more complex CGCL handwriting graffiti letter dataset also.
Endowing Robots with Longer-term Autonomy by Recovering from External Disturbances in Manipulation through Grounded Anomaly Classification and Recovery Policies
Wu, Hongmin, Luo, Shuangqi, Chen, Longxin, Duan, Shuangda, Chumkamon, Sakmongkon, Liu, Dong, Guan, Yisheng, Rojas, Juan
Robot manipulation is increasingly poised to interact with humans in co-shared workspaces. Despite increasingly robust manipulation and control algorithms, failure modes continue to exist whenever models do not capture the dynamics of the unstructured environment. To obtain longer-term horizons in robot automation, robots must develop introspection and recovery abilities. We contribute a set of recovery policies to deal with anomalies produced by external disturbances as well as anomaly classification through the use of non-parametric statistics with memoized variational inference with scalable adaptation. A recovery critic stands atop of a tightly-integrated, graph-based online motion-generation and introspection system that resolves a wide range of anomalous situations. Policies, skills, and introspection models are learned incrementally and contextually in a task. Two task-level recovery policies: re-enactment and adaptation resolve accidental and persistent anomalies respectively. The introspection system uses non-parametric priors along with Markov jump linear systems and memoized variational inference with scalable adaptation to learn a model from the data. Extensive real-robot experimentation with various strenuous anomalous conditions is induced and resolved at different phases of a task and in different combinations. The system executes around-the-clock introspection and recovery and even elicited self-recovery when misclassifications occurred.
Poisoning Attacks to Graph-Based Recommender Systems
Fang, Minghong, Yang, Guolei, Gong, Neil Zhenqiang, Liu, Jia
Recommender system is an important component of many web services to help users locate items that match their interests. Several studies showed that recommender systems are vulnerable to poisoning attacks, in which an attacker injects fake data to a given system such that the system makes recommendations as the attacker desires. However, these poisoning attacks are either agnostic to recommendation algorithms or optimized to recommender systems that are not graph-based. Like association-rule-based and matrix-factorization-based recommender systems, graph-based recommender system is also deployed in practice, e.g., eBay, Huawei App Store. However, how to design optimized poisoning attacks for graph-based recommender systems is still an open problem. In this work, we perform a systematic study on poisoning attacks to graph-based recommender systems. Due to limited resources and to avoid detection, we assume the number of fake users that can be injected into the system is bounded. The key challenge is how to assign rating scores to the fake users such that the target item is recommended to as many normal users as possible. To address the challenge, we formulate the poisoning attacks as an optimization problem, solving which determines the rating scores for the fake users. We also propose techniques to solve the optimization problem. We evaluate our attacks and compare them with existing attacks under white-box (recommendation algorithm and its parameters are known), gray-box (recommendation algorithm is known but its parameters are unknown), and black-box (recommendation algorithm is unknown) settings using two real-world datasets. Our results show that our attack is effective and outperforms existing attacks for graph-based recommender systems. For instance, when 1% fake users are injected, our attack can make a target item recommended to 580 times more normal users in certain scenarios.
A Moral Framework for Understanding of Fair ML through Economic Models of Equality of Opportunity
Heidari, Hoda, Loi, Michele, Gummadi, Krishna P., Krause, Andreas
Equality of opportunity (EOP) is an extensively studied conception of fairness in political philosophy. In this work, we map recently proposed notions of algorithmic fairness to economic models of EOP. We formally show that through our proposed mapping, many existing definition of algorithmic fairness, such as predictive value parity and equality of odds, can be interpreted as special cases of EOP. In this respect, our work serves as a unifying moral framework for understanding existing notions of algorithmic fairness. Most importantly, this framework allows us to explicitly spell out the moral assumptions underlying each notion of fairness, and also interpret recent fairness impossibility results in a new light. Last but not least and inspired by luck egalitarian models of EOP, we propose a new, more general family of measures for algorithmic fairness. We empirically show that employing a measure of algorithmic (un)fairness when its underlying moral assumptions are not satisfied, can have devastating consequences on the subjects' welfare.
Shallow vs deep learning architectures for white matter lesion segmentation in the early stages of multiple sclerosis
La Rosa, Francesco, Fartaria, Mário João, Kober, Tobias, Richiardi, Jonas, Granziera, Cristina, Thiran, Jean-Philippe, Cuadra, Meritxell Bach
In this work, we present a comparison of a shallow and a deep learning architecture for the automated segmentation of white matter lesions in MR images of multiple sclerosis patients. In particular, we train and test both methods on early stage disease patients, to verify their performance in challenging conditions, more similar to a clinical setting than what is typically provided in multiple sclerosis segmentation challenges. Furthermore, we evaluate a prototype naive combination of the two methods, which refines the final segmentation. All methods were trained on 32 patients, and the evaluation was performed on a pure test set of 73 cases. Results show low lesion-wise false positives (30%) for the deep learning architecture, whereas the shallow architecture yields the best Dice coefficient (63%) and volume difference (19%). Combining both shallow and deep architectures further improves the lesion-wise metrics (69% and 26% lesion-wise true and false positive rate, respectively).
Bayesian Patchworks: An Approach to Case-Based Reasoning
Moghaddass, Ramin, Rudin, Cynthia
Doctors often rely on their past experience in order to diagnose patients. For a doctor with enough experience, almost every patient would have similarities to key cases seen in the past, and each new patient could be viewed as a mixture of these key past cases. Because doctors often tend to reason this way, an efficient computationally aided diagnostic tool that thinks in the same way might be helpful in locating key past cases of interest that could assist with diagnosis. This article develops a novel mathematical model to mimic the type of logical thinking that physicians use when considering past cases. The proposed model can also provide physicians with explanations that would be similar to the way they would naturally reason about cases. The proposed method is designed to yield predictive accuracy, computational efficiency, and insight into medical data; the key element is the insight into medical data - in some sense we are automating a complicated process that physicians might perform manually. We finally implemented the result of this work on two publicly available healthcare datasets, for (1) heart disease prediction and (2) breast cancer prediction.