Accuracy
Artificial intelligence cuts lung cancer screening false positives
PITTSBURGH, March 12, 2019 - Lung cancer is the leading cause of cancer deaths worldwide. Screening is key for early detection and increased survival, but the current method has a 96 percent false positive rate. Using machine learning, researchers at the University of Pittsburgh and UPMC Hillman Cancer Center have found a way to substantially reduce false positives without missing a single case of cancer. The study was published today in the journal Thorax. This is the first time artificial intelligence has been applied to the question of sorting out benign from cancerous nodules in lung cancer screening.
ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation
Rudd, Ethan M., Ducau, Felipe N., Wild, Cody, Berlin, Konstantin, Harang, Richard
Malware detection is a popular application of Machine Learning for Information Security (ML-Sec), in which an ML classifier is trained to predict whether a given file is malware or benignware. Parameters of this classifier are typically optimized such that outputs from the model over a set of input samples most closely match the samples' true malicious/benign (1/0) target labels. However, there are often a number of other sources of contextual metadata for each malware sample, beyond an aggregate malicious/benign label, including multiple labeling sources and malware type information (e.g., ransomware, trojan, etc.), which we can feed to the classifier as auxiliary prediction targets. In this work, we fit deep neural networks to multiple additional targets derived from metadata in a threat intelligence feed for Portable Executable (PE) malware and benignware, including a multi-source malicious/benign loss, a count loss on multi-source detections, and a semantic malware attribute tag loss. We find that incorporating multiple auxiliary loss terms yields a marked improvement in performance on the main detection task. We also demonstrate that these gains likely stem from a more informed neural network representation and are not due to a regularization artifact of multi-target learning. Our auxiliary loss architecture yields a significant reduction in detection error rate (false negatives) of 42.6% at a false positive rate (FPR) of $10^{-3}$ when compared to a similar model with only one target, and a decrease of 53.8% at $10^{-5}$ FPR.
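As a rough illustration of the multi-target idea, the sketch below sums a main malicious/benign loss with three weighted auxiliary losses for a single sample. The abstract does not give the form of each loss or how they are weighted. The choice of binary cross-entropy for the label and tag targets, a Poisson negative log-likelihood for the detection count, the dictionary field names, and the weights are all assumptions made for this example.

```python
import math

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def poisson_nll(rate, count):
    """Negative log-likelihood of an observed count under Poisson(rate), rate > 0."""
    return rate - count * math.log(rate) + math.lgamma(count + 1)

def combined_loss(pred, target, weights):
    """Main detection loss plus weighted auxiliary losses for one sample.

    pred and target are plain dicts with hypothetical keys:
      "malicious"            main malicious/benign output and label
      "vendors"              per-source malicious/benign outputs and labels
      "count_rate"/"count"   predicted Poisson rate and observed detection count
      "tags"                 per-attribute tag outputs and labels
    """
    main = bce(pred["malicious"], target["malicious"])
    vendors = sum(bce(p, y) for p, y in zip(pred["vendors"], target["vendors"]))
    count = poisson_nll(pred["count_rate"], target["count"])
    tags = sum(bce(p, y) for p, y in zip(pred["tags"], target["tags"]))
    return (main
            + weights["vendors"] * vendors
            + weights["count"] * count
            + weights["tags"] * tags)

# Illustrative values only; none of these numbers come from the paper.
weights = {"vendors": 0.1, "count": 0.1, "tags": 0.1}
target = {"malicious": 1, "vendors": [1, 0], "count": 3, "tags": [1, 0]}
pred = {"malicious": 0.9, "vendors": [0.9, 0.1], "count_rate": 3.0, "tags": [0.9, 0.1]}
print(combined_loss(pred, target, weights))
```

In a real network the auxiliary outputs would share a representation with the main output, so gradients from the auxiliary terms also shape the features used for detection.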
Adversarial attacks against Fact Extraction and VERification
Thorne, James, Vlachos, Andreas
This paper describes a baseline for the second iteration of the Fact Extraction and VERification shared task (FEVER2.0) which explores the resilience of systems through adversarial evaluation. We present a collection of simple adversarial attacks against systems that participated in the first FEVER shared task. FEVER modeled the assessment of truthfulness of written claims as a joint information retrieval and natural language inference task using evidence from Wikipedia. A large number of participants made use of deep neural networks in their submissions to the shared task. The extent to which such models understand language has been the subject of a number of recent investigations and discussions in the literature. In this paper, we present a simple method of generating entailment-preserving and entailment-altering perturbations of instances using common patterns within the training data. We find that a number of systems are greatly affected, with absolute losses in classification accuracy of up to $29\%$ on the newly perturbed instances. Using these newly generated instances, we construct a sample submission for the FEVER2.0 shared task. Addressing these types of attacks will aid in building more robust fact-checking models, as well as suggest directions to expand the datasets.
Predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms
We aim to improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), and two oversampling methods, random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers are implemented: logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT). Two ensembling techniques, Bagging and Boosting, are applied to the DT classifier for further model improvement. The results show that Boosting on DT using SMOTE-oversampled data containing 50% positives is the best-performing model, achieving an AUC, recall, and F1 score of 0.8633, 0.9260, and 0.8907, respectively.
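Two of the ingredients named in this abstract can be sketched in a few lines of standard-library Python. The first function computes AUC as the fraction of positive/negative pairs ranked correctly, with ties counted as half. The second performs random oversampling (ROS) to a chosen share of positives. This is not SMOTE, which synthesizes new minority points by interpolation; the function names and toy data are hypothetical.

```python
import random

def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def random_oversample(rows, labels, positive_fraction=0.5, seed=0):
    """Duplicate randomly chosen positives until they make up positive_fraction."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos:
        raise ValueError("need at least one positive example to oversample")
    # Number of positives needed so that pos / (pos + neg) == positive_fraction.
    wanted = round(positive_fraction * len(neg) / (1.0 - positive_fraction))
    extra = [rng.choice(pos) for _ in range(max(0, wanted - len(pos)))]
    new_rows = neg + pos + extra
    new_labels = [0] * len(neg) + [1] * (len(pos) + len(extra))
    return new_rows, new_labels

# Toy data: one positive among five rows, oversampled to 50% positives.
rows, labels = random_oversample([[1], [2], [3], [4], [5]], [1, 0, 0, 0, 0])
print(sum(labels), len(labels))
```

When resampling is combined with cross-validation, it is usually applied only to the training folds so that the held-out fold keeps the original class balance.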
Learning Data Science through Fun Demonstrations! - Blogs by Nidhi
As a part of a '1 day in Python' workshop, the capabilities of this versatile language were showcased with cases and demonstrations. We realized the underlying logic of the various data science algorithms through these demonstrations; or, to put it in other words, we got an insight into how computers think! Natural Language Processing (NLP) is concerned with programming computers to process and analyze large amounts of natural language data. It is used in search engines, social website feeds, speech engines, and spam filters. We were given a mixture of words.
Testing Conditional Independence on Discrete Data using Stochastic Complexity
Marx, Alexander, Vreeken, Jilles
Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
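For reference, the quantity that SCI is said to estimate, conditional mutual information on discrete data, has a simple plug-in estimator based on empirical counts. The sketch below is that plug-in estimator, not the SCI test itself; the function name is hypothetical.

```python
from collections import Counter
import math

def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits from three aligned discrete samples."""
    n = len(xs)
    if not (n == len(ys) == len(zs)) or n == 0:
        raise ValueError("xs, ys and zs must be non-empty and of equal length")
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), with the 1/n factors cancelled
        ratio = n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * math.log2(ratio)
    return cmi

# Y copies X and Z is constant, so I(X; Y | Z) equals H(X) = 1 bit.
print(conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0]))
```

On finite samples this estimate is generally positive even for truly independent variables, which is why a practical test needs a threshold rather than a comparison with exactly zero.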
Exploiting Reuse in Pipeline-Aware Hyperparameter Tuning
Li, Liam, Sparks, Evan, Jamieson, Kevin, Talwalkar, Ameet
Hyperparameter tuning of multistage pipelines introduces a significant computational burden. Motivated by the observation that work can be reused across pipelines if the intermediate computations are the same, we propose a pipeline-aware approach to hyperparameter tuning. Our approach optimizes both the design and execution of pipelines to maximize reuse. We design pipelines amenable for reuse by (i) introducing a novel hybrid hyperparameter tuning method called gridded random search, and (ii) reducing the average training time in pipelines by adapting early-stopping hyperparameter tuning approaches. We then realize the potential for reuse during execution by introducing a novel caching problem for ML workloads which we pose as a mixed integer linear program (ILP), and subsequently evaluating various caching heuristics relative to the optimal solution of the ILP. We conduct experiments on simulated and real-world machine learning pipelines to show that a pipeline-aware approach to hyperparameter tuning can offer over an order-of-magnitude speedup over independently evaluating pipeline configurations. Modern machine learning workflows combine multiple stages of data-preprocessing, feature extraction, and supervised and unsupervised learning (Sánchez et al., 2013; The methods in each of these stages typically have configuration parameters, or hyperparameters, that influence their output and ultimately predictive accuracy.
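The reuse observation can be illustrated with a toy executor that caches each stage's output under the prefix of (stage, hyperparameter) pairs that produced it, so configurations sharing a prefix recompute nothing. This is only a minimal sketch with an unbounded cache; it does not implement gridded random search, early stopping, or the caching problem the abstract poses as a mixed integer linear program. The stage functions and all names are hypothetical.

```python
def run_pipelines(configs, stages, data):
    """Evaluate every configuration, reusing outputs of shared pipeline prefixes.

    stages  : list of (name, function) pairs; function(value, param) -> value
    configs : list of tuples holding one hyperparameter per stage
    Returns the final output per configuration and the number of stage executions.
    """
    cache = {}
    executions = 0
    results = []
    for config in configs:
        value = data
        prefix = ()
        for (name, fn), param in zip(stages, config):
            prefix += ((name, param),)
            if prefix not in cache:
                # Only compute a stage when this exact prefix has not been seen.
                cache[prefix] = fn(value, param)
                executions += 1
            value = cache[prefix]
        results.append(value)
    return results, executions

# Hypothetical two-stage pipeline: scale the data, then shift it.
stages = [
    ("scale", lambda xs, k: [x * k for x in xs]),
    ("shift", lambda xs, b: [x + b for x in xs]),
]
configs = [(2, 1), (2, 3), (5, 1)]
results, executions = run_pipelines(configs, stages, [1, 2])
print(results, executions)
```

Here the first two configurations share the "scale by 2" stage, so five stage executions are needed instead of the six required when each configuration is evaluated independently. The saving grows when many sampled configurations share values for early stages.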
Detection of LDDoS Attacks Based on TCP Connection Parameters
Siracusano, Michael, Shiaeles, Stavros, Ghita, Bogdan
Low-rate application layer distributed denial of service (LDDoS) attacks are both powerful and stealthy. They force vulnerable webservers to open all available connections to the adversary, denying resources to real users. Mitigation advice focuses on solutions that potentially degrade quality of service for legitimate connections. Furthermore, without accurate detection mechanisms, distributed attacks can bypass these defences. A methodology for detection of LDDoS attacks, based on characteristics of malicious TCP flows, is proposed within this paper. Research was conducted using combinations of two datasets: one generated from a simulated network, the other from the publicly available CIC DoS dataset. Both contain the attacks slowread, slowheaders and slowbody, alongside legitimate web browsing. TCP flow features are extracted from all connections. Experimentation was carried out using six supervised AI algorithms to distinguish attack flows from legitimate flows. Decision trees and k-NN accurately classified up to 99.99% of flows, with exceptionally low false positive and false negative rates, demonstrating the potential of AI in LDDoS detection.
An Exponential Efron-Stein Inequality for Lq Stable Learning Rules
Abou-Moustafa, Karim, Szepesvari, Csaba
There is accumulating evidence in the literature that stability of learning algorithms is a key characteristic that permits a learning algorithm to generalize. Despite various insightful results in this direction, there seems to be an overlooked dichotomy in the type of stability-based generalization bounds we have in the literature. On one hand, the literature seems to suggest that exponential generalization bounds for the estimated risk, which are optimal, can be only obtained through stringent, distribution independent and computationally intractable notions of stability such as uniform stability. On the other hand, it seems that weaker notions of stability such as hypothesis stability, although it is distribution dependent and more amenable to computation, can only yield polynomial generalization bounds for the estimated risk, which are suboptimal. In this paper, we address the gap between these two regimes of results. In particular, the main question we address here is whether it is possible to derive exponential generalization bounds for the estimated risk using a notion of stability that is computationally tractable and distribution dependent, but weaker than uniform stability. Using recent advances in concentration inequalities, and using a notion of stability that is weaker than uniform stability but distribution dependent and amenable to computation, we derive an exponential tail bound for the concentration of the estimated risk of a hypothesis returned by a general learning rule, where the estimated risk is expressed in terms of either the resubstitution estimate (empirical error), or the deleted (or, leave-one-out) estimate. As an illustration, we derive exponential tail bounds for ridge regression with unbounded responses -- a setting where uniform stability results of Bousquet and Elisseeff (2002) are not applicable.
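The two risk estimates named in this abstract can be made concrete for ridge regression. The sketch below computes the resubstitution (empirical) estimate and the deleted (leave-one-out) estimate for a one-dimensional ridge fit without an intercept. The squared loss, the one-dimensional setting, and the function names are assumptions made to keep the example short; the paper treats general learning rules.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimise sum (y - w*x)^2 + lam * w^2 over w; closed form in one dimension."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def resubstitution_estimate(xs, ys, lam):
    """Mean squared error on the same points that were used for fitting."""
    w = fit_ridge_1d(xs, ys, lam)
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def deleted_estimate(xs, ys, lam):
    """Leave-one-out estimate: refit without point i, then test on point i."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        w = fit_ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        total += (ys[i] - w * xs[i]) ** 2
    return total / n

# Toy data, illustrative only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
print(resubstitution_estimate(xs, ys, lam=1.0), deleted_estimate(xs, ys, lam=1.0))
```

For ridge regression with a fixed penalty, each leave-one-out residual is at least as large in magnitude as the corresponding training residual, so the deleted estimate is never smaller than the resubstitution estimate.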
Generalized Sparse Additive Models
Haris, Asad, Simon, Noah, Shojaie, Ali
We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met. Finally, we also show that the optimal penalty parameters for structure and sparsity penalties in our framework are linked, allowing cross-validation to be conducted over only a single tuning parameter. We complement our theoretical results with empirical studies comparing some existing methods within this framework.