Accuracy
PAC-Bayesian AUC classification and scoring
Ridgway, James, Alquier, Pierre, Chopin, Nicolas, Liang, Feng
We develop a scoring and classification procedure based on the PAC-Bayesian approach and the AUC (Area Under Curve) criterion. We focus initially on the class of linear score functions. We derive PAC-Bayesian non-asymptotic bounds for two types of prior for the score parameters: a Gaussian prior, and a spike-and-slab prior; the latter makes it possible to perform feature selection. One important advantage of our approach is that it is amenable to powerful Bayesian computational tools. We derive in particular a Sequential Monte Carlo algorithm, as an efficient method which may be used as a gold standard, and an Expectation-Propagation algorithm, as a much faster but approximate method. We also extend our method to a class of non-linear score functions, essentially leading to a nonparametric procedure, by considering a Gaussian process prior.
From MAP to Marginals: Variational Inference in Bayesian Submodular Models
Djolonga, Josip, Krause, Andreas
Submodular optimization has found many applications in machine learning and beyond. We carry out the first systematic investigation of inference in probabilistic models defined through submodular functions, generalizing regular pairwise MRFs and Determinantal Point Processes. In particular, we present L-Field, a variational approach to general log-submodular and log-supermodular distributions based on sub- and supergradients. We obtain both lower and upper bounds on the log-partition function, which enables us to compute probability intervals for marginals, conditionals and marginal likelihoods. We also obtain fully factorized approximate posteriors, at the same computational cost as ordinary submodular optimization. Our framework results in convex problems for optimizing over differentials of submodular functions, which we show how to optimally solve. We provide theoretical guarantees of the approximation quality with respect to the curvature of the function. We further establish natural relations between our variational approach and the classical mean-field method. Lastly, we empirically demonstrate the accuracy of our inference scheme on several submodular models.
Quantized Kernel Learning for Feature Matching
Qin, Danfeng, Chen, Xuanli, Guillaumin, Matthieu, Gool, Luc V.
Matching local visual features is a crucial problem in computer vision and its accuracy greatly depends on the choice of similarity measure. As it is generally very difficult to design by hand a similarity or a kernel perfectly adapted to the data of interest, learning it automatically with as few assumptions as possible is preferable. However, available techniques for kernel learning suffer from several limitations, such as restrictive parametrization or scalability. In this paper, we introduce a simple and flexible family of non-linear kernels which we refer to as Quantized Kernels (QK). QKs are arbitrary kernels in the index space of a data quantizer, i.e., piecewise constant similarities in the original feature space. Quantization allows to compress features and keep the learning tractable. As a result, we obtain state-of-the-art matching performance on a standard benchmark dataset with just a few bits to represent each feature dimension. QKs also have explicit non-linear, low-dimensional feature mappings that grant access to Euclidean geometry for uncompressed features.
A DDoS-Aware IDS Model Based on Danger Theory and Mobile Agents
Zamani, Mahdi, Movahedi, Mahnush, Ebadzadeh, Mohammad, Pedram, Hossein
We propose an artificial immune model for intrusion detection in distributed systems based on a relatively recent theory in immunology called Danger theory. Based on Danger theory, immune response in natural systems is a result of sensing corruption as well as sensing unknown substances. In contrast, traditional self-nonself discrimination theory states that immune response is only initiated by sensing nonself (unknown) patterns. Danger theory solves many problems that could only be partially explained by the traditional model. Although the traditional model is simpler, such problems result in high false positive rates in immune-inspired intrusion detection systems. We believe using danger theory in a multi-agent environment that computationally emulates the behavior of natural immune systems is effective in reducing false positive rates. We first describe a simplified scenario of immune response in natural systems based on danger theory and then, convert it to a computational model as a network protocol. In our protocol, we define several immune signals and model cell signaling via message passing between agents that emulate cells. Most messages include application-specific patterns that must be meaningfully extracted from various system properties. We show how to model these messages in practice by performing a case study on the problem of detecting distributed denial-of-service attacks in wireless sensor networks. We conduct a set of systematic experiments to find a set of performance metrics that can accurately distinguish malicious patterns. The results indicate that the system can be efficiently used to detect malicious patterns with a high level of accuracy.
Locally Weighted Learning for Naive Bayes Classifier
As a consequence of the strong and usually violated conditional independence assumption (CIA) of naive Bayes (NB) classifier, the performance of NB becomes less and less favorable compared to sophisticated classifiers when the sample size increases. We learn from this phenomenon that when the size of the training data is large, we should either relax the assumption or apply NB to a "reduced" data set, say for example use NB as a local model. The latter approach trades the ignored information for the robustness to the model assumption. In this paper, we consider using NB as a model for locally weighted data. A special weighting function is designed so that if CIA holds for the unweighted data, it also holds for the weighted data. The new method is intuitive and capable of handling class imbalance. It is theoretically more sound than the locally weighted learners of naive Bayes that base classification only on the $k$ nearest neighbors. Empirical study shows that the new method with appropriate choice of parameter outperforms seven existing classifiers of similar nature.
From dependency to causality: a machine learning approach
Bontempi, Gianluca, Flauder, Maxime
The relationship between statistical dependency and causality lies at the heart of all statistical approaches to causal inference and can be summarized by two famous statements: correlation (or more generally statistical association) does not imply causation and causation induces a statistical dependency between causes and effects (or more generally descendants) ([26]). In other terms it is well known that statistical dependency is a necessary yet not sufficient condition for causality. The unidirectional link between these 1 two notions has been used by many formal approaches to causality to justify the adoption of statistical methods for detecting or inferring causal links from observational data. The most influential one is the Causal Bayesian Network approach, detailed in ([17]) which relies on notions of independence and conditional independence to detect causal patterns in the data. Well known examples of related inference algorithms are the constraint-based methods like the PC algorithms ([30]) and IC ([23]). These approaches are founded on probability theory and have been shown to be accurate in reconstructing causal patterns in many applications.
Surpassing Human-Level Face Verification Performance on LFW with GaussianFace
Face verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problem is exacerbated when we rely unrealistically on a single training data source, which is often insufficient to cover the intrinsically complex face variations. This paper proposes a principled multi-task learning approach based on Discriminative Gaussian Process Latent Variable Model, named GaussianFace, to enrich the diversity of training data. In comparison to existing methods, our model exploits additional data from multiple source-domains to improve the generalization performance of face verification in an unknown target-domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can well capture complex face variations inherent in multiple sources. Extensive experiments demonstrate the effectiveness of the proposed model in learning from diverse data sources and generalize to unseen domain. Specifically, the accuracy of our algorithm achieves an impressive accuracy rate of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark [23]. For the first time, the human-level performance in face verification (97.53%) [28] on LFW is surpassed.
Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification
De Santis, Enrico, Livi, Lorenzo, Sadeghian, Alireza, Rizzi, Antonello
Detecting faults in electrical power grids is of paramount importance, either from the electricity operator and consumer viewpoints. Modern electric power grids (smart grids) are equipped with smart sensors that allow to gather real-time information regarding the physical status of all the component elements belonging to the whole infrastructure (e.g., cables and related insulation, transformers, breakers and so on). In real-world smart grid systems, usually, additional information that are related to the operational status of the grid itself are collected such as meteorological information. Designing a suitable recognition (discrimination) model of faults in a real-world smart grid system is hence a challenging task. This follows from the heterogeneity of the information that actually determine a typical fault condition. The second point is that, for synthesizing a recognition model, in practice only the conditions of observed faults are usually meaningful. Therefore, a suitable recognition model should be synthesized by making use of the observed fault conditions only. In this paper, we deal with the problem of modeling and recognizing faults in a real-world smart grid system, which supplies the entire city of Rome, Italy. Recognition of faults is addressed by following a combined approach of multiple dissimilarity measures customization and one-class classification techniques. We provide here an in-depth study related to the available data and to the models synthesized by the proposed one-class classifier. We offer also a comprehensive analysis of the fault recognition results by exploiting a fuzzy set based reliability decision rule.
Binary Linear Classification and Feature Selection via Generalized Approximate Message Passing
Ziniel, Justin, Schniter, Philip, Sederberg, Per
For the problem of binary linear classification and feature selection, we propose algorithmic approaches to classifier design based on the generalized approximate message passing (GAMP) algorithm, recently proposed in the context of compressive sensing. We are particularly motivated by problems where the number of features greatly exceeds the number of training examples, but where only a few features suffice for accurate classification. We show that sum-product GAMP can be used to (approximately) minimize the classification error rate and max-sum GAMP can be used to minimize a wide variety of regularized loss functions. Furthermore, we describe an expectation-maximization (EM)-based scheme to learn the associated model parameters online, as an alternative to cross-validation, and we show that GAMP's state-evolution framework can be used to accurately predict the misclassification rate. Finally, we present a detailed numerical study to confirm the accuracy, speed, and flexibility afforded by our GAMP-based approaches to binary linear classification and feature selection.
Quantile universal threshold: model selection at the detection edge for high-dimensional linear regression
Diaz-Rodriguez, Jairo, Sardy, Sylvain
To estimate a sparse linear model from data with Gaussian noise, consilience from lasso and compressed sensing literatures is that thresholding estimators like lasso and the Dantzig selector have the ability in some situations to identify with high probability part of the significant covariates asymptotically, and are numerically tractable thanks to convexity. Yet, the selection of a threshold parameter $\lambda$ remains crucial in practice. To that aim we propose Quantile Universal Thresholding, a selection of $\lambda$ at the detection edge. We show with extensive simulations and real data that an excellent compromise between high true positive rate and low false discovery rate is achieved, leading also to good predictive risk.