Genre
A Multi-Engine Approach to Answer Set Programming
Maratea, Marco, Pulina, Luca, Ricca, Francesco
Answer Set Programming (ASP) is a truly-declarative programming paradigm proposed in the area of non-monotonic reasoning and logic programming, that has been recently employed in many applications. The development of efficient ASP systems is, thus, crucial. Having in mind the task of improving the solving methods for ASP, there are two usual ways to reach this goal: $(i)$ extending state-of-the-art techniques and ASP solvers, or $(ii)$ designing a new ASP solver from scratch. An alternative to these trends is to build on top of state-of-the-art solvers, and to apply machine learning techniques for choosing automatically the "best" available solver on a per-instance basis. In this paper we pursue this latter direction. We first define a set of cheap-to-compute syntactic features that characterize several aspects of ASP programs. Then, we apply classification methods that, given the features of the instances in a {\sl training} set and the solvers' performance on these instances, inductively learn algorithm selection strategies to be applied to a {\sl test} set. We report the results of a number of experiments considering solvers and different training and test sets of instances taken from the ones submitted to the "System Track" of the 3rd ASP Competition. Our analysis shows that, by applying machine learning techniques to ASP solving, it is possible to obtain very robust performance: our approach can solve more instances compared with any solver that entered the 3rd ASP Competition. (To appear in Theory and Practice of Logic Programming (TPLP).)
Failure of Calibration is Typical
Schervish (1985b) showed that every forecasting system is noncalibrated for uncountably many data sequences that it might see. This result is strengthened here: from a topological point of view, failure of calibration is typical and calibration rare. Meanwhile, Bayesian forecasters are certain that they are calibrated---this invites worries about the connection between Bayesianism and rationality.
Regularization and nonlinearities for neural language models: when are they needed?
Pachitariu, Marius, Sahani, Maneesh
Neural language models (LMs) based on recurrent neural networks (RNN) are some of the most successful word and character-level LMs. Why do they work so well, in particular better than linear neural LMs? Possible explanations are that RNNs have an implicitly better regularization or that RNNs have a higher capacity for storing patterns due to their nonlinearities or both. Here we argue for the first explanation in the limit of little training data and the second explanation for large amounts of text data. We show state-of-the-art performance on the popular and small Penn dataset when RNN LMs are regularized with random dropout. Nonetheless, we show even better performance from a simplified, much less expressive linear RNN model without off-diagonal entries in the recurrent matrix. We call this model an impulse-response LM (IRLM). Using random dropout, column normalization and annealed learning rates, IRLMs develop neurons that keep a memory of up to 50 words in the past and achieve a perplexity of 102.5 on the Penn dataset. On two large datasets however, the same regularization methods are unsuccessful for both models and the RNN's expressivity allows it to overtake the IRLM by 10 and 20 percent perplexity, respectively. Despite the perplexity gap, IRLMs still outperform RNNs on the Microsoft Research Sentence Completion (MRSC) task. We develop a slightly modified IRLM that separates long-context units (LCUs) from short-context units and show that the LCUs alone achieve a state-of-the-art performance on the MRSC task of 60.8%. Our analysis indicates that a fruitful direction of research for neural LMs lies in developing more accessible internal representations, and suggests an optimization regime of very high momentum terms for effectively training such models.
Iterative Grassmannian Optimization for Robust Image Alignment
He, Jun, Zhang, Dejiao, Balzano, Laura, Tao, Tao
Robust high-dimensional data processing has witnessed an exciting development in recent years, as theoretical results have shown that it is possible using convex programming to optimize data fit to a low-rank component plus a sparse outlier component. This problem is also known as Robust PCA, and it has found application in many areas of computer vision. In image and video processing and face recognition, the opportunity to process massive image databases is emerging as people upload photo and video data online in unprecedented volumes. However, data quality and consistency is not controlled in any way, and the massiveness of the data poses a serious computational challenge. In this paper we present t-GRASTA, or "Transformed GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm)". t-GRASTA iteratively performs incremental gradient descent constrained to the Grassmann manifold of subspaces in order to simultaneously estimate a decomposition of a collection of images into a low-rank subspace, a sparse part of occlusions and foreground objects, and a transformation such as rotation or translation of the image. We show that t-GRASTA is 4 $\times$ faster than state-of-the-art algorithms, has half the memory requirement, and can achieve alignment for face images as well as jittered camera surveillance images.
Safeguarding E-Commerce against Advisor Cheating Behaviors: Towards More Robust Trust Models for Handling Unfair Ratings
In electronic marketplaces, after each transaction buyers will rate the products provided by the sellers. To decide the most trustworthy sellers to transact with, buyers rely on trust models to leverage these ratings to evaluate the reputation of sellers. Although the high effectiveness of different trust models for handling unfair ratings have been claimed by their designers, recently it is argued that these models are vulnerable to more intelligent attacks, and there is an urgent demand that the robustness of the existing trust models has to be evaluated in a more comprehensive way. In this work, we classify the existing trust models into two broad categories and propose an extendable e-marketplace testbed to evaluate their robustness against different unfair rating attacks comprehensively. On top of highlighting the robustness of the existing trust models for handling unfair ratings is far from what they were claimed to be, we further propose and validate a novel combination mechanism for the existing trust models, Discount-then-Filter, to notably enhance their robustness against the investigated attacks.
Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers
Ateniese, Giuseppe, Felici, Giovanni, Mancini, Luigi V., Spognardi, Angelo, Villani, Antonio, Vitali, Domenico
Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights.
A lasso for hierarchical interactions
Bien, Jacob, Taylor, Jonathan, Tibshirani, Robert
We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity - the number of nonzero coefficients - and practical sparsity - the number of raw variables one must measure to make a new prediction. Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.
Machine Learning with Operational Costs
Tulabandhula, Theja, Rudin, Cynthia
This work proposes a way to align statistical modeling with decision making. We provide a method that propagates the uncertainty in predictive modeling to the uncertainty in operational cost, where operational cost is the amount spent by the practitioner in solving the problem. The method allows us to explore the range of operational costs associated with the set of reasonable statistical models, so as to provide a useful way for practitioners to understand uncertainty. To do this, the operational cost is cast as a regularization term in a learning algorithm's objective function, allowing either an optimistic or pessimistic view of possible costs, depending on the regularization parameter. From another perspective, if we have prior knowledge about the operational cost, for instance that it should be low, this knowledge can help to restrict the hypothesis space, and can help with generalization. We provide a theoretical generalization bound for this scenario. We also show that learning with operational costs is related to robust optimization.
Bioclimating Modelling: A Machine Learning Perspective
Many machine learning (ML) approaches are widely used to generate bioclimatic models for prediction of geographic range of organism as a function of climate. Applications such as prediction of range shift in organism, range of invasive species influenced by climate change are important parameters in understanding the impact of climate change. However, success of machine learning-based approaches depends on a number of factors. While it can be safely said that no particular ML technique can be effective in all applications and success of a technique is predominantly dependent on the application or the type of the problem, it is useful to understand their behaviour to ensure informed choice of techniques. This paper presents a comprehensive review of machine learning-based bioclimatic model generation and analyses the factors influencing success of such models. Considering the wide use of statistical techniques, in our discussion we also include conventional statistical techniques used in bioclimatic modelling.
Group Symmetry and non-Gaussian Covariance Estimation
Soloveychik, Ilya, Wiesel, Ami
We consider robust covariance estimation with group symmetry constraints. Non-Gaussian covariance estimation, e.g., Tyler scatter estimator and Multivariate Generalized Gaussian distribution methods, usually involve non-convex minimization problems. Recently, it was shown that the underlying principle behind their success is an extended form of convexity over the geodesics in the manifold of positive definite matrices. A modern approach to improve estimation accuracy is to exploit prior knowledge via additional constraints, e.g., restricting the attention to specific classes of covariances which adhere to prior symmetry structures. In this paper, we prove that such group symmetry constraints are also geodesically convex and can therefore be incorporated into various non-Gaussian covariance estimators. Practical examples of such sets include: circulant, persymmetric and complex/quaternion proper structures. We provide a simple numerical technique for finding maximum likelihood estimates under such constraints, and demonstrate their performance advantage using synthetic experiments.