Computational Learning Theory
SAT Modulo Monotonic Theories
Bayless, Sam (University of British Columbia) | Bayless, Noah (Point Grey Mini Secondary School) | Hoos, Holger H. (University of British Columbia) | Hu, Alan J. (University of British Columbia)
Boolean satisfiability (SAT) solvers have been successfully applied to a wide variety of difficult combinatorial problems. Many further problems can be solved by SAT Modulo Theory (SMT) solvers, which extend SAT solvers to handle additional types of constraints. However, building efficient SMT solvers is often very difficult. In this paper, we define the concept of a Boolean monotonic theory and show how to easily build efficient SMT solvers, including effective theory propagation and clause learning, for such theories. We present examples showing useful constraints that are monotonic, including many graph properties (e.g., shortest paths), and geometric properties (e.g., convex hulls). These constraints arise in problems that are otherwise difficult for SAT solvers to handle, such as procedural content generation. We have implemented several monotonic theory solvers using the techniques we present in this paper and applied these to content generation problems, demonstrating major speed-ups over SAT, SMT, and Answer Set Programming solvers, easily solving instances that were previously out of reach.
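The key notion here can be made concrete with a small example. A predicate over Boolean atoms is positive monotonic if setting more atoms to true can never flip it from true to false; graph reachability over edge-enable atoms is such a predicate. The sketch below (an illustration, not the authors' solver) shows the resulting theory propagation: evaluate the predicate on an under-approximation (only edges assigned true) and an over-approximation (all edges not assigned false) of the current partial assignment.

```python
from collections import deque

def reachable(n, edges, enabled, s, t):
    """BFS reachability over the subset of edges for which enabled[i] is True."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        if enabled[i]:
            adj[u].append(v)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def propagate_reach(n, edges, assignment, s, t):
    """Theory propagation for the monotonic predicate reach(s, t).

    assignment[i] is True/False for decided edge atoms and None for
    unassigned ones.  Because reachability is monotonic in the set of
    enabled edges, two bounds suffice:
      - under-approximation: only edges already assigned True;
      - over-approximation: all edges except those assigned False.
    """
    under = [assignment[i] is True for i in range(len(edges))]
    over = [assignment[i] is not False for i in range(len(edges))]
    if reachable(n, edges, under, s, t):
        return "reach must be TRUE"   # holds under every completion
    if not reachable(n, edges, over, s, t):
        return "reach must be FALSE"  # fails under every completion
    return "undetermined"

# toy usage: 3 nodes; edge (0,1) is True, (1,2) unassigned, (0,2) is False
edges = [(0, 1), (1, 2), (0, 2)]
print(propagate_reach(3, edges, [True, None, False], 0, 2))  # undetermined
```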
Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions
Kasarapu, Parthan, Allison, Lloyd
Mixture modelling involves explaining some observed evidence using a combination of probability distributions. The crux of the problem is the inference of an optimal number of mixture components and their corresponding parameters. This paper discusses unsupervised learning of mixture models using the Bayesian Minimum Message Length (MML) criterion. To demonstrate the effectiveness of search and inference of mixture parameters using the proposed approach, we select two key probability distributions, each handling fundamentally different types of data: the multivariate Gaussian distribution to address mixture modelling of data distributed in Euclidean space, and the multivariate von Mises-Fisher (vMF) distribution to address mixture modelling of directional data distributed on a unit hypersphere. The key contributions of this paper, in addition to the general search and inference methodology, include the derivation of MML expressions for encoding the data using multivariate Gaussian and von Mises-Fisher distributions, and the analytical derivation of the MML estimates of the parameters of the two distributions. Our approach is tested on simulated and real-world data sets. For instance, we infer vMF mixtures that concisely explain experimentally determined three-dimensional protein conformations, providing an effective null model description of protein structures that is central to many inference problems in structural bioinformatics. The experimental results demonstrate that our proposed search and inference method, together with the encoding schemes, improves on state-of-the-art mixture modelling techniques.
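For orientation, the standard two-part MML (Wallace-Freeman) message length that such a criterion instantiates has the shape below, where h is the prior, F the expected Fisher information matrix, d the number of free parameters, and kappa_d the lattice quantization constant; this generic form is quoted as background, not the paper's specific derivations for the Gaussian and vMF cases.

```latex
I(\theta, x) \;=\;
  \underbrace{-\log h(\theta) \;+\; \tfrac{1}{2}\log\lvert F(\theta)\rvert
              \;+\; \tfrac{d}{2}\log \kappa_d}_{\text{first part: statement of the model}}
  \;+\;
  \underbrace{-\log f(x \mid \theta) \;+\; \tfrac{d}{2}}_{\text{second part: data given the model}}
```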
MACHINE INTELLIGENCE 13
The two outstanding figures in the history of computer science are Alan Turing and John von Neumann, and they shared the view that logic was the key to understanding and automating computation. In particular, it was Turing who gave us in the mid-1930s the fundamental analysis, and the logical definition, of the concept of 'computability by machine' and who discovered the surprising and beautiful basic fact that there exist universal machines which by suitable programming can be made to
This essay is an expanded and revised version of one entitled The Role of Logic in Computer Science and Artificial Intelligence, which was completed in January 1992 (and was later published in the Proceedings of the Fifth Generation Computer Systems 1992 Conference). Since completing that essay I have had the benefit of extremely helpful discussions on many of the details with Professor Donald Michie and Professor I. J. Good, both of whom knew Turing well during the war years at Bletchley Park. Professor J. A. N. Lee, whose knowledge of the literature and archives of the history of computing is encyclopedic, also provided additional information, some of which is still unpublished. Further light has very recently been shed on the von Neumann side of the story by Norman Macrae's excellent biography John von Neumann (Macrae 1992). Accordingly, it seemed appropriate to undertake a more complete and thorough version of the FGCS'92 essay, focussing somewhat more on the interesting historical and biographical issues. I am grateful to Donald Michie and Stephen Muggleton for inviting me to contribute such a 'second edition' to the present volume, and I would also like to thank the Institute for New Generation Computer Technology (ICOT) for kind permission to make use of the FGCS'92 essay in this way.
1 LOGIC, COMPUTERS, TURING, AND VON NEUMANN
The Learnability of Unknown Quantum Measurements
Cheng, Hao-Chung, Hsieh, Min-Hsiu, Yeh, Ping-Cheng
Quantum machine learning has received significant attention in recent years, and promising progress has been made in the development of quantum algorithms to speed up traditional machine learning tasks. In this work, however, we focus on investigating the information-theoretic upper bounds of sample complexity: how many training samples are sufficient to predict the future behaviour of an unknown target function. This kind of problem is, arguably, one of the most fundamental problems in statistical learning theory, and the bounds for practical settings can be completely characterised by a simple measure of complexity. Our main result in the paper is that, for learning an unknown quantum measurement, the upper bound, given by the fat-shattering dimension, is linearly proportional to the dimension of the underlying Hilbert space. Learning an unknown quantum state becomes a dual problem to ours, and as a byproduct, we can recover Aaronson's famous result [Proc. R. Soc. A 463:3089-3144 (2007)] solely using a classical machine learning technique. In addition, other well-known complexity measures like covering numbers and Rademacher complexities are derived explicitly. We are able to connect measures of sample complexity with various areas in quantum information science, e.g. quantum state/measurement tomography, quantum state discrimination and quantum random access codes, which may be of independent interest. Lastly, with the assistance of the general Bloch-sphere representation, we show that learning quantum measurements/states can be mathematically formulated as a neural network. Consequently, classical machine learning algorithms can be applied to efficiently accomplish the two quantum learning tasks.
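As background for how a fat-shattering dimension translates into sample complexity, a representative agnostic-learning bound has the following shape; the constants, the scale c at which the dimension is evaluated, and the exact logarithmic factors vary between statements, and this generic form is quoted for orientation rather than taken from the paper. The paper's contribution is then that, for the class of unknown quantum measurements, this dimension grows linearly with the Hilbert space dimension.

```latex
m(\epsilon, \delta) \;=\;
O\!\left( \frac{1}{\epsilon^{2}}
  \left( \mathrm{fat}_{\mathcal{F}}(c\,\epsilon)\,\log^{2}\!\frac{1}{\epsilon}
         \;+\; \log\frac{1}{\delta} \right) \right)
```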
Proof Supplement - Learning Sparse Causal Models is not NP-hard (UAI2013)
Claassen, Tom, Mooij, Joris M., Heskes, Tom
This article contains detailed proofs and additional examples related to the UAI-2013 submission `Learning Sparse Causal Models is not NP-hard'. It describes the FCI+ algorithm: a method for sound and complete causal model discovery in the presence of latent confounders and/or selection bias that has worst-case polynomial complexity of order $N^{2(k+1)}$ in the number of independence tests, for sparse graphs over $N$ nodes with node degree bounded by $k$. The algorithm is an adaptation of the well-known FCI algorithm of Spirtes et al. (2000), which is also sound and complete but has worst-case complexity exponential in $N$.
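The polynomial bound comes from restricting conditional-independence tests to conditioning sets of size at most k drawn from current neighbourhoods. Below is a minimal PC-style skeleton search illustrating that counting argument; it is a simplification, not the FCI+ algorithm itself, and the oracle `indep` is a hypothetical stand-in for a statistical independence test.

```python
from itertools import combinations

def skeleton(nodes, indep, k):
    """PC-style adjacency search with conditioning sets of size at most k.

    For graphs whose node degree is bounded by k, each pair of nodes is
    tested against O(N^k) conditioning sets, which is where the polynomial
    bound on the number of independence tests comes from.
    """
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    for size in range(k + 1):
        for x in nodes:
            for y in list(adj[x]):
                # condition on subsets of the current neighbours of x
                for S in combinations(adj[x] - {y}, size):
                    if indep(x, y, set(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[(x, y)] = sepset[(y, x)] = set(S)
                        break
    return adj, sepset

# toy oracle for the chain a -> b -> c: a and c are independent given {b}
def toy_indep(x, y, S):
    return {x, y} == {"a", "c"} and "b" in S

print(skeleton(["a", "b", "c"], toy_indep, k=1)[0])
# skeleton a - b - c: the spurious edge a-c has been removed
```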
Statistical Active Learning Algorithms for Noise Tolerance and Differential Privacy
Balcan, Maria Florina, Feldman, Vitaly
We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and are differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns (1993). We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of "uncorrelated" noise. The complexity of the resulting algorithms has information-theoretically optimal quadratic dependence on $1/(1-2\eta)$, where $\eta$ is the noise rate. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error $\epsilon$ over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case.
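The noise tolerance of statistical algorithms rests on a simple correction, illustrated below with a hedged sketch (this shows the standard statistical-query noise correction from Kearns' framework, not the paper's full active-learning conversion). With labels in {-1, +1} flipped independently with probability eta < 1/2, the noisy correlation satisfies E[psi(x) * noisy_label] = (1 - 2*eta) * E[psi(x) * true_label], so dividing the empirical noisy average by (1 - 2*eta) recovers the clean expectation, and the inflated variance is the source of the quadratic dependence on 1/(1 - 2*eta).

```python
import random

def corrected_correlational_query(psi, noisy_sample, eta):
    """Estimate E[psi(x) * true_label(x)] from labels flipped with rate eta."""
    avg = sum(psi(x) * y for x, y in noisy_sample) / len(noisy_sample)
    return avg / (1 - 2 * eta)  # unbias the noisy correlation

# toy check: target sign(x) on uniform x in [-1, 1], query psi(x) = x
random.seed(0)
eta = 0.3
data = []
for _ in range(100000):
    x = random.uniform(-1, 1)
    label = 1 if x >= 0 else -1
    if random.random() < eta:
        label = -label          # random classification noise
    data.append((x, label))
print(corrected_correlational_query(lambda x: x, data, eta))  # close to E[|x|] = 0.5
```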
Falsifiable implies Learnable
The paper demonstrates that falsifiability is fundamental to learning. We prove the following theorem for statistical learning and sequential prediction: If a theory is falsifiable then it is learnable -- i.e. admits a strategy that predicts optimally. An analogous result is shown for universal induction.
Non-Restarting SAT Solvers with Simple Preprocessing Can Efficiently Simulate Resolution
Beame, Paul (University of Washington) | Sabharwal, Ashish (IBM Watson Research Center)
Propositional satisfiability (SAT) solvers based on conflict-directed clause learning (CDCL) implicitly produce resolution refutations of unsatisfiable formulas. The precise class of formulas for which they can produce polynomial-size refutations has been the subject of several studies, with special focus on the clause learning aspect of these solvers. The existing results, however, assume the use of non-standard and non-asserting learning schemes, or rely on polynomially many restarts for simulating individual steps of a resolution refutation, or work with a theoretical model that deviates significantly from key aspects of all modern CDCL solvers, such as learning only one asserting clause from each conflict, conflict-guided backjumping, and phase saving. We study non-restarting CDCL solvers that learn only one asserting clause per conflict and show that, with simple preprocessing that depends only on the number of variables of the input formula, such solvers can polynomially simulate resolution. We show, moreover, that this preprocessing allows one to convert any CDCL solver to one that is non-restarting.
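The "one asserting clause per conflict" step the abstract refers to is first-UIP conflict analysis: resolve the conflicting clause with the antecedents of current-level literals until exactly one literal from the current decision level remains. A minimal sketch with hypothetical data structures (not any particular solver's implementation):

```python
def analyze_conflict(conflict, trail, level, reason, cur_level):
    """First-UIP conflict analysis: returns one asserting learnt clause.

    conflict  : clause (list of int literals) falsified by the current trail
    trail     : literals assigned true, in assignment order
    level     : dict var -> decision level at which it was assigned
    reason    : dict var -> antecedent clause that propagated it (None = decision)
    cur_level : the current (conflicting) decision level
    """
    learnt = set(conflict)

    def at_cur(clause):
        return [l for l in clause if level[abs(l)] == cur_level]

    i = len(trail) - 1
    while len(at_cur(learnt)) > 1:
        while -trail[i] not in learnt:   # most recent assignment whose
            i -= 1                       # negation occurs in the clause
        lit = trail[i]
        i -= 1
        antecedent = reason[abs(lit)]
        learnt.discard(-lit)             # resolve on lit with its antecedent
        learnt |= {l for l in antecedent if l != lit}
    return sorted(learnt)

# toy run: decide x1 at level 1, decide x2 at level 2;
# C1 = (-1 -2 3) propagates x3, C2 = (-2 -3 4) propagates x4,
# C3 = (-1 -3 -4) is the conflicting clause
C1, C2, C3 = [-1, -2, 3], [-2, -3, 4], [-1, -3, -4]
trail = [1, 2, 3, 4]
level = {1: 1, 2: 2, 3: 2, 4: 2}
reason = {1: None, 2: None, 3: C1, 4: C2}
print(analyze_conflict(C3, trail, level, reason, cur_level=2))
# [-2, -1]: the asserting clause (not x1 or not x2)
```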
Relaxation Search: A Simple Way of Managing Optional Clauses
Bacchus, Fahiem (University of Toronto) | Davies, Jessica (University of Toronto) | Tsimpoukelli, Maria (University of Toronto) | Katsirelos, George (MIAT, INRA Toulouse)
A number of problems involve managing a set of optional clauses. For example, the soft clauses in a MAXSAT formula are optional: they can be falsified for a cost. Similarly, when computing a Minimum Correction Set for an unsatisfiable formula, all clauses are optional: some can be falsified in order to satisfy the rest. In both of these cases the task is to find a subset of the optional clauses that meets some criterion and whose removal leaves a satisfiable formula. Relaxation search is a simple method of using a standard SAT solver to solve this task. Relaxation search is easy to implement, sometimes requiring only a simple modification of the variable selection heuristic in the SAT solver; it offers considerable flexibility and control over the order in which subsets of optional clauses are examined; and it automatically exploits clause learning to exchange information between the two phases of finding a suitable subset of optional clauses and checking whether their removal yields satisfiability. We demonstrate how relaxation search can be used to solve MAXSAT and to compute Minimum Correction Sets. In both cases relaxation search is able to achieve state-of-the-art performance and to solve some instances that other solvers cannot.
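The following self-contained toy sketch shows the semantics being searched over for unit-weight MAXSAT. The paper's method obtains the same exploration of soft-clause subsets inside a single incremental CDCL run, by adding a fresh relaxation variable to each soft clause and branching on those variables first; here the subsets are enumerated explicitly, and a tiny DPLL check stands in for the SAT solver, purely to keep the example runnable.

```python
from itertools import combinations

def satisfiable(clauses):
    """Tiny complete DPLL check; a stand-in for an incremental CDCL solver."""
    if any(len(c) == 0 for c in clauses):
        return False
    if not clauses:
        return True
    lit = clauses[0][0]
    for choice in (lit, -lit):
        reduced = [[l for l in c if l != -choice] for c in clauses if choice not in c]
        if satisfiable(reduced):
            return True
    return False

def maxsat_by_relaxation(hard, soft):
    """Return indices of a minimum-size set of soft clauses to falsify."""
    for k in range(len(soft) + 1):
        # smallest relaxation sets first, mirroring the order relaxation
        # search imposes by preferring relaxation variables set to false
        for dropped in combinations(range(len(soft)), k):
            kept = [c for i, c in enumerate(soft) if i not in dropped]
            if satisfiable(hard + kept):
                return list(dropped)
    return None  # the hard clauses alone are unsatisfiable

# toy instance: hard clause (x1 or x2), soft unit clauses (not x1), (not x2)
print(maxsat_by_relaxation(hard=[[1, 2]], soft=[[-1], [-2]]))  # [0]: falsify one soft clause
```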
Private Learning and Sanitization: Pure vs. Approximate Differential Privacy
Beimel, Amos, Nissim, Kobbi, Stemmer, Uri
We compare the sample complexity of private learning [Kasiviswanathan et al., 2008] and sanitization [Blum et al., 2008] under pure $\epsilon$-differential privacy [Dwork et al., TCC 2006] and approximate $(\epsilon,\delta)$-differential privacy [Dwork et al., Eurocrypt 2006]. We show that the sample complexity of these tasks under approximate differential privacy can be significantly lower than that under pure differential privacy. We define a family of optimization problems, which we call Quasi-Concave Promise Problems, that generalizes some of our considered tasks. We observe that a quasi-concave promise problem can be privately approximated using a solution to a smaller instance of a quasi-concave promise problem. This allows us to construct an efficient recursive algorithm solving such problems privately. Specifically, we construct private learners for point functions, threshold functions, and axis-aligned rectangles in high dimension. Similarly, we construct sanitizers for point functions and threshold functions. We also examine the sample complexity of label-private learners, a relaxation of private learning where the learner is required to only protect the privacy of the labels in the sample. We show that the VC dimension completely characterizes the sample complexity of such learners; that is, the sample complexity of learning with label privacy is equal (up to constants) to that of learning without privacy.
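For reference, the two privacy notions being compared are defined as follows, where D and D' range over neighbouring databases differing in a single record and S over sets of outputs of the mechanism M.

```latex
% Pure \epsilon-differential privacy:
\Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S]
% Approximate (\epsilon,\delta)-differential privacy:
\Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S] \;+\; \delta
```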