 Computational Learning Theory


Actively Avoiding Nonsense in Generative Models

arXiv.org Machine Learning

A generative model may generate utter nonsense when it is fit to maximize the likelihood of observed data. This happens because of "model error," i.e., when the true data-generating distribution does not lie within the class of generative models being learned. To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly invalid, together with random positive examples sampled from the true distribution. The goal is to maximize the likelihood of the positive examples subject to the constraint of (almost) never generating examples labeled invalid by the oracle. Guarantees are agnostic, i.e., relative to the best distribution in a given class of probability distributions. We show that, while proper learning often requires exponentially many queries to the invalidity oracle, improper distribution learning can be done using polynomially many queries.
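A minimal sketch of the setting described above, assuming a hypothetical invalidity oracle with an is_invalid method and candidate distributions exposing fit, sample, and loglik; the probe count, rejection threshold, and the simple "filter, then pick best likelihood" loop are illustrative choices, not the algorithms analyzed in the paper.

    def constrained_fit(candidates, positives, oracle, n_probe=200, tol=0.01):
        # Return the candidate distribution with the highest likelihood on the
        # positive sample among those that (empirically) almost never generate
        # examples the invalidity oracle rejects.  Purely illustrative.
        best, best_ll = None, float("-inf")
        for dist in candidates:
            dist.fit(positives)                      # maximum-likelihood fit
            probes = [dist.sample() for _ in range(n_probe)]
            invalid_rate = sum(oracle.is_invalid(x) for x in probes) / n_probe
            if invalid_rate > tol:                   # violates the validity constraint
                continue
            ll = sum(dist.loglik(x) for x in positives)
            if ll > best_ll:
                best, best_ll = dist, ll
        return best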


Using Automatic Generation of Relaxation Constraints to Improve the Preimage Attack on 39-step MD4

arXiv.org Artificial Intelligence

In this paper we construct a preimage attack on a truncated variant of the MD4 hash function. Specifically, we study the MD4-39 function defined by the first 39 steps of the MD4 algorithm. We suggest a new attack on MD4-39, which develops the ideas proposed by H. Dobbertin in 1998. Namely, special relaxation constraints are introduced in order to simplify the equations corresponding to the problem of finding a preimage for an arbitrary MD4-39 hash value. The equations supplemented with the relaxation constraints are then reduced to the Boolean Satisfiability Problem (SAT) and solved using state-of-the-art SAT solvers. We show that the effectiveness of a set of relaxation constraints can be evaluated using a black-box function of a special kind. Thus, we propose an automatic method for generating relaxation constraints by applying black-box optimization to this function. The proposed method made it possible to find new relaxation constraints that yield a SAT-based preimage attack on MD4-39 which significantly outperforms previously known attacks.
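A sketch of the black-box objective and a simple search over sets of relaxation constraints; encode_cnf, solve, and random_target are placeholders for the MD4-39 encoder, a SAT-solver call with a time limit, and a random hash-value generator, and the capped average runtime is only a stand-in for the paper's statistical effectiveness measure.

    import random, time

    def estimated_cost(mask, encode_cnf, solve, random_target, samples=10, budget=60.0):
        # Black-box objective: average (capped) solver time for finding a preimage
        # when the relaxation constraints selected by `mask` are added to the encoding.
        total = 0.0
        for _ in range(samples):
            cnf = encode_cnf(random_target(), mask)
            start = time.time()
            solve(cnf, time_limit=budget)
            total += min(time.time() - start, budget)
        return total / samples

    def one_plus_one_search(encode_cnf, solve, random_target, n_bits, iters=100):
        # (1+1) evolutionary search over bit-masks selecting relaxation constraints.
        current = [random.randint(0, 1) for _ in range(n_bits)]
        best = estimated_cost(current, encode_cnf, solve, random_target)
        for _ in range(iters):
            cand = [b ^ int(random.random() < 1.0 / n_bits) for b in current]
            cost = estimated_cost(cand, encode_cnf, solve, random_target)
            if cost <= best:
                current, best = cand, cost
        return current, best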


Sample-Efficient Learning of Mixtures

AAAI Conferences

We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let F be an arbitrary class of probability distributions, and let F^k denote the class of k-mixtures of elements of F. Assuming the existence of a method for learning F with sample complexity m(ε), we provide a method for learning F^k with sample complexity O(k log k · m(ε) / ε^2). Our mixture learning algorithm has the property that, if the F-learner is proper and agnostic, then the F^k-learner would be proper and agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of k axis-aligned Gaussians in R^d is PAC-learnable in the agnostic setting with O(kd/ε^4) samples, which is tight in k and d up to logarithmic factors. Second, we show that the class of mixtures of k Gaussians in R^d is PAC-learnable in the agnostic setting with sample complexity Õ(kd^2/ε^4), which improves the previously known bounds of Õ(k^3 d^2/ε^4) and Õ(k^4 d^4/ε^2) in its dependence on k and d. Finally, we show that the class of mixtures of k log-concave distributions over R^d is PAC-learnable using Õ(k · d^((d+5)/2) · ε^(-(d+9)/2)) samples.
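As a concrete instance of the general reduction, plugging a standard single-Gaussian estimator into the stated bound recovers the mixture result quoted above; the single-component sample complexity used below is a well-known estimate (up to logarithmic factors) and is not taken from the abstract itself.

\[
m_{F^k}(\varepsilon) = O\!\left(\frac{k \log k \cdot m_F(\varepsilon)}{\varepsilon^{2}}\right),
\qquad
m_F(\varepsilon) = \tilde{O}\!\left(\frac{d^{2}}{\varepsilon^{2}}\right)
\ \text{for a single Gaussian in } \mathbb{R}^{d}
\ \Longrightarrow\
m_{F^k}(\varepsilon) = \tilde{O}\!\left(\frac{k d^{2}}{\varepsilon^{4}}\right).
\]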


Improved Results for Minimum Constraint Removal

AAAI Conferences

Given a set of obstacles and two designated points in the plane, the Minimum Constraint Removal problem asks for a minimum number of obstacles that can be removed so that a collision-free path exists between the two designated points. It is a well-studied problem in both robotic motion planning and wireless computing that has been shown to be NP-hard in various settings. In this work, we extend the study of Minimum Constraint Removal. We start by presenting refined NP-hardness reductions for two cases: (1) when all the obstacles are axis-parallel rectangles, and (2) when all the obstacles are line segments such that no three intersect at the same point. These results improve on existing results in the literature. As a byproduct of our NP-hardness reductions, we prove that, unless the Exponential-Time Hypothesis (ETH) fails, Minimum Constraint Removal cannot be solved in subexponential time 2^o(n), where n is the number of obstacles in the instance. This shows that significant improvement on the brute-force 2^O(n)-time algorithm is unlikely. We then present a subexponential-time algorithm for instances of Minimum Constraint Removal in which the number of obstacles that overlap at any point is constant; the algorithm runs in time 2^O(√N), where N is the number of vertices in the auxiliary graph associated with the instance of the problem. We show that significant improvement on this algorithm is unlikely by showing that, unless ETH fails, Minimum Constraint Removal with bounded overlap number cannot be solved in time 2^o(√N). We describe several exact algorithms and approximation algorithms that leverage heuristics and discuss their performance in an extensive empirical simulation.
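For reference, the brute-force baseline mentioned above can be sketched as follows, assuming a user-supplied predicate path_exists(obstacles, s, t) that decides whether a collision-free path exists among the remaining obstacles; enumerating subsets in order of increasing size is what gives the 2^O(n) running time.

    from itertools import combinations

    def min_constraint_removal(obstacles, s, t, path_exists):
        # Return a smallest set of obstacles whose removal connects s and t.
        m = len(obstacles)
        for k in range(m + 1):
            for removed in map(set, combinations(range(m), k)):
                kept = [o for i, o in enumerate(obstacles) if i not in removed]
                if path_exists(kept, s, t):
                    return [obstacles[i] for i in removed]
        return list(obstacles)  # never reached if the empty obstacle set admits a path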


On Cryptographic Attacks Using Backdoors for SAT

AAAI Conferences

Propositional satisfiability (SAT) is at the nucleus of state-of-the-art approaches to a variety of computationally hard problems, one of which is cryptanalysis. Moreover, a number of practical applications of SAT can only be tackled efficiently by identifying and exploiting a subset of a formula's variables called a backdoor set (or simply a backdoor). This paper proposes a new class of backdoor sets for SAT used in the context of cryptographic attacks, namely guess-and-determine attacks. The idea is to identify the set of backdoor variables that minimizes a statistically estimated hardness of the corresponding guess-and-determine attack performed with a SAT solver. Experimental results on weakened variants of renowned encryption algorithms demonstrate the advantage of the proposed approach over the state of the art in terms of the estimated hardness of the resulting guess-and-determine attacks.
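A sketch of how the hardness of a candidate backdoor set might be estimated statistically; the Monte Carlo scheme below and the solve_time callback (seconds a SAT solver spends on the formula with one backdoor assignment fixed) are illustrative assumptions, not the exact estimator used in the paper.

    import random

    def estimate_gd_hardness(backdoor, solve_time, samples=100):
        # Estimated cost of a guess-and-determine attack that enumerates all 2^|B|
        # assignments to the backdoor variables B: 2^|B| times the average time the
        # SAT solver needs on one randomly chosen backdoor assignment.
        total = 0.0
        for _ in range(samples):
            assumption = [v if random.random() < 0.5 else -v for v in backdoor]
            total += solve_time(assumption)
        return (2 ** len(backdoor)) * (total / samples)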


Premise Set Caching for Enumerating Minimal Correction Subsets

AAAI Conferences

Methods for explaining the sources of inconsistency of overconstrained systems find an ever-increasing number of applications, ranging from diagnosis and configuration to ontology debugging and axiom pinpointing in description logics. Efficient enumeration of minimal correction subsets (MCSes), defined as sets of constraints whose removal from the system restores feasibility, is a central task in such domains. In this work, we propose a novel approach to speeding up MCS enumeration over conjunctive normal form propositional formulas by caching so-called premise sets (PSes) seen during the enumeration process. In contrast to earlier work, we move from caching unsatisfiable cores to caching PSes and propose a more effective way of implementing the cache. The proposed techniques noticeably improve the performance of state-of-the-art MCS enumeration algorithms in practice.
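For orientation, the classical linear-search extraction of a single MCS, extended with a naive cache of subsets already proven satisfiable, might look as follows; note that this caches satisfiable subsets rather than the premise sets introduced in the paper, and is_sat stands for any (ideally incremental) SAT call. Clauses are assumed to be hashable, e.g., tuples of literals.

    def one_mcs(soft_clauses, is_sat, sat_cache):
        # Linear search: greedily grow a satisfiable subset; its complement is an MCS.
        sat_subset = []
        for c in soft_clauses:
            candidate = frozenset(sat_subset + [c])
            if candidate in sat_cache or is_sat(candidate):
                sat_cache.add(candidate)     # remember satisfiable sets across calls
                sat_subset.append(c)
        return [c for c in soft_clauses if c not in sat_subset]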


Warmstarting of Model-Based Algorithm Configuration

AAAI Conferences

The performance of many hard combinatorial problem solvers depends strongly on their parameter settings, and since manual parameter tuning is both tedious and suboptimal, the AI community has recently developed several algorithm configuration (AC) methods to automatically address this problem. While all existing AC methods start the configuration process of an algorithm A from scratch for each new type of benchmark instances, here we propose to exploit information about A's performance on previous benchmarks in order to warmstart its configuration on new types of benchmarks. We introduce two complementary ways in which we can exploit this information to warmstart AC methods based on a predictive model. Experiments for optimizing a flexible modern SAT solver on twelve different instance sets show that our methods often yield substantial speedups over existing AC methods (up to 165-fold) and can also find substantially better configurations given the same compute budget.
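One of the simplest ways to reuse data from earlier configuration runs is sketched below: fit an empirical performance model on (configuration, runtime) pairs gathered on previous benchmarks and use it to rank the configurations evaluated first on the new benchmark. The scikit-learn random forest and the plain ranking scheme are illustrative choices, not the warmstarting mechanisms introduced in the paper.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def warmstart_ranking(old_configs, old_runtimes, candidate_configs, top_k=10):
        # Empirical performance model trained on previous benchmarks; configurations
        # are assumed to be encoded as numeric feature vectors.
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(np.asarray(old_configs), np.log(np.asarray(old_runtimes)))
        predicted = model.predict(np.asarray(candidate_configs))
        order = np.argsort(predicted)[:top_k]        # smallest predicted runtime first
        return [candidate_configs[i] for i in order]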


MaxSAT Resolution With the Dual Rail Encoding

AAAI Conferences

Conflict-driven clause learning (CDCL) is at the core of the success of modern SAT solvers. In terms of propositional proof complexity, CDCL has been shown to be as strong as general resolution. Improvements to SAT solvers can be realized either by improving existing algorithms, or by exploiting proof systems stronger than CDCL. Recent work proposed an approach for solving SAT by reduction to Horn MaxSAT. The proposed reduction coupled with MaxSAT resolution represents a new proof system, DRMaxSAT, which was shown to enable polynomial-time refutations of pigeonhole formulas, in contrast with both CDCL and general resolution. This paper investigates the DRMaxSAT proof system, and shows that DRMaxSAT p-simulates general resolution, that AC0-Frege+PHP p-simulates DRMaxSAT, and that DRMaxSAT cannot p-simulate either AC0-Frege+PHP or the cutting planes proof system.
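The dual-rail rewriting itself is compact; the sketch below shows one common presentation of it (clauses as tuples of DIMACS-style signed integers), in which the rewritten clauses are Horn. Conventions such as literal polarity may differ in detail from the paper.

    def dual_rail(clauses, n_vars):
        # Variable x in 1..n_vars gets two fresh variables: p_x = x, n_x = n_vars + x.
        # Literal  x -> -n_x  and literal -x -> -p_x, so rewritten clauses are Horn.
        # Hard clauses: the rewritten clauses plus (-p_x, -n_x) for every x.
        # Soft clauses: unit clauses (p_x) and (n_x) for every x.
        # The original CNF is satisfiable iff all hard clauses and at least n_vars
        # soft clauses can be satisfied simultaneously.
        p = lambda x: x
        n = lambda x: n_vars + x
        hard = [tuple(-n(abs(l)) if l > 0 else -p(abs(l)) for l in cl) for cl in clauses]
        hard += [(-p(x), -n(x)) for x in range(1, n_vars + 1)]
        soft = [(p(x),) for x in range(1, n_vars + 1)] + [(n(x),) for x in range(1, n_vars + 1)]
        return hard, soft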


Preliminary Results on Exploration-Driven Satisfiability Solving

AAAI Conferences

In this abstract, we present our study of exploring the SAT search space via random sampling, with the goal of improving conflict-driven clause learning (CDCL) SAT solvers. Our proposed CDCL SAT solving algorithm expSAT uses a novel branching heuristic, expVSIDS, which combines the standard VSIDS scores with heuristic scores derived from exploration. Experiments with application benchmarks from recent SAT competitions demonstrate the potential of the expSAT approach for improving CDCL SAT solvers.
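A possible wiring of such a combined branching heuristic is sketched below; the linear combination and the exploration dictionary (e.g., scores accumulated from random probes of the search space) are illustrative stand-ins for the exploration statistics used by expVSIDS.

    def combined_score(var, vsids, exploration, weight=0.5):
        # VSIDS activity plus a weighted exploration-derived score for one variable.
        return vsids.get(var, 0.0) + weight * exploration.get(var, 0.0)

    def pick_branch_variable(unassigned, vsids, exploration, weight=0.5):
        # Branch on the unassigned variable with the highest combined score.
        return max(unassigned, key=lambda v: combined_score(v, vsids, exploration, weight))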


Automatic Segmentation of Data Sequences

AAAI Conferences

Segmenting temporal data sequences is an important problem which helps in understanding data dynamics in multiple applications such as epidemic surveillance, motion capture sequences, etc. In this paper, we give DASSA, the first self-guided and efficient algorithm to automatically find a segmentation that best detects the change of pattern in data sequences. To avoid introducing tuning parameters, we design DASSA as a multi-level method which examines segments at each level of granularity via a compact data structure called the segment-graph. We build this data structure by carefully leveraging the information bottleneck method with the MDL principle to effectively represent each segment. Next, DASSA efficiently finds the optimal segmentation via a novel average-longest-path optimization on the segment-graph. Finally, we show how the outputs from DASSA can be naturally interpreted to reveal meaningful patterns. We ran DASSA on multiple real datasets of varying sizes; it is very effective in finding the time-cut points of the segmentations (in some cases recovering the cut points perfectly) as well as the corresponding changing patterns.
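The average-longest-path step can be illustrated on a generic DAG: among all paths with at least one edge, find one maximizing the mean edge score. A small dynamic program over (node, number of edges) solves this exactly; the representation below is an assumption for illustration, not DASSA's segment-graph construction.

    def max_average_path(nodes, edges):
        # nodes: list in topological order; edges: dict mapping (u, v) -> score.
        # Returns the best mean edge score over all paths with at least one edge.
        adj = {}
        for (u, v), w in edges.items():
            adj.setdefault(u, []).append((v, w))
        best = {v: {0: 0.0} for v in nodes}   # best[v][k] = max total score of a k-edge path ending at v
        for u in nodes:                        # topological order: paths ending at u are already final
            for v, w in adj.get(u, []):
                for k, total in list(best[u].items()):
                    if total + w > best[v].get(k + 1, float("-inf")):
                        best[v][k + 1] = total + w
        means = [total / k for d in best.values() for k, total in d.items() if k > 0]
        return max(means) if means else None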