"The Crossword puzzle (CP) is a simple problem to illustrate the formalization process of a problem into a CSP. The problem is to place words of a dictionary in a given structure satisfying certain constraints. The variables are the rows and columns in the crossword, and their values are the words in a dictionary."
– Marc Torrens. An Application using the JCL: The Air Travel Planning System. Diploma Thesis, 1997, Chapter 1, Section 1.2.1.
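The formalization in the quotation can be sketched in a few lines. The following is only an illustration under stated assumptions: the 3x3 grid, the slot names (`R0`, `C0`, ...), the four-word dictionary, and the helper names `solve` and `consistent` are all hypothetical and do not come from the thesis. Variables are the rows and columns, their domain is the dictionary, and the only constraint is that crossing slots agree on the shared cell.

```python
# Hypothetical toy instance: a fully open 3x3 grid with three row slots
# and three column slots. Each slot maps to the cells it covers.
SLOTS = {f"R{r}": [(r, c) for c in range(3)] for r in range(3)}
SLOTS.update({f"C{c}": [(r, c) for r in range(3)] for c in range(3)})
WORDS = ["cat", "are", "ten", "dog"]  # made-up dictionary


def consistent(var, word, assignment, slots):
    """Check that `word` in slot `var` agrees with every assigned slot on shared cells."""
    letters = dict(zip(slots[var], word))
    for other, other_word in assignment.items():
        for cell, ch in zip(slots[other], other_word):
            if letters.get(cell, ch) != ch:
                return False
    return True


def solve(slots, words, assignment=None):
    """Plain backtracking search; returns the first consistent assignment or None."""
    assignment = assignment or {}
    if len(assignment) == len(slots):
        return assignment
    var = next(s for s in slots if s not in assignment)
    for w in words:
        if len(w) == len(slots[var]) and consistent(var, w, assignment, slots):
            result = solve(slots, words, {**assignment, var: w})
            if result is not None:
                return result
    return None


solution = solve(SLOTS, WORDS)
```

A real CSP solver would add propagation and variable-ordering heuristics; the sketch only shows how the variables, domains, and constraints line up.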
Constraint Satisfaction Problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and the computation of the number of distinct elements in a data stream, also known as the zeroth frequency moment (F0) of a data stream. Our investigations lead us to observe a striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and distinct elements computation. We design a recipe for the translation of algorithms developed for distinct elements estimation to that of model counting, resulting in new algorithms for model counting. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framing distinct elements estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to case-specific analysis in prior works.
Constraint Satisfaction Problems (CSPs) and the data stream model are two core themes in computer science with a diverse set of applications in topics including probabilistic reasoning, networks, databases, and verification. Model counting and computation of the zeroth frequency moment (F0) are fundamental problems for CSPs and the data stream model, respectively. This paper is motivated by our observation that despite the usage of similar algorithmic techniques for the two problems, the developments in the two communities have, surprisingly, evolved separately, and rarely has a paper from one community been cited by the other.
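The shared technique can be illustrated with a minimum-based F0 estimator. This is a rough sketch under assumptions, not the algorithm from the paper: SHA-256 stands in for a proper hash family, and the names `h01`, `kmv_estimate`, and `models` are hypothetical. The estimator keeps the k smallest hash values and returns (k - 1) divided by the k-th smallest. Feeding it the satisfying assignments of a formula instead of a stream gives a model-count estimate; here the assignments are enumerated by brute force, whereas a practical counter would obtain the assignments with the smallest hash values through solver queries.

```python
import hashlib
import heapq
from itertools import product


def h01(item, seed=0):
    """Map an item to a pseudo-random point in [0, 1) (SHA-256 as a stand-in hash)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_estimate(items, k=256, seed=0):
    """Estimate the number of distinct items from the k smallest hash values."""
    heap = []     # max-heap (negated values) holding the k smallest hashes seen
    kept = set()  # the same hashes, for duplicate detection
    for x in items:
        v = h01(x, seed)
        if v in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -v)
            kept.add(v)
        elif v < -heap[0]:
            evicted = -heapq.heappushpop(heap, -v)
            kept.discard(evicted)
            kept.add(v)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: report them directly
    return (k - 1) / (-heap[0])


def models(formula, n):
    """Brute-force 'stream' of satisfying assignments over n Boolean variables."""
    return (bits for bits in product((0, 1), repeat=n) if formula(bits))


stream_estimate = kmv_estimate(i % 20000 for i in range(60000))
model_estimate = kmv_estimate(models(lambda b: b[0] or b[1], 12))  # true count: 3072
```

Repeated stream elements hash to the same value and are ignored, which is why the same sketch works for both a stream with duplicates and a set of models.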
Symmetry and dominance breaking can be crucial for solving hard combinatorial search and optimisation problems, but the correctness of these techniques sometimes relies on subtle arguments. For this reason, it is desirable to produce efficient, machine-verifiable certificates that solutions have been computed correctly. Building on the cutting planes proof system, we develop a certification method for optimisation problems in which symmetry and dominance breaking is easily expressible. Our experimental evaluation demonstrates that we can efficiently verify fully general symmetry breaking in Boolean satisfiability (SAT) solving, thus providing, for the first time, a unified method to certify a range of advanced SAT techniques that also includes cardinality and parity (XOR) reasoning. In addition, we apply our method to maximum clique solving and constraint programming as a proof of concept that the approach applies to a wider range of combinatorial problems.
We introduce a formal framework (called NCDC-ASP) for representing and reasoning about cardinal directions between extended spatial objects on a plane, using Answer Set Programming (ASP). NCDC-ASP preserves the meaning of cardinal directional relations as in Cardinal Directional Calculus (CDC), and provides solutions to all consistency checking problems in CDC under various conditions (i.e., for a complete/incomplete set of basic/disjunctive CDC constraints over connected/disconnected spatial objects). In particular, NCDC-ASP models a discretized version of the consistency checking problem in ASP, over a finite grid (rather than a plane), where we provide new lower bounds on the grid size to guarantee that it correctly characterizes solutions for the consistency checking in CDC. In addition, NCDC-ASP has the following two novelties important for applications. NCDC-ASP introduces default CDC constraints to represent and reason about background or commonsense knowledge that involves default qualitative directional relations (e.g., "the ice cream truck is by default to the north of the playground" or "the keyboard is normally placed in front of the monitor"). NCDC-ASP introduces inferred CDC constraints to allow inference of missing CDC relations and to provide them as explanations. We illustrate the uses and usefulness of NCDC-ASP with interesting scenarios from the real world. We design and develop a variety of benchmark instances, and comprehensively evaluate NCDC-ASP from the perspective of computational efficiency.
Many important problems in AI, among them #SAT, parameter learning and probabilistic inference, go beyond the classical satisfiability problem. Here, instead of finding a solution we are interested in a quantity associated with the set of solutions, such as the number of solutions, the optimal solution or the probability that a query holds in a solution. To model such quantitative problems in a uniform manner, a number of frameworks, e.g. Algebraic Model Counting and Semiring-based Constraint Satisfaction Problems, employ what we call the semiring paradigm. In the latter the abstract algebraic structure of the semiring serves as a means of parameterizing the problem definition, thus allowing for different modes of quantitative computations by choosing different semirings. While efficiently solvable cases have been widely studied, a systematic study of the computational complexity of such problems depending on the semiring parameter is missing. In this work, we characterize the latter by NP(R), a novel generalization of NP over semiring R, and obtain NP(R)-completeness results for a selection of semiring frameworks. To obtain more tangible insights into the hardness of NP(R), we link it to well-known complexity classes from the literature. Interestingly, we manage to connect the computational hardness to properties of the semiring. Using this insight, we see that, on the one hand, NP(R) is always at least as hard as NP or Mod_pP depending on the semiring R and in general unlikely to be in FPSPACE(poly). On the other hand, for broad subclasses of semirings relevant in practice we can employ reductions to NP, Mod_pP and #P. These results show that in many cases solutions are only mildly harder to compute than functions in NP, Mod_pP and #P, give us new insights into how problems that involve counting on semirings can be approached, and provide a means of assessing whether an algorithm is appropriate for a given class of problems.
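The semiring paradigm can be made concrete with a brute-force sum-of-products evaluator. This is a minimal sketch, assuming a toy formula and made-up literal weights; the names `Semiring`, `amc`, and the four semiring constants are illustrative only and are not taken from any of the frameworks mentioned. The same evaluation loop answers satisfiability, model counting, weighted counting, and most-probable-model queries, depending only on which semiring is plugged in.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


def amc(formula, n, semiring, weight):
    """Sum, over all models of `formula`, of the product of the literal weights."""
    total = semiring.zero
    for bits in product((False, True), repeat=n):
        if formula(bits):
            term = semiring.one
            for i, b in enumerate(bits):
                term = semiring.times(term, weight(i, b))
            total = semiring.plus(total, term)
    return total


BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
COUNTING = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
MAX_TIMES = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def formula(bits):
    """Toy formula (x0 or x1) and not x2; it has exactly three models."""
    return (bits[0] or bits[1]) and not bits[2]


def prob(i, b):
    """Assumed weights: each variable is independently true with probability 0.25."""
    return 0.25 if b else 0.75


is_sat = amc(formula, 3, BOOLEAN, lambda i, b: True)   # satisfiability
count = amc(formula, 3, COUNTING, lambda i, b: 1)      # number of models
wmc = amc(formula, 3, PROBABILITY, prob)               # probability the formula holds
best = amc(formula, 3, MAX_TIMES, prob)                # weight of the most probable model
```

The enumeration is exponential in the number of variables; the point is only that the problem definition stays fixed while the semiring changes the question being asked.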
We present an algorithm for recovering planted solutions in two well-known models, the stochastic block model and planted constraint satisfaction problems (CSP), via a common generalization in terms of random bipartite graphs. Our algorithm matches up to a constant factor the best-known bounds for the number of edges (or constraints) needed for perfect recovery and its running time is linear in the number of edges used. The time complexity is significantly better than both spectral and SDP-based approaches. The main contribution of the algorithm is in the case of unequal sizes in the bipartition that arises in our reduction from the planted CSP. Here our algorithm succeeds at a significantly lower density than the spectral approaches, surpassing a barrier based on the spectral norm of a random matrix. Other significant features of the algorithm and analysis include: (i) the critical use of power iteration with subsampling, which might be of independent interest, and whose analysis requires keeping track of multiple norms of an evolving solution; (ii) the algorithm can be implemented statistically, i.e., with very limited access to the input distribution; (iii) the algorithm is extremely simple to implement and runs in linear time, and thus is practical even for very large instances.
Symmetry breaking is a technique for speeding up propositional satisfiability testing by adding constraints to the theory that restrict the search space while preserving satisfiability. In this work, we extend symmetry breaking to the problem of model finding in weighted and unweighted relational theories, a class of problems that includes MAP inference in Markov Logic and similar statistical-relational languages. We introduce term symmetries, which are induced by an evidence set and extend to symmetries over a relational theory. We provide the important special case of term equivalent symmetries, showing that such symmetries can be found in low-degree polynomial time. We show how to break an exponential number of these symmetries with added constraints whose number is linear in the size of the domain. We demonstrate the effectiveness of these techniques through experiments in two relational domains. We also discuss the connections between relational symmetry breaking and work on lifted inference in statistical-relational reasoning.
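The basic idea of symmetry breaking can be seen on a small propositional example. This sketch is only an analogy under stated assumptions: it uses a toy theory ("exactly k of n interchangeable Boolean variables are true") and an ordering constraint between neighbouring variables, not the term symmetries or relational theories of the paper, and the names `models`, `theory`, and `sbc` are hypothetical. All n! permutations of the variables are symmetries of the theory, yet n - 1 added constraints suffice to keep exactly one representative model.

```python
from itertools import product
from math import comb


def models(n, constraints):
    """Enumerate all assignments over n Boolean variables that satisfy every constraint."""
    return [bits for bits in product((0, 1), repeat=n)
            if all(c(bits) for c in constraints)]


n, k = 6, 3

# Toy theory: exactly k of the n interchangeable variables are true.
theory = [lambda b: sum(b) == k]

# Symmetry-breaking constraints: x_i >= x_{i+1}, one per adjacent pair (n - 1 in total).
sbc = [lambda b, i=i: b[i] >= b[i + 1] for i in range(n - 1)]

all_models = models(n, theory)           # comb(n, k) symmetric models
broken_models = models(n, theory + sbc)  # a single canonical representative
```

Satisfiability is preserved in both directions: an unsatisfiable theory stays unsatisfiable, and a satisfiable one keeps at least one model, so a search procedure only has to explore the reduced space.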
Lifted inference rules exploit symmetries for fast reasoning in statistical relational models. Computational complexity of these rules is highly dependent on the choice of the constraint language they operate on and therefore coming up with the right kind of representation is critical to the success of lifted inference. In this paper, we propose a new constraint language, called setineq, which allows subset, equality and inequality constraints, to represent substitutions over the variables in the theory. Our constraint formulation is strictly more expressive than existing representations, yet easy to operate on. We reformulate the three main lifting rules: decomposer, generalized binomial and the recently proposed single occurrence for MAP inference, to work with our constraint representation. Experiments on benchmark MLNs for exact and sampling based inference demonstrate the effectiveness of our approach over several other existing techniques.
General comments: (i) The authors only solve a new special kind of higher order consistency constraints, generalizing soft PN-potentials, but not a truly general class of constraints, as indicated in the title or in the abstract. In case of MAP-inference, which is normally desired, the goal is to obtain a single assignment which satisfies all given linear constraints. The relaxed model the authors optimize is simply a byproduct of looking for marginals instead of MAP-assignments (the added entropy is responsible for this). In case of vanishing entropy one gets the same model. Hence there certainly remains the disadvantage of a parameter in the PN-potential, but now hidden in the entropy.
Inference in Markov random fields subject to consistency structure is a fundamental problem that arises in many real-life applications. In order to enforce consistency, classical approaches utilize consistency potentials or encode constraints over feasible instances. Unfortunately this comes at the price of a tremendous computational burden. In this paper we suggest to tackle consistency by incorporating constraints on beliefs. This permits derivation of a closed-form message-passing algorithm which we refer to as the Constraints Based Convex Belief Propagation (CBCBP). Experiments show that CBCBP outperforms the conventional consistency potential based approach, while being at least an order of magnitude faster.
Towards learning programs from data, we introduce the problem of sampling programs from posterior distributions conditioned on that data. Within this setting, we propose an algorithm that uses a symbolic solver to efficiently sample programs. The proposal combines constraint-based program synthesis with sampling via random parity constraints. We give theoretical guarantees on how well the samples approximate the true posterior, and present empirical results showing that the algorithm is efficient in practice, evaluating our approach on 22 program learning problems in the domains of text editing and computer-aided programming.
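Sampling via random parity constraints can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it samples roughly uniformly from the solutions of a toy Boolean formula rather than from a posterior over programs, it replaces the symbolic solver with brute-force enumeration, and the names `random_xor` and `sample_with_parity` are hypothetical. Each random XOR constraint discards about half of the solutions, so m constraints carve out a random cell of about 1/2^m of the solution set, from which one element is returned.

```python
import random
from itertools import product


def random_xor(n, rng):
    """Random parity constraint: the XOR of a random subset of bits equals a random bit."""
    mask = [rng.random() < 0.5 for _ in range(n)]
    parity = rng.randrange(2)
    return lambda bits: sum(b for b, m in zip(bits, mask) if m) % 2 == parity


def sample_with_parity(formula, n, m, rng, max_tries=1000):
    """Draw a solution of `formula` from a random cell defined by m parity constraints."""
    for _ in range(max_tries):
        xors = [random_xor(n, rng) for _ in range(m)]
        # A solver would be asked for solutions of formula + xors; here we enumerate.
        cell = [bits for bits in product((0, 1), repeat=n)
                if formula(bits) and all(x(bits) for x in xors)]
        if cell:
            return rng.choice(cell)
    raise RuntimeError("no non-empty cell found; try fewer parity constraints")


def toy_formula(bits):
    """Toy constraint over 8 bits: x0 or x1 (192 solutions)."""
    return bool(bits[0] or bits[1])


rng = random.Random(0)
sample = sample_with_parity(toy_formula, n=8, m=4, rng=rng)
```

The number of parity constraints trades off solver effort against uniformity: with too few, the cell is large and expensive to enumerate; with too many, most cells are empty and the loop has to retry.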