constraint propagation
Reinforcement Learning for Search Tree Size Minimization in Constraint Programming: New Results on Scheduling Benchmarks
Heinz, Vilém, Vilím, Petr, Hanzálek, Zdeněk
Failure-Directed Search (FDS) is a significant complete generic search algorithm used in Constraint Programming (CP) to efficiently explore the search space, proven particularly effective on scheduling problems. This paper analyzes FDS's properties, showing that minimizing the size of its search tree guided by ranked branching decisions is closely related to the Multi-armed bandit (MAB) problem. Building on this insight, MAB reinforcement learning algorithms are applied to FDS, extended with problem-specific refinements and parameter tuning, and evaluated on the two most fundamental scheduling problems, the Job Shop Scheduling Problem (JSSP) and Resource-Constrained Project Scheduling Problem (RCPSP). The resulting enhanced FDS, using the best extended MAB algorithm and configuration, performs 1.7 times faster on the JSSP and 2.1 times faster on the RCPSP benchmarks compared to the original implementation in a new solver called OptalCP, while also being 3.5 times faster on the JSSP and 2.1 times faster on the RCPSP benchmarks than the current state-of-the-art FDS algorithm in IBM CP Optimizer 22.1. Furthermore, using only a 900-second time limit per instance, the enhanced FDS improved the existing state-of-the-art lower bounds of 78 of 84 JSSP and 226 of 393 RCPSP standard open benchmark instances while also completely closing a few of them.
LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning
Wong, Zhen Hao, Deng, Jingwen, He, Runming, Chen, Zirong, You, Qijie, Dong, Hejun, Liang, Hao, Shen, Chengyu, Cui, Bin, Zhang, Wentao
Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning in unfamiliar settings. This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specific heuristics rather than fostering general-purpose thinking strategies. In this work, we propose a "play to learn" framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles, each designed to cultivate distinct reasoning skills such as constraint propagation, spatial consistency, and symbolic deduction. Using a reinforcement learning setup with verifiable rewards, models receive binary feedback based on puzzle correctness, encouraging iterative, hypothesis-driven problem solving. We demonstrate that this training approach significantly improves out-of-distribution performance on a range of mathematical benchmarks, especially for mid-difficulty problems that require multi-step reasoning. Analyses across problem categories and difficulty levels reveal that puzzle training promotes transferable reasoning routines, strengthening algebraic manipulation, geometric inference, and combinatorial logic, while offering limited gains on rote or highly specialized tasks. These findings show that reinforcement learning over logic puzzles reshapes the internal reasoning of LLMs, enabling more robust and compositional generalization without relying on task-specific symbolic tools.
Balafrej
Adaptive constraint propagation has recently received a great attention. It allows a constraint solver to exploit various levels of propagation during search, and in many cases it shows better performance than static/predefined. The crucial point is to make adaptive constraint propagation automatic, so that no expert knowledge or parameter specification is required. In this work, we propose a simple learning technique, based on multi-armed bandits, that allows to automatically select among several levels of propagation during search. Our technique enables the combination of any number of levels of propagation whereas existing techniques are only defined for pairs. An experimental evaluation demonstrates that the proposed technique results in a more efficient and stable solver.
Super-Reparametrizations of Weighted CSPs: Properties and Optimization Perspective
Dlask, Tomáš, Werner, Tomáš, de Givry, Simon
The notion of reparametrizations of Weighted CSPs (WCSPs) (also known as equivalence-preserving transformations of WCSPs) is well-known and finds its use in many algorithms to approximate or bound the optimal WCSP value. In contrast, the concept of super-reparametrizations (which are changes of the weights that keep or increase the WCSP objective for every assignment) was already proposed but never studied in detail. To fill this gap, we present a number of theoretical properties of super-reparametrizations and compare them to those of reparametrizations. Furthermore, we propose a framework for computing upper bounds on the optimal value of the (maximization version of) WCSP using super-reparametrizations. We show that it is in principle possible to employ arbitrary (under some technical conditions) constraint propagation rules to improve the bound. For arc consistency in particular, the method reduces to the known Virtual AC (VAC) algorithm. Newly, we implemented the method for singleton arc consistency (SAC) and compared it to other strong local consistencies in WCSPs on a public benchmark. The results show that the bounds obtained from SAC are superior for many instance groups.
Expand Your Knowledge of Artificial Intelligence
If you're new to Python programming, consider starting with our AI Programming with Python Nanodegree program. If you're new to computer science algorithms, we recommend our Data Structures & Algorithms Nanodegree program. Learn to write programs using the foundational AI algorithms powering everything from NASA's Mars Rover to DeepMind's AlphaGo Zero. This program requires experience with linear algebra, statistics, and Python (including object-oriented programming). Use constraint propagation and search to build an agent that reasons like a human would to efficiently solve any Sudoku puzzle.
Conflict-driven Inductive Logic Programming
The goal of Inductive Logic Programming (ILP) is to learn a program that explains a set of examples. Until recently, most research on ILP targeted learning Prolog programs. The ILASP system instead learns Answer Set Programs (ASP). Learning such expressive programs widens the applicability of ILP considerably; for example, enabling preference learning, learning common-sense knowledge, including defaults and exceptions, and learning non-deterministic theories. Early versions of ILASP can be considered meta-level ILP approaches, which encode a learning task as a logic program and delegate the search to an ASP solver. More recently, ILASP has shifted towards a new method, inspired by conflict-driven SAT and ASP solvers. The fundamental idea of the approach, called Conflict-driven ILP (CDILP), is to iteratively interleave the search for a hypothesis with the generation of constraints which explain why the current hypothesis does not cover a particular example. These coverage constraints allow ILASP to rule out not just the current hypothesis, but an entire class of hypotheses that do not satisfy the coverage constraint. This paper formalises the CDILP approach and presents the ILASP3 and ILASP4 systems for CDILP, which are demonstrated to be more scalable than previous ILASP systems, particularly in the presence of noise.
CPR for CSPs: A Probabilistic Relaxation of Constraint Propagation
This paper proposes constraint propagation relaxation (CPR), a probabilistic approach to classical constraint propagation that provides another view on the whole parametric family of survey propagation algorithms SP(ρ), ranging from belief propagation (ρ 0) to (pure) survey propagation(ρ 1). Papers published at the Neural Information Processing Systems Conference.
Learning Variable Ordering Heuristics for Solving Constraint Satisfaction Problems
Song, Wen, Cao, Zhiguang, Zhang, Jie, Lim, Andrew
Abstract--Backtracking search algorithms are often used to solve the Constraint Satisfaction Problem (CSP). The efficiency of backtracking search depends greatly on the variable ordering heuristics. Currently, the most commonly used heuristics are handcrafted based on expert knowledge. In this paper, we propose a deep reinforcement learning based approach to automatically discover new variable ordering heuristics that are better adapted for a given class of CSP instances. We show that directly optimizing the search cost is hard for bootstrapping, and propose to optimize the expected cost of reaching a leaf node in the search tree. T o capture the complex relations among the variables and constraints, we design a representation scheme based on Graph Neural Network that can process CSP instances with different sizes and constraint arities. Experimental results on random CSP instances show that the learned policies outperform classical handcrafted heuristics in terms of minimizing the search tree size, and can effectively generalize to instances that are larger than those used in training. Constraint Satisfaction Problem (CSP) is one of the most widely studied problems in computer science and artificial intelligence. It provides a common framework for modeling and solving combinatorial problems in many application domains, such as planning and scheduling [1], [2], vehicle routing [3], [4], graph problems [5], [6], and computational biology [7], [8]. A CSP instance involves a set of variables and constraints. T o solve it, one needs to find a value assignment for all variables such that all constraints are satisfied, or prove such assignment does not exist. Despite its ubiquitous applications, unfortunately, CSP is well known to be NPcomplete in general [9]. T o solve CSP efficiently, backtracking search algorithms are often employed, which are exact algorithms with the guarantee that a solution will be found if one exists.
Experimenting with Constraint Programming on GPU
The focus of my PhD thesis is on exploring parallel approaches to efficiently solve problems modeled by constraints and presenting a new proposal. Current solvers are very advanced; they are carefully designed to effectively manage the high-level problems' description and include refined strategies to avoid useless work. Despite this, finding a solution can take an unacceptable amount of time. Parallelization can mitigate this problem when the instance of the problem modeled is large, as it happens in real world problems. It is done by propagating constraints in parallel and concurrently exploring different parts of the search space. I am developing on a constraint solver that exploits the many cores available on Graphics Processing Units (GPU) to speed up the search.