Constraint-Based Reasoning
Beyond $\mathcal{O}(\sqrt{T})$ Regret for Constrained Online Optimization: Gradual Variations and Mirror Prox
We study constrained online convex optimization, where the constraints consist of a relatively simple constraint set (e.g. a Euclidean ball) and multiple functional constraints. Projections onto such decision sets are usually computationally challenging. So instead of enforcing all constraints over each slot, we allow decisions to violate these functional constraints but aim at achieving a low regret and a low cumulative constraint violation over a horizon of $T$ time slot. The best known bound for solving this problem is $\mathcal{O}(\sqrt{T})$ regret and $\mathcal{O}(1)$ constraint violation, whose algorithms and analysis are restricted to Euclidean spaces. In this paper, we propose a new online primal-dual mirror prox algorithm whose regret is measured via a total gradient variation $V_*(T)$ over a sequence of $T$ loss functions. Specifically, we show that the proposed algorithm can achieve an $\mathcal{O}(\sqrt{V_*(T)})$ regret and $\mathcal{O}(1)$ constraint violation simultaneously. Such a bound holds in general non-Euclidean spaces, is never worse than the previously known $\big( \mathcal{O}(\sqrt{T}), \mathcal{O}(1) \big)$ result, and can be much better on regret when the variation is small. Furthermore, our algorithm is computationally efficient in that only two mirror descent steps are required during each slot instead of solving a general Lagrangian minimization problem. Along the way, our bounds also improve upon those of previous attempts using mirror-prox-type algorithms solving this problem, which yield a relatively worse $\mathcal{O}(T^{2/3})$ regret and $\mathcal{O}(T^{2/3})$ constraint violation.
Constrained Combinatorial Optimization with Reinforcement Learning
Solozabal, Ruben, Ceberio, Josu, Takáč, Martin
This paper presents a framework to tackle constrained combinatorial optimization problems using deep Reinforcement Learning (RL). To this end, we extend the Neural Combinatorial Optimization (NCO) theory in order to deal with constraints in its formulation. Notably, we propose defining constrained combinatorial problems as fully observable Constrained Markov Decision Processes (CMDP). In that context, the solution is iteratively constructed based on interactions with the environment. The model, in addition to the reward signal, relies on penalty signals generated from constraint dissatisfaction to infer a policy that acts as a heuristic algorithm. Moreover, having access to the complete state representation during the optimization process allows us to rely on memory-less architectures, enhancing the results obtained in previous sequence-to-sequence approaches. Conducted experiments on the constrained Job Shop and Resource Allocation problems prove the superiority of the proposal for computing rapid solutions when compared to classical heuristic, metaheuristic, and Constraint Programming (CP) solvers.
Learning Objective Boundaries for Constraint Optimization Problems
Spieker, Helge, Gotlieb, Arnaud
Constraint Optimization Problems (COP) are often considered without sufficient knowledge on the boundaries of the objective variable to optimize. When available, tight boundaries are helpful to prune the search space or estimate problem characteristics. Finding close boundaries, that correctly under- and overestimate the optimum, is almost impossible without actually solving the COP. This paper introduces Bion, a novel approach for boundary estimation by learning from previously solved instances of the COP. Based on supervised machine learning, Bion is problem-specific and solver-independent and can be applied to any COP which is repeatedly solved with different data inputs. An experimental evaluation over seven realistic COPs shows that an estimation model can be trained to prune the objective variables' domains by over 80%. By evaluating the estimated boundaries with various COP solvers, we find that Bion improves the solving process for some problems, although the effect of closer bounds is generally problem-dependent.
Constraint-Based Causal Discovery using Partial Ancestral Graphs in the presence of Cycles
Mooij, Joris M., Claassen, Tom
While feedback loops are known to play important roles in many complex systems, their existence is ignored in a large part of the causal discovery literature, as systems are typically assumed to be acyclic from the outset. When applying causal discovery algorithms designed for the acyclic setting on data generated by a system that involves feedback, one would not expect to obtain correct results. In this work, we show that---surprisingly---the output of the Fast Causal Inference (FCI) algorithm is correct if it is applied to observational data generated by a system that involves feedback. More specifically, we prove that for observational data generated by a simple and $\sigma$-faithful Structural Causal Model (SCM), FCI is sound and complete, and can be used to consistently estimate (i) the presence and absence of causal relations, (ii) the presence and absence of direct causal relations, (iii) the absence of confounders, and (iv) the absence of specific cycles in the causal graph of the SCM. We extend these results to constraint-based causal discovery algorithms that exploit certain forms of background knowledge, including the causally sufficient setting (e.g., the PC algorithm) and the Joint Causal Inference setting (e.g., the FCI-JCI algorithm).
An Integer Linear Programming Framework for Mining Constraints from Data
Various structured output prediction problems (e.g., sequential tagging) involve constraints over the output space. By identifying these constraints, we can filter out infeasible solutions and build an accountable model. To this end, we present a general integer linear programming (ILP) framework for mining constraints from data. We model the inference of structured output prediction as an ILP problem. Then, given the coefficients of the objective function and the corresponding solution, we mine the underlying constraints by estimating the outer and inner polytopes of the feasible set. We verify the proposed constraint mining algorithm in various synthetic and real-world applications and demonstrate that the proposed approach successfully identifies the feasible set at scale. In particular, we show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules. We also demonstrate results on hierarchical multi-label classification and conduct a theoretical analysis on how close the mined constraints are from the ground truth.
microPhantom: Playing microRTS under uncertainty and chaos
This competition paper presents microPhantom, a bot playing microRTS and participating in the 2020 microRTS AI competition. microPhantom is based on our previous bot POAdaptive which won the partially observable track of the 2018 and 2019 microRTS AI competitions. In this paper, we focus on decision-making under uncertainty, by tackling the Unit Production Problem with a method based on a combination of Constraint Programming and decision theory. We show that using our method to decide which units to train improves significantly the win rate against the second-best microRTS bot from the partially observable track. We also show that our method is resilient in chaotic environments, with a very small loss of efficiency only. To allow replicability and to facilitate further research, the source code of microPhantom is available, as well as the Constraint Programming toolkit it uses.
Symbolic Logic meets Machine Learning: A Brief Survey in Infinite Domains
The tension between deduction and induction is perhaps the most fundamental issue in areas such as philosophy, cognition and artificial intelligence (AI). The deduction camp concerns itself with questions about the expressiveness of formal languages for capturing knowledge about the world, together with proof systems for reasoning from such knowledge bases. The learning camp attempts to generalize from examples about partial descriptions about the world. In AI, historically, these camps have loosely divided the development of the field, but advances in cross-over areas such as statistical relational learning, neuro-symbolic systems, and high-level control have illustrated that the dichotomy is not very constructive, and perhaps even ill-formed. In this article, we survey work that provides further evidence for the connections between logic and learning. Our narrative is structured in terms of three strands: logic versus learning, machine learning for logic, and logic for machine learning, but naturally, there is considerable overlap. We place an emphasis on the following "sore" point: there is a common misconception that logic is for discrete properties, whereas probability theory and machine learning, more generally, is for continuous properties. We report on results that challenge this view on the limitations of logic, and expose the role that logic can play for learning in infinite domains.
A framework for step-wise explaining how to solve constraint satisfaction problems
Bogaerts, Bart, Gamba, Emilio, Guns, Tias
We explore the problem of step-wise explaining how to solve constraint satisfaction problems, with a use case on logic grid puzzles. More specifically, we study the problem of explaining the inference steps that one can take during propagation, in a way that is easy to interpret for a person. Thereby, we aim to give the constraint solver explainable agency, which can help in building trust in the solver by being able to understand and even learn from the explanations. The main challenge is that of finding a sequence of simple explanations, where each explanation should aim to be as cognitively easy as possible for a human to verify and understand. This contrasts with the arbitrary combination of facts and constraints that the solver may use when propagating. We propose the use of a cost function to quantify how simple an individual explanation of an inference step is, and identify the explanation-production problem of finding the best sequence of explanations of a CSP. Our approach is agnostic of the underlying constraint propagation mechanisms, and can provide explanations even for inference steps resulting from combinations of constraints. In case multiple constraints are involved, we also develop a mechanism that allows to break the most difficult steps up and thus gives the user the ability to zoom in on specific parts of the explanation. Our proposed algorithm iteratively constructs the explanation sequence by using an optimistic estimate of the cost function to guide the search for the best explanation at each step. Our experiments on logic grid puzzles show the feasibility of the approach in terms of the quality of the individual explanations and the resulting explanation sequences obtained.
At-Most-One Constraints in Efficient Representations of Mutex Networks
The At-Most-One (AMO) constraint is a special case of cardinality constraint that requires at most one variable from a set of Boolean variables to be set to TRUE. AMO is important for modeling problems as Boolean satisfiability (SAT) from domains where decision variables represent spatial or temporal placements of some objects that cannot share the same spatial or temporal slot. The AMO constraint can be used for more efficient representation and problem solving in mutex networks consisting of pair-wise mutual exclusions forbidding pairs of Boolean variable to be simultaneously TRUE. An on-line method for automated detection of cliques for efficient representation of incremental mutex networks where new mutexes arrive using AMOs is presented. A comparison of SAT-based problem solving in mutex networks represented by AMO constraints using various encodings is shown.
From Demonstrations to Task-Space Specifications: Using Causal Analysis to Extract Rule Parameterization from Demonstrations
Angelov, Daniel, Hristov, Yordan, Ramamoorthy, Subramanian
Learning models of user behaviour is an important problem that is broadly applicable across many application domains requiring human-robot interaction. In this work, we show that it is possible to learn generative models for distinct user behavioural types, extracted from human demonstrations, by enforcing clustering of preferred task solutions within the latent space. We use these models to differentiate between user types and to find cases with overlapping solutions. Moreover, we can alter an initially guessed solution to satisfy the preferences that constitute a particular user type by backpropagating through the learned differentiable models. An advantage of structuring generative models in this way is that we can extract causal relationships between symbols that might form part of the user's specification of the task, as manifested in the demonstrations. We further parameterize these specifications through constraint optimization in order to find a safety envelope under which motion planning can be performed. We show that the proposed method is capable of correctly distinguishing between three user types, who differ in degrees of cautiousness in their motion, while performing the task of moving objects with a kinesthetically driven robot in a tabletop environment. Our method successfully identifies the correct type, within the specified time, in 99% [97.8 - 99.8] of the cases, which outperforms an IRL baseline. We also show that our proposed method correctly changes a default trajectory to one satisfying a particular user specification even with unseen objects. The resulting trajectory is shown to be directly implementable on a PR2 humanoid robot completing the same task.