Goto

Collaborating Authors

 Constraint-Based Reasoning


Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation

arXiv.org Machine Learning

We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting. Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of \ofulq and cast it into a constrained \textit{extended} LQR problem, where an additional control variable implicitly selects the system dynamics within a confidence interval. We then move to the corresponding Lagrangian formulation for which we prove strong duality. As a result, we show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations. Finally, we prove that relaxing the original \ofu problem does not impact the learning performance, thus recovering the $\tilde{O}(\sqrt{T})$ regret of \ofulq. To the best of our knowledge, this is the first computationally efficient confidence-based algorithm for LQR with worst-case optimal regret guarantees.


Strengthening neighbourhood substitution

arXiv.org Artificial Intelligence

Domain reduction is an essential tool for solving the constraint satisfaction problem (CSP). In the binary CSP, neighbourhood substitution consists in eliminating a value if there exists another value which can be substituted for it in each constraint. We show that the notion of neighbourhood substitution can be strengthened in two distinct ways without increasing time complexity. We also show the theoretical result that, unlike neighbourhood substitution, finding an optimal sequence of these new operations is NP-hard.


Half-checking propagators

arXiv.org Artificial Intelligence

Propagators are central to the success of constraint programming, that is contracting functions removing values proven not to be in any solution of a given constraint. The literature contains numerous propagation algorithms, for many different constraints, and common to all these propagation algorithms is the notion of correctness: only values that appear in no solution to the respective constraint may be removed. In this paper half-checking propagators are introduced, for which the only requirements are that identified solutions (by the propagators) are actual solutions (to the corresponding constraints), and that the propagators are contracting. In particular, a half-checking propagator may remove solutions resulting in an incomplete solving process, but with the upside that (good) solutions may be found faster. Overall completeness can be obtained by running half-checking propagators as one component in a portfolio solving process. Half-checking propagators opens up a wider variety of techniques to be used when designing propagation algorithms, compared to what is currently available. A formal model for half-checking propagators is introduced, together with a detailed description of how to support such propagators in a constraint programming system. Three general directions for creating half-checking propagation algorithms are introduced, and used for designing new half-checking propagators for the cost-circuit constraint as examples. The new propagators are implemented in the Gecode system.


Solving the Clustered Traveling Salesman Problem via TSP methods

arXiv.org Artificial Intelligence

The Clustered Traveling Salesman Problem (CTSP) is a variant of the popular Traveling Salesman Problem (TSP) arising from a number of real-life applications. In this work, we explore an uncharted solution approach that solves the CTSP by transforming it to the well-studied TSP. For this purpose, we first investigate a technique to convert a CTSP instance to a TSP and then apply popular TSP solvers (including exact and heuristic solvers) to solve the resulting TSP instance. We want to answer the following questions: How do state-of-the-art TSP solvers perform on clustered instances converted from the CTSP? Do state-of-the-art TSP solvers compete well with the best performing methods specifically designed for the CTSP? For this purpose, we present intensive computational experiments on various CTSP benchmark instances to draw conclusions.


Automation Strategies for Unconstrained Crossword Puzzle Generation

arXiv.org Artificial Intelligence

An unconstrained crossword puzzle is a generalization of the constrained crossword problem. In this problem, only the word vocabulary, and optionally the grid dimensions are known. Hence, it not only requires the algorithm to determine the word locations, but it also needs to come up with the grid geometry. This paper discusses algorithmic strategies for automatic crossword puzzle generation in such an unconstrained setting. The strategies proposed cover the tasks of selection of words from a given vocabulary, selection of grid sizes, grid resizing and adjustments, metrics for word fitting, back-tracking techniques, and also clue generation. The strategies have been formulated based on a study of the effect of word sequence permutation order on grid fitting. An end-to-end algorithm that combines these strategies is presented, and its performance is analyzed. The techniques have been found to be successful in quickly producing well-packed puzzles of even large sizes. Finally, a few example puzzles generated by our algorithm are also provided.


Set-Invariant Constrained Reinforcement Learning with a Meta-Optimizer

arXiv.org Artificial Intelligence

This paper investigates reinforcement learning with constraints, which is indispensable in safety-critical environments. To drive the constraint violation monotonically decrease, the constraints are taken as Lyapunov functions, and new linear constraints are imposed on the updating dynamics of the policy parameters such that the original safety set is forward-invariant in expectation. As the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable. To address this, we propose to learn a neural network-based meta-optimizer to optimize the objective while satisfying such linear constraints. The constraint-satisfaction is achieved via projection onto a polytope formulated by multiple linear inequality constraints, which can be solved analytically with our newly designed metric. Ultimately, the meta-optimizer trains the policy network to monotonically decrease the constraint violation and maximize the cumulative reward. Numerical results validate the theoretical findings.


Human-Robot Team Coordination with Dynamic and Latent Human Task Proficiencies: Scheduling with Learning Curves

arXiv.org Artificial Intelligence

As robots become ubiquitous in the workforce, it is essential that human-robot collaboration be both intuitive and adaptive. A robot's quality improves based on its ability to explicitly reason about the time-varying (i.e. learning curves) and stochastic capabilities of its human counterparts, and adjust the joint workload to improve efficiency while factoring human preferences. We introduce a novel resource coordination algorithm that enables robots to explore the relative strengths and learning abilities of their human teammates, by constructing schedules that are robust to stochastic and time-varying human task performance. We first validate our algorithmic approach using data we collected from a user study (n = 20), showing we can quickly generate and evaluate a robust schedule while discovering the latest individual worker proficiency. Second, we conduct a between-subjects experiment (n = 90) to validate the efficacy of our coordinating algorithm. Results from the human-subjects experiment indicate that scheduling strategies favoring exploration tend to be beneficial for human-robot collaboration as it improves team fluency (p = 0.0438), while also maximizing team efficiency (p < 0.001).


Jump Operator Planning: Goal-Conditioned Policy Ensembles and Zero-Shot Transfer

arXiv.org Artificial Intelligence

In Hierarchical Control, compositionality, abstraction, and task-transfer are crucial for designing versatile algorithms which can solve a variety of problems with maximal representational reuse. We propose a novel hierarchical and compositional framework called Jump-Operator Dynamic Programming for quickly computing solutions within a super-exponential space of sequential sub-goal tasks with ordering constraints, while also providing a fast linearly-solvable algorithm as an implementation. This approach involves controlling over an ensemble of reusable goal-conditioned polices functioning as temporally extended actions, and utilizes transition operators called feasibility functions, which are used to summarize initial-to-final state dynamics of the polices. Consequently, the added complexity of grounding a high-level task space onto a larger ambient state-space can be mitigated by optimizing in a lower-dimensional subspace defined by the grounding, substantially improving the scalability of the algorithm while effecting transferable solutions. We then identify classes of objective functions on this subspace whose solutions are invariant to the grounding, resulting in optimal zero-shot transfer.


Accelerated Message Passing for Entropy-Regularized MAP Inference

arXiv.org Machine Learning

Maximum a posteriori (MAP) inference in discrete-valued Markov random fields is a fundamental problem in machine learning that involves identifying the most likely configuration of random variables given a distribution. Due to the difficulty of this combinatorial problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms that are often interpreted as coordinate descent on the dual LP. To achieve more desirable computational properties, a number of methods regularize the LP with an entropy term, leading to a class of smooth message passing algorithms with convergence guarantees. In this paper, we present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods. The proposed algorithms incorporate the familiar steps of standard smooth message passing algorithms, which can be viewed as coordinate minimization steps. We show that these accelerated variants achieve faster rates for finding $\epsilon$-optimal points of the unregularized problem, and, when the LP is tight, we prove that the proposed algorithms recover the true MAP solution in fewer iterations than standard message passing algorithms.


Mitigating Manipulation in Peer Review via Randomized Reviewer Assignments

arXiv.org Artificial Intelligence

We consider three important challenges in conference peer review: (i) reviewers maliciously attempting to get assigned to certain papers to provide positive reviews, possibly as part of quid-pro-quo arrangements with the authors; (ii) "torpedo reviewing," where reviewers deliberately attempt to get assigned to certain papers that they dislike in order to reject them; (iii) reviewer de-anonymization on release of the similarities and the reviewer-assignment code. On the conceptual front, we identify connections between these three problems and present a framework that brings all these challenges under a common umbrella. We then present a (randomized) algorithm for reviewer assignment that can optimally solve the reviewer-assignment problem under any given constraints on the probability of assignment for any reviewer-paper pair. We further consider the problem of restricting the joint probability that certain suspect pairs of reviewers are assigned to certain papers, and show that this problem is NP-hard for arbitrary constraints on these joint probabilities but efficiently solvable for a practical special case. Finally, we experimentally evaluate our algorithms on datasets from past conferences, where we observe that they can limit the chance that any malicious reviewer gets assigned to their desired paper to 50% while producing assignments with over 90% of the total optimal similarity. Our algorithms still achieve this similarity while also preventing reviewers with close associations from being assigned to the same paper.