Learning Lagrangian Multipliers for the Travelling Salesman Problem
Parjadis, Augustin, Cappart, Quentin, Dilkina, Bistra, Ferber, Aaron, Rousseau, Louis-Martin
Lagrangian relaxation is a versatile mathematical technique employed to relax constraints in an optimization problem, enabling the generation of dual bounds to prove the optimality of feasible solutions and the design of efficient propagators in constraint programming (such as the weighted circuit constraint). However, the conventional process of deriving Lagrangian multipliers (e.g., using subgradient methods) is often computationally intensive, limiting its practicality for large-scale or time-sensitive problems. To address this challenge, we propose an innovative unsupervised learning approach that harnesses the capabilities of graph neural networks to exploit the problem structure, aiming to generate accurate Lagrangian multipliers efficiently. We apply this technique to the well-known Held-Karp Lagrangian relaxation for the travelling salesman problem. The core idea is to predict accurate Lagrangian multipliers and to employ them as a warm start for generating Held-Karp relaxation bounds. These bounds are subsequently utilized to enhance the filtering process carried out by branch-and-bound algorithms. In contrast to much of the existing literature, which primarily focuses on finding feasible solutions, our approach operates on the dual side, demonstrating that learning can also accelerate the proof of optimality. We conduct experiments across various distributions of the metric travelling salesman problem, considering instances with up to 200 cities. The results illustrate that our approach can improve the filtering level of the weighted circuit global constraint, reduce the optimality gap by a factor of two for instances left unsolved at the timeout, and reduce the execution time for solved instances by 10%.
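To make the warm-start idea concrete, here is a minimal sketch of the Held-Karp 1-tree bound with subgradient ascent, where `lam0` plays the role of the multipliers a graph neural network would predict. The function names, the networkx-based 1-tree construction, and the fixed step size are illustrative assumptions, not the paper's implementation.

```python
import networkx as nx
import numpy as np

def one_tree_bound(dist, lam):
    """Held-Karp bound L(lam) and its subgradient for one multiplier vector."""
    n = len(dist)
    g = nx.Graph()
    for i in range(1, n):
        for j in range(i + 1, n):
            g.add_edge(i, j, weight=dist[i][j] + lam[i] + lam[j])
    mst = nx.minimum_spanning_tree(g)  # spanning tree on cities 1..n-1
    deg, total = np.zeros(n), 0.0
    for i, j, d in mst.edges(data=True):
        total += d["weight"]
        deg[i] += 1
        deg[j] += 1
    # Attach city 0 through its two cheapest modified edges to form a 1-tree.
    for j in sorted(range(1, n), key=lambda j: dist[0][j] + lam[0] + lam[j])[:2]:
        total += dist[0][j] + lam[0] + lam[j]
        deg[0] += 1
        deg[j] += 1
    return total - 2 * lam.sum(), deg - 2  # bound and subgradient (degree - 2)

def held_karp_bound(dist, lam0, iters=100, step=1.0):
    """Subgradient ascent from a warm start lam0 (all zeros in the classic scheme)."""
    lam, best = np.asarray(lam0, dtype=float).copy(), -np.inf
    for _ in range(iters):
        bound, sub = one_tree_bound(dist, lam)
        best = max(best, bound)
        lam += step * sub  # fixed step for brevity; practical schedules decay
    return best
```

A good warm start matters here because every iteration rebuilds a minimum spanning tree; starting near the optimal multipliers lets the ascent reach a strong bound in far fewer iterations.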
Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver
Marty, Tom, François, Tristan, Tessier, Pierre, Gauthier, Louis, Rousseau, Louis-Martin, Cappart, Quentin
Constraint programming is known for being an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, this remains an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are scarcer. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This is achieved through the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework is able to find better solutions, close to optimality, without requiring a large number of backtracks, while remaining generic.
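To illustrate where the learned component sits in the solver, the sketch below shows greedy value selection at a search node, assuming a trained Q-network. `q_net`, `encode_state`, and `variable.domain()` are hypothetical stand-ins, not the paper's actual interfaces.

```python
import torch

def select_value(q_net, encode_state, solver_state, variable):
    """Pick the value of `variable` with the highest learned Q-value."""
    values = list(variable.domain())  # candidate assignments (hypothetical API)
    with torch.no_grad():
        scores = q_net(encode_state(solver_state, variable, values))
    return values[int(scores.argmax())]  # greedy with respect to learned Q
```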
Improved Peel-and-Bound: Methods for Generating Dual Bounds with Multivalued Decision Diagrams
Rudich, Isaac (Polytechnique Montréal) | Cappart, Quentin (Polytechnique Montréal) | Rousseau, Louis-Martin (Polytechnique Montréal)
Decision diagrams are an increasingly important tool in cutting-edge solvers for discrete optimization. However, the field of decision diagrams is relatively new, and is still incorporating the library of techniques that conventional solvers have had decades to build. We drew inspiration from the warm-start technique used in conventional solvers to address one of the major challenges faced by decision-diagram-based methods. Decision diagrams become more useful the wider they are allowed to be, but also become more costly to generate, especially with large numbers of variables. In the original version of this paper, we presented peel-and-bound, a method that peels off a sub-graph of a previously constructed diagram and uses it as the initial diagram for subsequent iterations. We tested the method on the sequence ordering problem, and our results indicate that our peel-and-bound scheme generates stronger bounds than a branch-and-bound scheme using the same propagators, at significantly less computational cost. In this extended version of the paper, we also propose new methods for using relaxed decision diagrams to improve the solutions found using restricted decision diagrams, discuss the heuristic decisions involved in parallelizing peel-and-bound, and discuss how peel-and-bound can be hyper-optimized for sequencing problems. Furthermore, we test the new methods on the sequence ordering problem and the travelling salesman problem with time windows (TSPTW), and include an updated and generalized implementation of the algorithm capable of handling any discrete optimization problem. The new results show that peel-and-bound outperforms ddo (a decision-diagram-based branch-and-bound solver) on the TSPTW. We also close 15 open benchmark instances of the TSPTW.
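As an illustration of the peel step, the sketch below extracts the sub-diagram reachable from a chosen node of a previously constructed diagram, which can then seed the next iteration. The dict-of-arcs encoding is an assumption made for brevity, not the paper's data structure.

```python
def peel(arcs, root):
    """Return the sub-diagram of `arcs` (node -> list of (child, label)) under `root`."""
    sub, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node in sub:
            continue  # already peeled
        sub[node] = list(arcs.get(node, ()))
        stack.extend(child for child, _ in sub[node])
    return sub
```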
SeaPearl: A Constraint Programming Solver guided by Reinforcement Learning
Chalumeau, Félix, Coulon, Ilan, Cappart, Quentin, Rousseau, Louis-Martin
The design of efficient and generic algorithms for solving combinatorial optimization problems has been an active field of research for many years. Standard exact solving approaches are based on a clever and complete enumeration of the solution set. A critical and non-trivial design choice with such methods is the branching strategy, directing how the search is performed. The last decade has shown an increasing interest in the design of machine-learning-based heuristics to solve combinatorial optimization problems. The goal is to leverage knowledge from historical data to solve similar new instances of a problem. Used alone, such heuristics are only able to provide approximate solutions efficiently, but cannot prove optimality or bounds on their solution. Recent works have shown that reinforcement learning can be successfully used for driving the search phase of constraint programming (CP) solvers. However, it has also been shown that this hybridization is challenging to build, as standard CP frameworks do not natively include machine learning mechanisms, leading to some sources of inefficiency. This paper presents the proof of concept for SeaPearl, a new CP solver implemented in Julia that supports machine learning routines in order to learn branching decisions using reinforcement learning. Support for modeling the learning component is also provided. We illustrate the modeling and solution performance of this new solver on two problems. Although not yet competitive with industrial solvers, SeaPearl aims to provide a flexible and open-source framework to facilitate future research in the hybridization of constraint programming and machine learning.
Learning TSP Requires Rethinking Generalization
Joshi, Chaitanya K., Cappart, Quentin, Rousseau, Louis-Martin, Laurent, Thomas, Bresson, Xavier
End-to-end training of neural network solvers for combinatorial problems such as the Travelling Salesman Problem is intractable and inefficient beyond a few hundred nodes. While state-of-the-art machine learning approaches perform closely to classical solvers on trivially small instances, they are unable to generalize the learnt policy to larger instances of practical scale. Towards leveraging transfer learning to solve large-scale TSPs, this paper identifies inductive biases, model architectures and learning algorithms that promote generalization to instances larger than those seen in training. Our controlled experiments provide the first principled investigation into such zero-shot generalization, revealing that extrapolating beyond training data requires rethinking the entire neural combinatorial optimization pipeline, from network layers and learning paradigms to evaluation protocols.
Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization
Cappart, Quentin, Moisan, Thierry, Rousseau, Louis-Martin, Prémont-Schwarz, Isabeau, Cire, Andre
Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge one faces with combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes solving intractable for large problems. In recent years, deep reinforcement learning (DRL) has shown its promise for designing good heuristics dedicated to solving NP-hard combinatorial optimization problems. However, current approaches have two shortcomings: (1) they mainly focus on the standard travelling salesman problem and cannot be easily extended to other problems, and (2) they only provide an approximate solution with no systematic way to improve it or to prove optimality. In another context, constraint programming (CP) is a generic tool to solve combinatorial optimization problems. Based on a complete search procedure, it will always find the optimal solution if given enough execution time. A critical design choice that makes CP non-trivial to use in practice is the branching decision, directing how the search space is explored. In this work, we propose a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems. The core of our approach is a dynamic programming formulation that acts as a bridge between both techniques. We experimentally show that our solver is efficient at solving two challenging problems: the travelling salesman problem with time windows and the 4-moments portfolio optimization problem. The results obtained show that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.
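To give a flavour of the bridge, here is a minimal sketch of the kind of dynamic programming formulation used for the TSPTW: a state is (current city, set of cities still to visit, current time), and each transition is simultaneously an RL action and a CP branching decision. The names and the exact state encoding are illustrative assumptions, not the paper's model.

```python
from functools import lru_cache

def tsptw_value(dist, earliest, latest, n):
    """Exact DP value of a TSPTW instance starting and ending at city 0."""
    @lru_cache(maxsize=None)
    def v(city, to_visit, t):
        if not to_visit:
            return dist[city][0]  # all cities served: return to the depot
        best = float("inf")
        for nxt in to_visit:  # each feasible transition = one action/decision
            arrive = max(t + dist[city][nxt], earliest[nxt])
            if arrive <= latest[nxt]:  # respect the time window
                best = min(best, dist[city][nxt] + v(nxt, to_visit - {nxt}, arrive))
        return best
    return v(0, frozenset(range(1, n)), 0)
```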
How to Evaluate Machine Learning Approaches for Combinatorial Optimization: Application to the Travelling Salesman Problem
François, Antoine, Cappart, Quentin, Rousseau, Louis-Martin
Combinatorial optimization is the field devoted to the study and practice of algorithms that solve NP-hard problems. As machine learning (ML) and deep learning have grown in popularity, several research groups have started to use ML to solve combinatorial optimization problems, such as the well-known Travelling Salesman Problem (TSP). Based on deep (reinforcement) learning, new models and architectures for the TSP have been successively developed, with steadily improving performance. At the time of writing, state-of-the-art models provide solutions to TSP instances of 100 cities that are roughly 1.33% away from optimal solutions. However, despite these apparently positive results, the performance remains far from what can be achieved using a specialized search procedure. In this paper, we address the limitations of ML approaches for solving the TSP and investigate two fundamental questions: (1) how can we measure the level of accuracy of the pure ML component of such methods; and (2) what is the impact on performance of a search procedure plugged inside an ML model? To answer these questions, we propose a new metric, the ratio of optimal decisions (ROD), based on a fair comparison with a parametrized oracle, mimicking an ML model with a controlled accuracy. All the experiments are carried out on four state-of-the-art ML approaches dedicated to solving the TSP. Finally, we have made ROD open-source in order to ease future research in the field.
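To make the oracle idea concrete, the sketch below implements a parametrized oracle with controlled accuracy p: at each step it takes the known-optimal decision with probability p and a random other one otherwise, and the ratio of optimal decisions is measured over teacher-forced rollouts (the optimal prefix is restored before every decision). This is a simplified illustration of the idea, not the released ROD code.

```python
import random

def oracle_step(optimal_choice, choices, p):
    """Optimal decision with probability p, otherwise a random other one."""
    if len(choices) == 1 or random.random() < p:
        return optimal_choice
    return random.choice([c for c in choices if c != optimal_choice])

def ratio_of_optimal_decisions(optimal_tour, p, trials=1000):
    """Average fraction of optimal decisions over teacher-forced rollouts."""
    n, hits = len(optimal_tour), 0
    for _ in range(trials):
        for k in range(1, n):
            pick = oracle_step(optimal_tour[k], optimal_tour[k:], p)
            hits += pick == optimal_tour[k]
    return hits / (trials * (n - 1))
```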
Improving Optimization Bounds using Machine Learning: Decision Diagrams meet Deep Reinforcement Learning
Cappart, Quentin, Goutierre, Emmanuel, Bergman, David, Rousseau, Louis-Martin
Finding tight bounds on the optimal solution is a critical element of practical solution methods for discrete optimization problems. In the last decade, decision diagrams (DDs) have brought a new perspective on obtaining upper and lower bounds that can be significantly better than classical bounding mechanisms, such as linear relaxations. It is well known that the quality of the bound achieved through this flexible bounding method is highly reliant on the ordering of variables chosen for building the diagram, and finding an ordering that optimizes standard metrics, or even improving a given one, is an NP-hard problem. In this paper, we propose an innovative and generic approach based on deep reinforcement learning for obtaining an ordering that tightens the bounds obtained with relaxed and restricted DDs. We apply the approach to both the Maximum Independent Set Problem and the Maximum Cut Problem. Experimental results on synthetic instances show that the deep reinforcement learning approach, by achieving tighter objective function bounds, generally outperforms ordering methods commonly used in the literature when the distribution of instances is known. To the best of the authors' knowledge, this is the first paper to apply machine learning to directly improve relaxation bounds obtained by general-purpose bounding mechanisms for combinatorial optimization problems.
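As an illustration of the bounding mechanism being tuned, the sketch below builds a width-limited relaxed decision diagram for the Maximum Independent Set Problem: states are sets of still-eligible vertices, and when a layer exceeds `width` the lowest-value states are merged by union, which preserves a valid upper bound. The `ordering` argument is exactly what the deep reinforcement learning agent learns to choose; the rest of the code is an illustrative assumption, not the paper's implementation.

```python
def misp_relaxed_bound(neighbors, ordering, width):
    """Upper bound on the MISP via a relaxed DD; `ordering` must list every vertex."""
    layer = {frozenset(ordering): 0}  # state (eligible vertices) -> best value
    for v in ordering:
        nxt = {}
        for state, val in layer.items():
            s0 = state - {v}  # decision 0: leave v out
            nxt[s0] = max(nxt.get(s0, 0), val)
            if v in state:  # decision 1: take v, removing v and its neighbours
                s1 = state - ({v} | neighbors[v])
                nxt[s1] = max(nxt.get(s1, 0), val + 1)
        if len(nxt) > width:  # relax: merge the lowest-value states by union
            ranked = sorted(nxt.items(), key=lambda kv: -kv[1])
            keep, rest = ranked[:width - 1], ranked[width - 1:]
            merged = frozenset().union(*(s for s, _ in rest))
            nxt = dict(keep)
            nxt[merged] = max(nxt.get(merged, 0), max(val for _, val in rest))
        layer = nxt
    return max(layer.values())  # valid upper bound thanks to the union merge
```

Swapping the order of `ordering` changes which states collide and merge, which is why the bound quality depends so heavily on the variable ordering the agent learns.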