Goto

Collaborating Authors

 s2v-dqn



A Benchmark for Maximum Cut: Towards Standardization of the Evaluation of Learned Heuristics for Combinatorial Optimization

Nath, Ankur, Kuhnle, Alan

arXiv.org Artificial Intelligence

Recently, there has been much work on the design of general heuristics for graph-based, combinatorial optimization problems via the incorporation of Graph Neural Networks (GNNs) to learn distribution-specific solution structures.However, there is a lack of consistency in the evaluation of these heuristics, in terms of the baselines and instances chosen, which makes it difficult to assess the relative performance of the algorithms. In this paper, we propose an open-source benchmark suite MaxCut-Bench dedicated to the NP-hard Maximum Cut problem in both its weighted and unweighted variants, based on a careful selection of instances curated from diverse graph datasets. The suite offers a unified interface to various heuristics, both traditional and machine learning-based. Next, we use the benchmark in an attempt to systematically corroborate or reproduce the results of several, popular learning-based approaches, including S2V-DQN [31], ECO-DQN [4], among others, in terms of three dimensions: objective value, generalization, and scalability. Our empirical results show that several of the learned heuristics fail to outperform a naive greedy algorithm, and that only one of them consistently outperforms Tabu Search, a simple, general heuristic based upon local search. Furthermore, we find that the performance of ECO-DQN remains the same or is improved if the GNN is replaced by a simple linear regression on a subset of the features that are related to Tabu Search. Code, data, and pretrained models are available at: \url{https://github.com/ankurnath/MaxCut-Bench}.


Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning

Pu, Tianle, Fan, Changjun, Shen, Mutian, Lu, Yizhou, Zeng, Li, Nussinov, Zohar, Chen, Chao, Liu, Zhong

arXiv.org Artificial Intelligence

Many complex problems encountered in both production and daily life can be conceptualized as combinatorial optimization problems (COPs) over graphs. Recent years, reinforcement learning (RL) based models have emerged as a promising direction, which treat the COPs solving as a heuristic learning problem. However, current finite-horizon-MDP based RL models have inherent limitations. They are not allowed to explore adquately for improving solutions at test time, which may be necessary given the complexity of NP-hard optimization tasks. Some recent attempts solve this issue by focusing on reward design and state feature engineering, which are tedious and ad-hoc. In this work, we instead propose a much simpler but more effective technique, named gauge transformation (GT). The technique is originated from physics, but is very effective in enabling RL agents to explore to continuously improve the solutions during test. Morever, GT is very simple, which can be implemented with less than 10 lines of Python codes, and can be applied to a vast majority of RL models. Experimentally, we show that traditional RL models with GT technique produce the state-of-the-art performances on the MaxCut problem. Furthermore, since GT is independent of any RL models, it can be seamlessly integrated into various RL frameworks, paving the way of these models for more effective explorations in the solving of general COPs.


Reversible Action Design for Combinatorial Optimization with Reinforcement Learning

Yao, Fan, Cai, Renqin, Wang, Hongning

arXiv.org Artificial Intelligence

Combinatorial optimization problem (COP) over graphs is a fundamental challenge in optimization. Reinforcement learning (RL) has recently emerged as a new framework to tackle these problems and has demonstrated promising results. However, most RL solutions employ a greedy manner to construct the solution incrementally, thus inevitably pose unnecessary dependency on action sequences and need a lot of problem-specific designs. We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a variety class of COPs. Specifically, we define state as a solution to a problem instance and action as a perturbation to this solution. We utilize graph neural networks (GNN) to extract latent representations for given problem instances for state-action encoding, and then apply deep Q-learning to obtain a policy that gradually refines the solution by flipping or swapping vertex labels. Experiments are conducted on Maximum $k$-Cut and Traveling Salesman Problem and performance improvement is achieved against a set of learning-based and heuristic baselines.


Learning What to Defer for Maximum Independent Sets

Ahn, Sungsoo, Seo, Younggyo, Shin, Jinwoo

arXiv.org Machine Learning

Designing efficient algorithms for combinatorial optimization appears ubiquitously in various scientific fields. Recently, deep reinforcement learning (DRL) frameworks have gained considerable attention as a new approach: they can automate the design of a solver while relying less on sophisticated domain knowledge of the target problem. However, the existing DRL solvers determine the solution using a number of stages proportional to the number of elements in the solution, which severely limits their applicability to large-scale graphs. In this paper, we seek to resolve this issue by proposing a novel DRL scheme, coined learning what to defer (LwD), where the agent adaptively shrinks or stretch the number of stages by learning to distribute the element-wise decisions of the solution at each stage. We apply the proposed framework to the maximum independent set (MIS) problem, and demonstrate its significant improvement over the current state-of-the-art DRL scheme. We also show that LwD can outperform the conventional MIS solvers on large-scale graphs having millions of vertices, under a limited time budget.


Learning Combinatorial Optimization Algorithms over Graphs

Dai, Hanjun, Khalil, Elias B., Zhang, Yuyu, Dilkina, Bistra, Song, Le

arXiv.org Machine Learning

The design of good heuristics or approximation algorithms for NP-hard combinatorial optimization problems often requires significant specialized knowledge and trial-and-error. Can we automate this challenging, tedious process, and learn the algorithms instead? In many real-world applications, it is typically the case that the same optimization problem is solved again and again on a regular basis, maintaining the same problem structure but differing in the data. This provides an opportunity for learning heuristic algorithms that exploit the structure of such recurring problems. In this paper, we propose a unique combination of reinforcement learning and graph embedding to address this challenge. The learned greedy policy behaves like a meta-algorithm that incrementally constructs a solution, and the action is determined by the output of a graph embedding network capturing the current state of the solution. We show that our framework can be applied to a diverse range of optimization problems over graphs, and learns effective algorithms for the Minimum Vertex Cover, Maximum Cut and Traveling Salesman problems.


Learning Combinatorial Optimization Algorithms over Graphs

Khalil, Elias, Dai, Hanjun, Zhang, Yuyu, Dilkina, Bistra, Song, Le

Neural Information Processing Systems

The design of good heuristics or approximation algorithms for NP-hard combinatorial optimization problems often requires significant specialized knowledge and trial-and-error. Can we automate this challenging, tedious process, and learn the algorithms instead? In many real-world applications, it is typically the case that the same optimization problem is solved again and again on a regular basis, maintaining the same problem structure but differing in the data. This provides an opportunity for learning heuristic algorithms that exploit the structure of such recurring problems. In this paper, we propose a unique combination of reinforcement learning and graph embedding to address this challenge. The learned greedy policy behaves like a meta-algorithm that incrementally constructs a solution, and the action is determined by the output of a graph embedding network capturing the current state of the solution. We show that our framework can be applied to a diverse range of optimization problems over graphs, and learns effective algorithms for the Minimum Vertex Cover, Maximum Cut and Traveling Salesman problems.