Using Tabu Search Algorithm for Map Generation in the Terra Mystica Tabletop Game
Grichshenko, Alexandr, de Araujo, Luiz Jonata Pires, Gimaeva, Susanna, Brown, Joseph Alexander
The Tabu Search (TS) metaheuristic improves simple local search algorithms (e.g. steepest-ascent hill climbing) by enabling the algorithm to escape local optima. It has been shown to be useful for addressing several combinatorial optimization problems. This paper investigates the performance of TS and considers the effects of the size of the Tabu list and the size of the neighbourhood for procedural content generation, specifically the generation of maps for a popular tabletop game called Terra Mystica. The results validate the feasibility of the proposed method and show how it can be used to generate maps that improve on existing maps for the game.
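The two parameters studied in the abstract above, the Tabu list size and the neighbourhood size, can be illustrated with a minimal sketch. It is not the authors' map generator: the bit-string solution encoding, the `score` callback, the bit-flip move, and the aspiration rule are all assumptions chosen to keep the example small.

```python
import random
from collections import deque


def tabu_search(score, start, tabu_size=5, neighbourhood_size=4,
                iterations=200, seed=0):
    """Maximise score(bits) over bit lists (hypothetical toy encoding)."""
    rng = random.Random(seed)
    current = list(start)
    best, best_score = list(current), score(current)
    # The Tabu list remembers the most recently flipped positions.
    tabu = deque(maxlen=tabu_size)
    for _ in range(iterations):
        # The neighbourhood is a random sample of single-bit flips.
        k = min(neighbourhood_size, len(current))
        candidates = rng.sample(range(len(current)), k)
        chosen, chosen_score = None, None
        for i in candidates:
            neighbour = current[:]
            neighbour[i] = 1 - neighbour[i]
            s = score(neighbour)
            # Aspiration: a tabu move is allowed only if it beats the best so far.
            if i in tabu and s <= best_score:
                continue
            if chosen is None or s > chosen_score:
                chosen, chosen_score = i, s
        if chosen is None:
            continue  # every sampled move was tabu
        # Always move, even downhill: this is how the search leaves local optima.
        current[chosen] = 1 - current[chosen]
        tabu.append(chosen)
        if chosen_score > best_score:
            best, best_score = current[:], chosen_score
    return best, best_score
```

A call such as `tabu_search(sum, [0] * 12, tabu_size=3, neighbourhood_size=6)` runs the search on a toy "count the ones" objective; varying the two size parameters is the kind of experiment the abstract describes.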
Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning
Feng, Dieqiao, Gomes, Carla P., Selman, Bart
Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.
Sample Efficient Graph-Based Optimization with Noisy Observations
Nguyen, Tan, Shameli, Ali, Abbasi-Yadkori, Yasin, Rao, Anup, Kveton, Branislav
We study the sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations. We define a notion of convexity, and we show that a variant of best-arm identification can find a near-optimal solution after a small number of queries that is independent of the size of the graph. For functions that have local minima and are nearly convex, we show a sample complexity for classical simulated annealing under noisy observations. We show the effectiveness of the greedy algorithm with restarts and of simulated annealing on problems of graph-based nearest neighbor classification as well as a web document re-ranking application.
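The greedy algorithm with restarts mentioned above can be sketched roughly as follows. This is an illustration only, not the paper's best-arm identification variant: the adjacency-dict graph, the `noisy_value(node, rng)` callback, and the fixed number of repeated queries per node are assumptions.

```python
import random


def greedy_descent_with_restarts(graph, noisy_value, restarts=5, samples=20,
                                 max_steps=1000, seed=0):
    """Minimise a noisily observed function over the nodes of a graph.

    graph: dict mapping each node to a list of neighbouring nodes.
    noisy_value: hypothetical callback (node, rng) -> one noisy observation.
    """
    rng = random.Random(seed)
    nodes = sorted(graph)

    def estimate(node):
        # Average repeated queries to reduce the effect of noise.
        return sum(noisy_value(node, rng) for _ in range(samples)) / samples

    best_node, best_estimate = None, float("inf")
    for _ in range(restarts):
        node = rng.choice(nodes)
        value = estimate(node)
        for _step in range(max_steps):
            scored = [(estimate(n), n) for n in graph[node]]
            if not scored:
                break
            neighbour_value, neighbour = min(scored)
            if neighbour_value >= value:
                break  # no neighbour looks better: an estimated local minimum
            node, value = neighbour, neighbour_value
        if value < best_estimate:
            best_node, best_estimate = node, value
    return best_node, best_estimate


# Example setup: a path graph whose true function |i - 7| is observed
# with Gaussian noise.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 10] for i in range(11)}
noisy = lambda node, rng: abs(node - 7) + rng.gauss(0.0, 0.5)
found_node, found_estimate = greedy_descent_with_restarts(path, noisy)
```

Averaging a fixed number of samples per node is the simplest way to handle noise; an adaptive query budget would be closer in spirit to best-arm identification.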
Extending the Multiple Traveling Salesman Problem for Scheduling a Fleet of Drones Performing Monitoring Missions
Rigas, Emmanouil, Kolios, Panayiotis, Ellinas, Georgios
In this paper we schedule the travel path of a set of drones across a graph where the nodes need to be visited multiple times at pre-defined points in time. This is an extension of the well-known multiple traveling salesman problem. The proposed formulation can be applied in several domains such as the monitoring of traffic flows in a transportation network, or the monitoring of remote locations to assist search and rescue missions. Aiming to find the optimal schedule, the problem is formulated as an Integer Linear Program (ILP). Given that the problem is highly combinatorial, the optimal solution scales only for small sized problems. Thus, a greedy algorithm is also proposed that uses a one-step look ahead heuristic search mechanism. In a detailed evaluation, it is observed that the greedy algorithm has near-optimal performance as it is on average at 92.06% of the optimal, while it can potentially scale up to settings with hundreds of drones and locations.
Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization
Cappart, Quentin, Moisan, Thierry, Rousseau, Louis-Martin, Prémont-Schwarz, Isabeau, Cire, Andre
Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge one faces with combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes solving intractable for large problems. In recent years, deep reinforcement learning (DRL) has shown promise for designing good heuristics dedicated to solving NP-hard combinatorial optimization problems. However, current approaches have two shortcomings: (1) they mainly focus on the standard travelling salesman problem and cannot be easily extended to other problems, and (2) they only provide an approximate solution with no systematic way to improve it or to prove optimality. In another context, constraint programming (CP) is a generic tool for solving combinatorial optimization problems. Based on a complete search procedure, it will always find the optimal solution given a large enough execution time. A critical design choice that makes CP non-trivial to use in practice is the branching decision, which directs how the search space is explored. In this work, we propose a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems. The core of our approach is a dynamic programming formulation that acts as a bridge between both techniques. We experimentally show that our solver is efficient at solving two challenging problems: the traveling salesman problem with time windows, and the 4-moments portfolio optimization problem. The results show that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.
Neural Bipartite Matching
Graph neural networks (GNNs) have found application for learning in the space of algorithms. However, the algorithms chosen by existing research (sorting, Breadth-First search, shortest path finding, etc.) usually align perfectly with a standard GNN architecture. This report describes how neural execution is applied to a complex algorithm, such as finding maximum bipartite matching by reducing it to a flow problem and using ... Performing the reasoning is achieved via neural execution, in a similar fashion to Veličković et al. (2020). GNNs have been both empirically (Veličković et al., 2020) and theoretically (Xu et al., 2020) shown to be applicable to algorithmic tasks on graphs, strongly generalising on inputs of sizes much larger than trained on. However, these algorithms rely on a locally contained and fixed dataflow which aligns perfectly with a standard GNN architecture, making them easy to model with GNNs (c.f.
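For readers unfamiliar with the reduction mentioned above, here is a classical (non-neural) sketch of maximum bipartite matching as a unit-capacity flow problem. The text is truncated before it names the flow routine, so the simple depth-first augmenting-path search below is an assumption chosen for illustration, as are the function and node names.

```python
from collections import defaultdict


def max_bipartite_matching(left, right, edges):
    """Return a maximum matching as a sorted list of (left, right) pairs."""
    source, sink = ("source",), ("sink",)
    capacity = defaultdict(int)   # residual capacities, 0 if absent
    adjacent = defaultdict(set)   # arcs in both directions (residual graph)

    def add_arc(u, v):
        capacity[(u, v)] = 1
        adjacent[u].add(v)
        adjacent[v].add(u)

    # Reduction: source -> every left node, every right node -> sink,
    # and one unit-capacity arc per bipartite edge.
    for u in set(left):
        add_arc(source, ("L", u))
    for v in set(right):
        add_arc(("R", v), sink)
    for u, v in set(edges):
        add_arc(("L", u), ("R", v))

    def augment(node, visited):
        # Depth-first search for a path with spare residual capacity.
        if node == sink:
            return True
        visited.add(node)
        for nxt in adjacent[node]:
            if (nxt not in visited and capacity[(node, nxt)] > 0
                    and augment(nxt, visited)):
                capacity[(node, nxt)] -= 1
                capacity[(nxt, node)] += 1
                return True
        return False

    while augment(source, set()):
        pass

    # An edge is matched when one unit of flow crosses it, which shows up
    # as residual capacity on the reverse arc.
    return sorted((u, v) for u, v in set(edges)
                  if capacity[(("R", v), ("L", u))] > 0)
```

Each successful augmentation increases the flow, and therefore the matching size, by one, so the loop runs at most `min(len(left), len(right))` times before no augmenting path remains.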
Revisiting Bounded-Suboptimal Safe Interval Path Planning
Yakovlev, Konstantin, Andreychuk, Anton, Stern, Roni
Safe-interval path planning (SIPP) is a powerful algorithm for finding a path in the presence of dynamic obstacles. SIPP returns provably optimal solutions. However, in many practical applications of SIPP such as path planning for robots, one would like to trade-off optimality for shorter planning time. In this paper we explore different ways to build a bounded-suboptimal SIPP and discuss their pros and cons. We compare the different bounded-suboptimal versions of SIPP experimentally. While there is no universal winner, the results provide insights into when each method should be used.
A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions
Ren, Pengzhen, Xiao, Yun, Chang, Xiaojun, Huang, Po-Yao, Li, Zhihui, Chen, Xiaojiang, Wang, Xin
Deep learning has made major breakthroughs and progress in many fields, owing to its powerful automatic representation capabilities. It has been shown that the design of the network architecture is crucial to the feature representation of data and the final performance. To obtain a good feature representation of data, researchers have designed various complex network architectures. However, the design of the network architecture relies heavily on the researchers' prior knowledge and experience. Therefore, a natural idea is to reduce human intervention as much as possible and let the algorithm automatically design the architecture of the network, thus moving a step closer to strong intelligence. In recent years, a large number of related algorithms for Neural Architecture Search (NAS) have emerged. They have made various improvements to the NAS algorithm, and the related research work is complicated and rich. To lower the barrier for beginners conducting NAS-related research, a comprehensive and systematic survey of NAS is essential. Previous related surveys classified existing work mainly by the basic components of NAS: search space, search strategy and evaluation strategy. This classification method is more intuitive, but it makes it difficult for readers to grasp the challenges and the landmark works along the way. Therefore, in this survey, we provide a new perspective: starting with an overview of the characteristics of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then giving the solutions proposed in subsequent related research work. In addition, we conduct a detailed and comprehensive analysis, comparison and summary of these works. Finally, we give possible future research directions.
Cascaded Text Generation with Markov Transformers
Deng, Yuntian, Rush, Alexander M.
The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies. This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output. To parameterize this cascade, we introduce a Markov transformer, a variant of the popular fully autoregressive model that allows us to simultaneously decode with specific autoregressive context cutoffs. This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration
Soemers, Dennis J. N. J., Piette, Éric, Stephenson, Matthew, Browne, Cameron
Expert Iteration (ExIt) is an effective framework for learning game-playing policies from self-play. ExIt involves training a policy to mimic the search behaviour of a tree search algorithm - such as Monte-Carlo tree search - and using the trained policy to guide it. The policy and the tree search can then iteratively improve each other, through experience gathered in self-play between instances of the guided tree search algorithm. This paper outlines three different approaches for manipulating the distribution of data collected from self-play, and the procedure that samples batches for learning updates from the collected data. Firstly, samples in batches are weighted based on the durations of the episodes in which they were originally experienced. Secondly, Prioritized Experience Replay is applied within the ExIt framework, to prioritise sampling experience from which we expect to obtain valuable training signals. Thirdly, a trained exploratory policy is used to diversify the trajectories experienced in self-play. This paper summarises the effects of these manipulations on training performance evaluated in fourteen different board games. We find major improvements in early training performance in some games, and minor improvements averaged over fourteen games.
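The first manipulation above, weighting samples by the duration of their source episode, might look like the sketch below. The abstract does not state the weighting formula, so the inverse-duration weights used here (which give every episode the same total sampling mass) are an assumption, as are the function names and the list-of-lists data layout.

```python
import random


def duration_weights(episodes):
    """Return (samples, probabilities) with inverse-duration weighting.

    episodes: list of episodes, each a list of stored samples.
    Assumption: a sample's weight is 1 / (length of its episode).
    """
    samples, weights = [], []
    for episode in episodes:
        for sample in episode:
            samples.append(sample)
            weights.append(1.0 / len(episode))
    total = sum(weights)
    return samples, [w / total for w in weights]


def sample_batch(episodes, batch_size, seed=0):
    """Draw a training batch (with replacement) using the weights above."""
    rng = random.Random(seed)
    samples, probabilities = duration_weights(episodes)
    return rng.choices(samples, weights=probabilities, k=batch_size)
```

Under this assumed scheme, a sample from a two-move episode is twice as likely to be drawn as a sample from a four-move episode, so long episodes no longer dominate the batch simply because they contribute more samples.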