Search
Robust and Efficient Planning using Adaptive Entropy Tree Search
Kozakowski, Piotr, Pacek, Mikołaj, Miłoś, Piotr
In this paper, we present the Adaptive EntropyTree Search (ANTS) algorithm. ANTS builds on recent successes of maximum entropy planning while mitigating its arguably major drawback - sensitivity to the temperature setting. We endow ANTS with a mechanism, which adapts the temperature to match a given range of action selection entropy in the nodes of the planning tree. With this mechanism, the ANTS planner enjoys remarkable hyper-parameter robustness, achieves high scores on the Atari benchmark, and is a capable component of a planning-learning loop akin to AlphaZero. We believe that all these features make ANTS a compelling choice for a general planner for complex tasks.
Edge Minimizing the Student Conflict Graph
Academic timetabling is the task of scheduling courses to specific times in such a way that there are no conflicts. Most of the models considered in the literature assume that this conflict information is already known. However in many real life timetabling problems, courses are taught in multiple sections and until a student is assigned to a specific section of a course, the conflict information is not known. M.W. Carter [Car00] sums it up nicely "When courses are offered in multiple sections as they are at Waterloo, it creates a timetabling paradox. Students request a course, but timetabling assigns days and times to course sections. We cannot assign times to sections until we know which students are in each section. But we cannot assign students to sections until we know when the sections are timetabled!"
Classifier Chains: A Review and Perspectives
Read, Jesse, Pfahringer, Bernhard, Holmes, Geoffrey, Frank, Eibe
The family of methods collectively known as classifier chains has become a popular approach to multi-label learning problems. This approach involves chaining together off-the-shelf binary classifiers in a directed structure, such that individual label predictions become features for other classifiers. Such methods have proved flexible and effective and have obtained state-of-the-art empirical performance across many datasets and multi-label evaluation metrics. This performance led to further studies of the underlying mechanism and efficacy, and investigation into how it could be improved. In the recent decade, numerous studies have explored the theoretical underpinnings of classifier chains, and many improvements have been made to the training and inference procedures, such that this method remains among the best options for multi-label learning. Given this past and ongoing interest, which covers a broad range of applications and research themes, the goal of this work is to provide a review of classifier chains, a survey of the techniques and extensions provided in the literature, as well as perspectives for this approach in the domain of multi-label classification in the future. We conclude positively, with a number of recommendations for researchers and practitioners, as well as outlining key issues for future research.
Hedging of Financial Derivative Contracts via Monte Carlo Tree Search
The construction of approximate replication strategies for derivative contracts in incomplete markets is a key problem of financial engineering. Recently Reinforcement Learning algorithms for pricing and hedging under realistic market conditions have attracted significant interest. While financial research mostly focused on variations of Q-learning, in Artificial Intelligence Monte Carlo Tree Search is the recognized stateof-the-art method for various planning problems, such as the games of Hex, Chess, Go,... This article introduces Monte Carlo Tree Search as a method to solve the stochastic optimal control problem underlying the pricing and hedging of financial derivatives. As compared to Q-learning it combines reinforcement learning with tree search techniques. As a consequence Monte Carlo Tree Search has higher sample efficiency, is less prone to over-fitting to specific market models and generally learns stronger policies faster. In our experiments we find that Monte Carlo Tree Search, being the world-champion in games like Chess and Go, is easily capable of directly maximizing the utility of investor's terminal wealth without an intermediate mathematical theory.
Proof Artifact Co-training for Theorem Proving with Language Models
Han, Jesse Michael, Rute, Jason, Wu, Yuhuai, Ayers, Edward W., Polu, Stanislas
Labeled data for imitation learning of theorem proving in large libraries of formalized mathematics is scarce as such libraries require years of concentrated effort by human specialists to be built. This is particularly challenging when applying large Transformer language models to tactic prediction, because the scaling of performance with respect to model size is quickly disrupted in the data-scarce, easily-overfitted regime. We propose PACT ({\bf P}roof {\bf A}rtifact {\bf C}o-{\bf T}raining), a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective. We apply this methodology to Lean, an interactive proof assistant which hosts some of the most sophisticated formalized mathematics to date. We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32\% to 48\%.
Focusing on the Hybrid Quantum Computing -- Tabu Search Algorithm: new results on the Asymmetric Salesman Problem
Osaba, Eneko, Villar-Rodriguez, Esther, Oregi, Izaskun, Moreno-Fernandez-de-Leceta, Aitor
Quantum Computing is an emerging paradigm which is gathering a lot of popularity in the current scientific and technological community. Widely conceived as the next frontier of computation, Quantum Computing is still at the dawn of its development being current solving systems suffering from significant limitations in terms of performance and capabilities. Some interesting approaches have been devised by researchers and practitioners in order to overcome these barriers, being quantum-classical hybrid algorithms one of the most often used solving schemes. The main goal of this paper is to extend the results and findings of the recently proposed hybrid Quantum Computing - Tabu Search Algorithm for partitioning problems. To do that, we focus our research on the adaptation of this method to the Asymmetric Traveling Salesman Problem. In overall, we have employed six well-known instances belonging to TSPLIB to assess the performance of Quantum Computing - Tabu Search Algorithm in comparison to QBSolv -- a state-of-the-art decomposing solver. Furthermore, as an additional contribution, this work also supposes the first solver of the Asymmetric Traveling Salesman Problem using a Quantum Computing based method. Aiming to boost whole community's research in QC, we have released the project's repository as open source code for further application and improvements.
Deep Reinforcement Learning for Combinatorial Optimization: Covering Salesman Problems
Li, Kaiwen, Zhang, Tao, Wang, Rui Wang Yuheng, Han, Yi
This paper introduces a new deep learning approach to approximately solve the Covering Salesman Problem (CSP). In this approach, given the city locations of a CSP as input, a deep neural network model is designed to directly output the solution. It is trained using the deep reinforcement learning without supervision. Specifically, in the model, we apply the Multi-head Attention to capture the structural patterns, and design a dynamic embedding to handle the dynamic patterns of the problem. Once the model is trained, it can generalize to various types of CSP tasks (different sizes and topologies) with no need of re-training. Through controlled experiments, the proposed approach shows desirable time complexity: it runs more than 20 times faster than the traditional heuristic solvers with a tiny gap of optimality. Moreover, it significantly outperforms the current state-of-the-art deep learning approaches for combinatorial optimization in the aspect of both training and inference. In comparison with traditional solvers, this approach is highly desirable for most of the challenging tasks in practice that are usually large-scale and require quick decisions.
Regarding Goal Bounding and Jump Point Search
Hu, Yue | Harabor, Daniel (Monash University) | Qin, Long (National University of Defense Technology, China) | Yin, Quanjun (National University of Defense Technology, China)
Jump Point Search (JPS) is a well known symmetry-breaking algorithm that can substantially improve performance for grid-based optimal pathfinding. When the input grid is static further speedups can be obtained by combining JPS with goal bounding techniques such as Geometric Containers (instantiated as Bounding Boxes) and Compressed Path Databases. Two such methods, JPS+BB and Two-Oracle Path PlannING (Topping), are currently among the fastest known approaches for computing shortest paths on grids. The principal drawback for these algorithms is the overhead costs: each one requires an all-pairs precomputation step, the running time and subsequent storage costs of which can be prohibitive. In this work we consider an alternative approach where we precompute and store goal bounding data only for grid cells which are also jump points. Since the number of jump points is usually much smaller than the total number of grid cells, we can save up to orders of magnitude in preprocessing time and space. Considerable precomputation savings do not necessarily mean performance degradation. For a second contribution we show how canonical orderings, partial expansion strategies and enhanced intermediate pruning can be leveraged to improve online query performance despite a reduction in preprocessed data. The combination of faster preprocessing and stronger online reasoning leads to three new and highly performant algorithms: JPS+BB+ and Two-Oracle Pathfinding Search (TOPS) based on search, and Topping+ based on path extraction. We give a theoretical analysis showing that each method is complete and optimal. We also report convincing gains in a comprehensive empirical evaluation that includes almost all current and cutting-edge algorithms for grid-based pathfinding.
Simple and Near-Optimal MAP Inference for Nonsymmetric DPPs
Anari, Nima, Vuong, Thuy-Duong
Determinantal point processes (DPPs) are widely popular probabilistic models used in machine learning to capture diversity in random subsets of items. While traditional DPPs are defined by a symmetric kernel matrix, recent work has shown a significant increase in the modeling power and applicability of models defined by nonsymmetric kernels, where the model can capture interactions that go beyond diversity. We study the problem of maximum a posteriori (MAP) inference for determinantal point processes defined by a nonsymmetric positive semidefinite matrix (NDPPs), where the goal is to find the maximum $k\times k$ principal minor of the kernel matrix $L$. We obtain the first multiplicative approximation guarantee for this problem using local search, a method that has been previously applied to symmetric DPPs. Our approximation factor of $k^{O(k)}$ is nearly tight, and we show theoretically and experimentally that it compares favorably to the state-of-the-art methods for this problem that are based on greedy maximization. The main new insight enabling our improved approximation factor is that we allow local search to update up to two elements of the solution in each iteration, and we show this is necessary to have any multiplicative approximation guarantee.
Reinforcement Learning for Optimized Beam Training in Multi-Hop Terahertz Communications
Communication at terahertz (THz) frequency bands is a promising solution for achieving extremely high data rates in next-generation wireless networks. While the THz communication is conventionally envisioned for short-range wireless applications due to the high atmospheric absorption at THz frequencies, multi-hop directional transmissions can be enabled to extend the communication range. However, to realize multi-hop THz communications, conventional beam training schemes, such as exhaustive search or hierarchical methods with a fixed number of training levels, can lead to a very large time overhead. To address this challenge, in this paper, a novel hierarchical beam training scheme with dynamic training levels is proposed to optimize the performance of multi-hop THz links. In fact, an optimization problem is formulated to maximize the overall spectral efficiency of the multi-hop THz link by dynamically and jointly selecting the number of beam training levels across all the constituent single-hop links. To solve this problem in presence of unknown channel state information, noise, and path loss, a new reinforcement learning solution based on the multi-armed bandit (MAB) is developed. Simulation results show the fast convergence of the proposed scheme in presence of random channels and noise. The results also show that the proposed scheme can yield up to 75% performance gain, in terms of spectral efficiency, compared to the conventional hierarchical beam training with a fixed number of training levels.