Search
From Understanding Genetic Drift to a Smart-Restart Mechanism for Estimation-of-Distribution Algorithms
Zheng, Weijie, Doerr, Benjamin
Estimation-of-distribution algorithms (EDAs) are optimization algorithms that learn a distribution on the search space from which good solutions can be sampled easily. A key parameter of most EDAs is the sample size (population size). If the population size is too small, the update of the probabilistic model builds on few samples, leading to the undesired effect of genetic drift. Too large population sizes avoid genetic drift, but slow down the process. Building on a recent quantitative analysis of how the population size leads to genetic drift, we design a smart-restart mechanism for EDAs. By stopping runs when the risk for genetic drift is high, it automatically runs the EDA in good parameter regimes. Via a mathematical runtime analysis, we prove a general performance guarantee for this smart-restart scheme. This in particular shows that in many situations where the optimal (problem-specific) parameter values are known, the restart scheme automatically finds these, leading to the asymptotically optimal performance. We also conduct an extensive experimental analysis. On four classic benchmark problems, we clearly observe the critical influence of the population size on the performance, and we find that the smart-restart scheme leads to a performance close to the one obtainable with optimal parameter values. Our results also show that previous theory-based suggestions for the optimal population size can be far from the optimal ones, leading to a performance clearly inferior to the one obtained via the smart-restart scheme. We also conduct experiments with PBIL (cross-entropy algorithm) on two combinatorial optimization problems from the literature, the max-cut problem and the bipartition problem. Again, we observe that the smart-restart mechanism finds much better values for the population size than those suggested in the literature, leading to a much better performance.
Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization
Zhou, Runlong, He, Zelin, Tian, Yuandong, Wu, Yi, Du, Simon S.
Over the recent years, reinforcement learning (RL) starts to show promising results in tackling combinatorial optimization (CO) problems, in particular when coupled with curriculum learning to facilitate training. Despite emerging empirical evidence, theoretical study on why RL helps is still at its early stage. This paper presents the first systematic study on policy optimization methods for online CO problems. We show that online CO problems can be naturally formulated as latent Markov Decision Processes (LMDPs), and prove convergence bounds on natural policy gradient (NPG) for solving LMDPs. Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift, a critical quantity that governs the convergence rate in our theorem. For a canonical online CO problem, the Best Choice Problem (BCP), we formally prove that distribution shift is reduced exponentially with curriculum learning even if the curriculum is a randomly generated BCP on a smaller scale. Our theory also shows we can simplify the curriculum learning scheme used in prior work from multi-step to single-step. Lastly, we provide extensive experiments on the Best Choice Problem, Online Knapsack, and AdWords to verify our findings.
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
He, Jiafan, Zhao, Heyang, Zhou, Dongruo, Gu, Quanquan
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the optimal value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
Parting with Misconceptions about Learning-based Vehicle Motion Planning
Dauner, Daniel, Hallgarten, Marcel, Geiger, Andreas, Chitta, Kashyap
The release of nuPlan marks a new era in vehicle motion planning research, offering the first large-scale real-world dataset and evaluation schemes requiring both precise short-term planning and long-horizon ego-forecasting. Existing systems struggle to simultaneously meet both requirements. Indeed, we find that these tasks are fundamentally misaligned and should be addressed independently. We further assess the current state of closed-loop planning in the field, revealing the limitations of learning-based methods in complex real-world scenarios and the value of simple rule-based priors such as centerline selection through lane graph search algorithms. More surprisingly, for the open-loop sub-task, we observe that the best results are achieved when using only this centerline as scene context (i.e., ignoring all information regarding the map and other agents). Combining these insights, we propose an extremely simple and efficient planner which outperforms an extensive set of competitors, winning the nuPlan planning challenge 2023.
An Integrated Framework Integrating Monte Carlo Tree Search and Supervised Learning for Train Timetabling Problem
The single-track railway train timetabling problem (TTP) is an important and complex problem. This article proposes an integrated Monte Carlo Tree Search (MCTS) computing framework that combines heuristic methods, unsupervised learning methods, and supervised learning methods for solving TTP in discrete action spaces. This article first describes the mathematical model and simulation system dynamics of TTP, analyzes the characteristics of the solution from the perspective of MCTS, and proposes some heuristic methods to improve MCTS. This article considers these methods as planners in the proposed framework. Secondly, this article utilizes deep convolutional neural networks to approximate the value of nodes and further applies them to the MCTS search process, referred to as learners. The experiment shows that the proposed heuristic MCTS method is beneficial for solving TTP; The algorithm framework that integrates planners and learners can improve the data efficiency of solving TTP; The proposed method provides a new paradigm for solving TTP.
Combining Language Models For Specialized Domains: A Colorful Approach
Eitan, Daniel, Pirchi, Menachem, Glazer, Neta, Meital, Shai, Ayach, Gil, Krendel, Gidon, Shamsian, Aviv, Navon, Aviv, Hetz, Gil, Keshet, Joseph
General purpose language models (LMs) encounter difficulties when processing domain-specific jargon and terminology, which are frequently utilized in specialized fields such as medicine or industrial settings. Moreover, they often find it challenging to interpret mixed speech that blends general language with specialized jargon. This poses a challenge for automatic speech recognition systems operating within these specific domains. In this work, we introduce a novel approach that integrates domain-specific or secondary LM into general-purpose LM. This strategy involves labeling, or "coloring", each word to indicate its association with either the general or the domain-specific LM. We develop an optimized algorithm that enhances the beam search algorithm to effectively handle inferences involving colored words. Our evaluations indicate that this approach is highly effective in integrating jargon into language tasks. Notably, our method substantially lowers the error rate for domain-specific words without compromising performance in the general domain.
Stochastic Smoothed Gradient Descent Ascent for Federated Minimax Optimization
Shen, Wei, Huang, Minhui, Zhang, Jiawei, Shen, Cong
In recent years, federated minimax optimization has attracted growing interest due to its extensive applications in various machine learning tasks. While Smoothed Alternative Gradient Descent Ascent (Smoothed-AGDA) has proved its success in centralized nonconvex minimax optimization, how and whether smoothing technique could be helpful in federated setting remains unexplored. In this paper, we propose a new algorithm termed Federated Stochastic Smoothed Gradient Descent Ascent (FESS-GDA), which utilizes the smoothing technique for federated minimax optimization. We prove that FESS-GDA can be uniformly used to solve several classes of federated minimax problems and prove new or better analytical convergence results for these settings. We showcase the practical efficiency of FESS-GDA in practical federated learning tasks of training generative adversarial networks (GANs) and fair classification.
A Neurodiversity-Inspired Solver for the Abstraction \& Reasoning Corpus (ARC) Using Visual Imagery and Program Synthesis
Ainooson, James, Sanyal, Deepayan, Michelson, Joel P., Yang, Yuan, Kunda, Maithilee
Core knowledge about physical objects -- e.g., their permanency, spatial transformations, and interactions -- is one of the most fundamental building blocks of biological intelligence across humans and non-human animals. While AI techniques in certain domains (e.g. vision, NLP) have advanced dramatically in recent years, no current AI systems can yet match human abilities in flexibly applying core knowledge to solve novel tasks. We propose a new AI approach to core knowledge that combines 1) visual representations of core knowledge inspired by human mental imagery abilities, especially as observed in studies of neurodivergent individuals; with 2) tree-search-based program synthesis for flexibly combining core knowledge to form new reasoning strategies on the fly. We demonstrate our system's performance on the very difficult Abstraction \& Reasoning Corpus (ARC) challenge, and we share experimental results from publicly available ARC items as well as from our 4th-place finish on the private test set during the 2022 global ARCathon challenge.
Requirement falsification for cyber-physical systems using generative models
Peltomäki, Jarkko, Porres, Ivan
We present the OGAN algorithm for automatic requirement falsification of cyber-physical systems. System inputs and output are represented as piecewise constant signals over time while requirements are expressed in signal temporal logic. OGAN can find inputs that are counterexamples for the safety of a system revealing design, software, or hardware defects before the system is taken into operation. The OGAN algorithm works by training a generative machine learning model to produce such counterexamples. It executes tests atomically and does not require any previous model of the system under test. We evaluate OGAN using the ARCH-COMP benchmark problems, and the experimental results show that generative models are a viable method for requirement falsification. OGAN can be applied to new systems with little effort, has few requirements for the system under test, and exhibits state-of-the-art CPS falsification efficiency and effectiveness.
Learning to Play Chess from Textbooks (LEAP): a Corpus for Evaluating Chess Moves based on Sentiment Analysis
Alrdahi, Haifa, Batista-Navarro, Riza
Learning chess strategies has been investigated widely, with most studies focussing on learning from previous games using search algorithms. Chess textbooks encapsulate grandmaster knowledge, explain playing strategies and require a smaller search space compared to traditional chess agents. This paper examines chess textbooks as a new knowledge source for enabling machines to learn how to play chess -- a resource that has not been explored previously. We developed the LEAP corpus, a first and new heterogeneous dataset with structured (chess move notations and board states) and unstructured data (textual descriptions) collected from a chess textbook containing 1164 sentences discussing strategic moves from 91 games. We firstly labelled the sentences based on their relevance, i.e., whether they are discussing a move. Each relevant sentence was then labelled according to its sentiment towards the described move. We performed empirical experiments that assess the performance of various transformer-based baseline models for sentiment analysis. Our results demonstrate the feasibility of employing transformer-based sentiment analysis models for evaluating chess moves, with the best performing model obtaining a weighted micro F_1 score of 68%. Finally, we synthesised the LEAP corpus to create a larger dataset, which can be used as a solution to the limited textual resource in the chess domain.