Search
Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy
Wang, Chen, Ji, Kaiyi, Geng, Junyi, Ren, Zhongqiang, Fu, Taimeng, Yang, Fan, Guo, Yifan, He, Haonan, Chen, Xiangyu, Zhan, Zitong, Du, Qiwei, Su, Shaoshu, Li, Bowen, Qiu, Yuheng, Du, Yi, Li, Qihang, Yang, Yifan, Lin, Xiao, Zhao, Zhipeng
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
Combining Neuroevolution with the Search for Novelty to Improve the Generation of Test Inputs for Games
Feldmeier, Patric, Fraser, Gordon
As games challenge traditional automated white-box test generators, the Neatest approach generates test suites consisting of neural networks that exercise the source code by playing the games. Neatest generates these neural networks using an evolutionary algorithm that is guided by an objective function targeting individual source code statements. This approach works well if the objective function provides sufficient guidance, but deceiving or complex fitness landscapes may inhibit the search. In this paper, we investigate whether the issue of challenging fitness landscapes can be addressed by promoting novel behaviours during the search. Our case study on two Scratch games demonstrates that rewarding novel behaviours is a promising approach for overcoming challenging fitness landscapes, thus enabling future research on how to adapt the search algorithms to best use this information.
Discovering symbolic expressions with parallelized tree search
Ruan, Kai, Gao, Ze-Feng, Guo, Yike, Sun, Hao, Wen, Ji-Rong, Liu, Yang
Symbolic regression plays a crucial role in modern scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. A grand challenge lies in the arduous search for parsimonious and generalizable mathematical formulas, in an infinite search space, while intending to fit the training data. Existing algorithms have faced a critical bottleneck of accuracy and efficiency over a decade when handling problems of complexity, which essentially hinders the pace of applying symbolic regression for scientific exploration across interdisciplinary domains. To this end, we introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data. Through a series of extensive experiments, we demonstrate the superior accuracy and efficiency of PTS for equation discovery, which greatly outperforms the state-of-the-art baseline models on over 80 synthetic and experimental datasets (e.g., lifting its performance by up to 99% accuracy improvement and one-order of magnitude speed up). PTS represents a key advance in accurate and efficient data-driven discovery of symbolic, interpretable models (e.g., underlying physical laws) and marks a pivotal transition towards scalable symbolic learning.
Efficient Materials Informatics between Rockets and Electrons
The true power of computational research typically can lay in either what it accomplishes or what it enables others to accomplish. In this work, both avenues are simultaneously embraced across several distinct efforts existing at three general scales of abstractions of what a material is - atomistic, physical, and design. At each, an efficient materials informatics infrastructure is being built from the ground up based on (1) the fundamental understanding of the underlying prior knowledge, including the data, (2) deployment routes that take advantage of it, and (3) pathways to extend it in an autonomous or semi-autonomous fashion, while heavily relying on artificial intelligence (AI) to guide well-established DFT-based ab initio and CALPHAD-based thermodynamic methods. The resulting multi-level discovery infrastructure is highly generalizable as it focuses on encoding problems to solve them easily rather than looking for an existing solution. To showcase it, this dissertation discusses the design of multi-alloy functionally graded materials (FGMs) incorporating ultra-high temperature refractory high entropy alloys (RHEAs) towards gas turbine and jet engine efficiency increase reducing CO2 emissions, as well as hypersonic vehicles. It leverages a new graph representation of underlying mathematical space using a newly developed algorithm based on combinatorics, not subject to many problems troubling the community. Underneath, property models and phase relations are learned from optimized samplings of the largest and highest quality dataset of HEA in the world, called ULTERA. At the atomistic level, a data ecosystem optimized for machine learning (ML) from over 4.5 million relaxed structures, called MPDD, is used to inform experimental observations and improve thermodynamic models by providing stability data enabled by a new efficient featurization framework.
Augmented Bayesian Policy Search
Kallel, Mahdi, Basu, Debabrota, Akrour, Riad, D'Eramo, Carlo
Deterministic policies are often preferred over stochastic ones when implemented on physical systems. They can prevent erratic and harmful behaviors while being easier to implement and interpret. However, in practice, exploration is largely performed by stochastic policies. First-order Bayesian Optimization (BO) methods offer a principled way of performing exploration using deterministic policies. This is done through a learned probabilistic model of the objective function and its gradient. Nonetheless, such approaches treat policy search as a black-box problem, and thus, neglect the reinforcement learning nature of the problem. In this work, we leverage the performance difference lemma to introduce a novel mean function for the probabilistic model. This results in augmenting BO methods with the action-value function. Hence, we call our method Augmented Bayesian Search~(ABS). Interestingly, this new mean function enhances the posterior gradient with the deterministic policy gradient, effectively bridging the gap between BO and policy gradient methods. The resulting algorithm combines the convenience of the direct policy search with the scalability of reinforcement learning. We validate ABS on high-dimensional locomotion problems and demonstrate competitive performance compared to existing direct policy search schemes.
DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems
Yu, Kexiong, Zhao, Hang, Huang, Yuhang, Yi, Renjiao, Xu, Kai, Zhu, Chenyang
Combinatorial Optimization (CO) problems are fundamentally crucial in numerous practical applications across diverse industries, characterized by entailing enormous solution space and demanding time-sensitive response. Despite significant advancements made by recent neural solvers, their limited expressiveness does not conform well to the multi-modal nature of CO landscapes. While some research has pivoted towards diffusion models, they require simulating a Markov chain with many steps to produce a sample, which is time-consuming and does not meet the efficiency requirement of real applications, especially at scale. We propose DISCO, an efficient DIffusion Solver for Combinatorial Optimization problems that excels in both solution quality and inference speed. DISCO's efficacy is two-pronged: Firstly, it achieves rapid denoising of solutions through an analytically solvable form, allowing for direct sampling from the solution space with very few reverse-time steps, thereby drastically reducing inference time. Secondly, DISCO enhances solution quality by restricting the sampling space to a more constrained, meaningful domain guided by solution residues, while still preserving the inherent multi-modality of the output probabilistic distributions. DISCO achieves state-of-the-art results on very large Traveling Salesman Problems with 10000 nodes and challenging Maximal Independent Set benchmarks, with its per-instance denoising time up to 44.8 times faster. Through further combining a divide-and-conquer strategy, DISCO can be generalized to solve arbitrary-scale problem instances off the shelf, even outperforming models trained specifically on corresponding scales.
Ariadne and Theseus: Exploration and Rendezvous with Two Mobile Agents in an Unknown Graph
We investigate two fundamental problems in mobile computing: exploration and rendezvous, with two distinct mobile agents in an unknown graph. The agents may communicate by reading and writing information on whiteboards that are located at all nodes. They both move along one adjacent edge at every time-step. In the exploration problem, the agents start from the same arbitrary node and must traverse all the edges. We present an algorithm achieving collective exploration in $m$ time-steps, where $m$ is the number of edges of the graph. This improves over the guarantee of depth-first search, which requires $2m$ time-steps. In the rendezvous problem, the agents start from different nodes of the graph and must meet as fast as possible. We present an algorithm guaranteeing rendezvous in at most $\frac{3}{2}m$ time-steps. This improves over the so-called `wait for Mommy' algorithm which is based on depth-first search and which also requires $2m$ time-steps. Importantly, all our guarantees are derived from a more general asynchronous setting in which the speeds of the agents are controlled by an adversary at all times. Our guarantees generalize to weighted graphs, when replacing the number of edges $m$ with the sum of all edge lengths. We show that our guarantees are met with matching lower-bounds in the asynchronous setting.
Dancing to the State of the Art? How Candidate Lists Influence LKH for Solving the Traveling Salesperson Problem
Heins, Jonathan, Schäpermeier, Lennart, Kerschke, Pascal, Whitley, Darrell
Solving the Traveling Salesperson Problem (TSP) remains a persistent challenge, despite its fundamental role in numerous generalized applications in modern contexts. Heuristic solvers address the demand for finding high-quality solutions efficiently. Among these solvers, the Lin-Kernighan-Helsgaun (LKH) heuristic stands out, as it complements the performance of genetic algorithms across a diverse range of problem instances. However, frequent timeouts on challenging instances hinder the practical applicability of the solver. Within this work, we investigate a previously overlooked factor contributing to many timeouts: The use of a fixed candidate set based on a tree structure. Our investigations reveal that candidate sets based on Hamiltonian circuits contain more optimal edges. We thus propose to integrate this promising initialization strategy, in the form of POPMUSIC, within an efficient restart version of LKH. As confirmed by our experimental studies, this refined TSP heuristic is much more efficient - causing fewer timeouts and improving the performance (in terms of penalized average runtime) by an order of magnitude - and thereby challenges the state of the art in TSP solving.
Autoverse: An Evolvable Game Langugage for Learning Robust Embodied Agents
We introduce Autoverse, an evolvable, domain-specific language for single-player 2D grid-based games, and demonstrate its use as a scalable training ground for Open-Ended Learning (OEL) algorithms. Autoverse uses cellular-automaton-like rewrite rules to describe game mechanics, allowing it to express various game environments (e.g. mazes, dungeons, sokoban puzzles) that are popular testbeds for Reinforcement Learning (RL) agents. Each rewrite rule can be expressed as a series of simple convolutions, allowing for environments to be parallelized on the GPU, thereby drastically accelerating RL training. Using Autoverse, we propose jump-starting open-ended learning by imitation learning from search. In such an approach, we first evolve Autoverse environments (their rules and initial map topology) to maximize the number of iterations required by greedy tree search to discover a new best solution, producing a curriculum of increasingly complex environments and playtraces. We then distill these expert playtraces into a neural-network-based policy using imitation learning. Finally, we use the learned policy as a starting point for open-ended RL, where new training environments are continually evolved to maximize the RL player agent's value function error (a proxy for its regret, or the learnability of generated environments), finding that this approach improves the performance and generality of resultant player agents.
Uncertainty-Guided Optimization on Large Language Model Search Trees
Grosse, Julia, Wu, Ruotian, Rashid, Ahmad, Hennig, Philipp, Poupart, Pascal, Kristiadi, Agustinus
Beam search is a standard tree search algorithm when it comes to finding sequences of maximum likelihood, for example, in the decoding processes of large language models. However, it is myopic since it does not take the whole path from the root to a leaf into account. Moreover, it is agnostic to prior knowledge available about the process: For example, it does not consider that the objective being maximized is a likelihood and thereby has specific properties, like being bound in the unit interval. Taking a probabilistic approach, we define a prior belief over the LLMs' transition probabilities and obtain a posterior belief over the most promising paths in each iteration. These beliefs are helpful to define a non-myopic Bayesian-optimization-like acquisition function that allows for a more data-efficient exploration scheme than standard beam search. We discuss how to select the prior and demonstrate in on- and off-model experiments with recent large language models, including Llama-2-7b, that our method achieves higher efficiency than beam search: Our method achieves the same or a higher likelihood while expanding fewer nodes than beam search.