Search
Graph-based SLAM-Aware Exploration with Prior Topo-Metric Information
Bai, Ruofei, Guo, Hongliang, Yau, Wei-Yun, Xie, Lihua
Autonomous exploration requires the robot to explore an unknown environment while constructing an accurate map with the SLAM (Simultaneous Localization and Mapping) techniques. Without prior information, the exploratory performance is usually conservative due to the limited planning horizon. This paper exploits a prior topo-metric graph of the environment to benefit both the exploration efficiency and the pose graph accuracy in SLAM. Based on recent advancements in relating pose graph reliability with graph topology, we are able to formulate both objectives into a SLAM-aware path planning problem over the prior graph, which finds a fast exploration path with informative loop closures that globally stabilize the pose graph. Furthermore, we derive theoretical thresholds to speed up the greedy algorithm to the problem, which significantly prune non-optimal loop closures in iterations. The proposed planner is incorporated into a hierarchical exploration framework, with flexible features including path replanning and online prior map update that adds additional information to the prior graph. Extensive experiments indicate that our method has comparable exploration efficiency to others while consistently maintaining higher mapping accuracy in various environments. Our implementations will be open-source on GitHub.
Individually Rational Collaborative Vehicle Routing through Give-And-Take Exchanges
Tang, Paul Mingzheng, Tran, Ba Phong, Lau, Hoong Chuin
In this paper, we are concerned with the automated exchange of orders between logistics companies in a marketplace platform to optimize total revenues. We introduce a novel multi-agent approach to this problem, focusing on the Collaborative Vehicle Routing Problem (CVRP) through the lens of individual rationality. Our proposed algorithm applies the principles of Vehicle Routing Problem (VRP) to pairs of vehicles from different logistics companies, optimizing the overall routes while considering standard VRP constraints plus individual rationality constraints. By facilitating cooperation among competing logistics agents through a Give-and-Take approach, we show that it is possible to reduce travel distance and increase operational efficiency system-wide. More importantly, our approach ensures individual rationality and faster convergence, which are important properties of ensuring the long-term sustainability of the marketplace platform. We demonstrate the efficacy of our approach through extensive experiments using real-world test data from major logistics companies. The results reveal our algorithm's ability to rapidly identify numerous optimal solutions, underscoring its practical applicability and potential to transform the logistics industry.
Efficient and Explainable Graph Neural Architecture Search via Monte-Carlo Tree Search
Graph neural networks (GNNs) are powerful tools for performing data science tasks in various domains. Although we use GNNs in wide application scenarios, it is a laborious task for researchers and practitioners to design/select optimal GNN architectures in diverse graphs. To save human efforts and computational costs, graph neural architecture search (Graph NAS) has been used to search for a sub-optimal GNN architecture that combines existing components. However, there are no existing Graph NAS methods that satisfy explainability, efficiency, and adaptability to various graphs. Therefore, we propose an efficient and explainable Graph NAS method, called ExGNAS, which consists of (i) a simple search space that can adapt to various graphs and (ii) a search algorithm that makes the decision process explainable. The search space includes only fundamental functions that can handle homophilic and heterophilic graphs. The search algorithm efficiently searches for the best GNN architecture via Monte-Carlo tree search without neural models. The combination of our search space and algorithm achieves finding accurate GNN models and the important functions within the search space. We comprehensively evaluate our method compared with twelve hand-crafted GNN architectures and three Graph NAS methods in four graphs. Our experimental results show that ExGNAS increases AUC up to 3.6 and reduces run time up to 78\% compared with the state-of-the-art Graph NAS methods. Furthermore, we show ExGNAS is effective in analyzing the difference between GNN architectures in homophilic and heterophilic graphs.
TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search
Wen, Licheng, Fu, Ze, Cai, Pinlong, Fu, Daocheng, Mao, Song, Shi, Botian
Digital twins for intelligent transportation systems are currently attracting great interests, in which generating realistic, diverse, and human-like traffic flow in simulations is a formidable challenge. Current approaches often hinge on predefined driver models, objective optimization, or reliance on pre-recorded driving datasets, imposing limitations on their scalability, versatility, and adaptability. In this paper, we introduce TrafficMCTS, an innovative framework that harnesses the synergy of groupbased Monte Carlo tree search (MCTS) and Social Value Orientation (SVO) to engender a multifaceted traffic flow replete with varying driving styles and cooperative tendencies. Anchored by a closed-loop architecture, our framework enables vehicles to dynamically adapt to their environment in real time, and ensure feasible collision-free trajectories. Through comprehensive comparisons with state-of-the-art methods, we illuminate the advantages of our approach in terms of computational efficiency, planning success rate, intent completion time, and diversity metrics. Besides, we simulate highway and roundabout scenarios to illustrate the effectiveness of the proposed framework and highlight its ability to induce diverse social behaviors within the traffic flow. Finally, we validate the scalability of TrafficMCTS by showcasing its prowess in simultaneously mass vehicles within a sprawling road network, cultivating a landscape of traffic flow that mirrors the intricacies of human behavior.
RetroBridge: Modeling Retrosynthesis with Markov Bridges
Igashov, Ilia, Schneuing, Arne, Segler, Marwin, Bronstein, Michael, Correia, Bruno
Retrosynthesis planning is a fundamental challenge in chemistry which aims at designing reaction pathways from commercially available starting materials to a target molecule. Each step in multi-step retrosynthesis planning requires accurate prediction of possible precursor molecules given the target molecule and confidence estimates to guide heuristic search algorithms. We model single-step retrosynthesis planning as a distribution learning problem in a discrete state space. First, we introduce the Markov Bridge Model, a generative framework aimed to approximate the dependency between two intractable discrete distributions accessible via a finite sample of coupled data points. Our framework is based on the concept of a Markov bridge, a Markov process pinned at its endpoints. Unlike diffusion-based methods, our Markov Bridge Model does not need a tractable noise distribution as a sampling proxy and directly operates on the input product molecules as samples from the intractable prior distribution. We then address the retrosynthesis planning problem with our novel framework and introduce RetroBridge, a template-free retrosynthesis modeling approach that achieves state-of-the-art results on standard evaluation benchmarks.
Cyclophobic Reinforcement Learning
Wagner, Stefan Sylvius, Arndt, Peter, Robine, Jan, Harmeling, Stefan
In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent's success. However, there are two competing goals: novelty search and systematic exploration. While existing approaches such as curiosity-driven exploration find novelty, they sometimes do not systematically explore the whole state space, akin to depth-first-search vs breadth-first-search. In this paper, we propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty, but punishes redundancy by avoiding cycles. Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent's cropped observations we are able to achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is more sample efficient than other state of the art methods in a variety of tasks.
Maneuver Decision-Making Through Proximal Policy Optimization And Monte Carlo Tree Search
Maneuver decision-making can be regarded as a Markov decision process and can be address by reinforcement learning. However, original reinforcement learning algorithms can hardly solve the maneuvering decision-making problem. One reason is that agents use random actions in the early stages of training, which makes it difficult to get rewards and learn how to make effective decisions. To address this issue, a method based on proximal policy optimization and Monte Carlo tree search is proposed. The method uses proximal policy optimization to train the agent, and regards the results of air combat as targets to train the value network. Then, based on the value network and the visit count of each node, Monte Carlo tree search is used to find the actions with more expected returns than random actions, which can improve the training performance. The ablation studies and simulation experiments indicate that agents trained by the proposed method can make different decisions according to different states, which demonstrates that the method can solve the maneuvering decision problem that the original reinforcement learning algorithm cannot solve.
Fix Fairness, Don't Ruin Accuracy: Performance Aware Fairness Repair using AutoML
Nguyen, Giang, Biswas, Sumon, Rajan, Hridesh
Machine learning (ML) is increasingly being used in critical decision-making software, but incidents have raised questions about the fairness of ML predictions. To address this issue, new tools and methods are needed to mitigate bias in ML-based software. Previous studies have proposed bias mitigation algorithms that only work in specific situations and often result in a loss of accuracy. Our proposed solution is a novel approach that utilizes automated machine learning (AutoML) techniques to mitigate bias. Our approach includes two key innovations: a novel optimization function and a fairness-aware search space. By improving the default optimization function of AutoML and incorporating fairness objectives, we are able to mitigate bias with little to no loss of accuracy. Additionally, we propose a fairness-aware search space pruning method for AutoML to reduce computational cost and repair time. Our approach, built on the state-of-the-art Auto-Sklearn tool, is designed to reduce bias in real-world scenarios. In order to demonstrate the effectiveness of our approach, we evaluated our approach on four fairness problems and 16 different ML models, and our results show a significant improvement over the baseline and existing bias mitigation techniques. Our approach, Fair-AutoML, successfully repaired 60 out of 64 buggy cases, while existing bias mitigation techniques only repaired up to 44 out of 64 cases.
Invariant Lipschitz Bandits: A Side Observation Approach
Tran, Nam Phuong, Tran-Thanh, Long
Symmetry arises in many optimization and decision-making problems, and has attracted considerable attention from the optimization community: By utilizing the existence of such symmetries, the process of searching for optimal solutions can be improved significantly. Despite its success in (offline) optimization, the utilization of symmetries has not been well examined within the online optimization settings, especially in the bandit literature. As such, in this paper we study the invariant Lipschitz bandit setting, a subclass of the Lipschitz bandits where the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named \texttt{UniformMesh-N}, which naturally integrates side observations using group orbits into the \texttt{UniformMesh} algorithm (\cite{Kleinberg2005_UniformMesh}), which uniformly discretizes the set of arms. Using the side-observation approach, we prove an improved regret upper bound, which depends on the cardinality of the group, given that the group is finite. We also prove a matching regret's lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will ignite further investigation of symmetry in bandit theory and sequential decision-making theory in general.
Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities
Song, Yanjie, Wu, Yutong, Guo, Yangyang, Yan, Ran, Suganthan, P. N., Zhang, Yue, Pedrycz, Witold, Chen, Yingwu, Das, Swagatam, Mallipeddi, Rammohan, Ajani, Oladayo Solomon
Evolutionary algorithms (EA), a class of stochastic search methods based on the principles of natural evolution, have received widespread acclaim for their exceptional performance in various real-world optimization problems. While researchers worldwide have proposed a wide variety of EAs, certain limitations remain, such as slow convergence speed and poor generalization capabilities. Consequently, numerous scholars actively explore improvements to algorithmic structures, operators, search patterns, etc., to enhance their optimization performance. Reinforcement learning (RL) integrated as a component in the EA framework has demonstrated superior performance in recent years. This paper presents a comprehensive survey on integrating reinforcement learning into the evolutionary algorithm, referred to as reinforcement learning-assisted evolutionary algorithm (RL-EA). We begin with the conceptual outlines of reinforcement learning and the evolutionary algorithm. We then provide a taxonomy of RL-EA. Subsequently, we discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature. The RL-assisted procedure is divided according to the implemented functions including solution generation, learnable objective function, algorithm/operator/sub-population selection, parameter adaptation, and other strategies. Finally, we analyze potential directions for future research. This survey serves as a rich resource for researchers interested in RL-EA as it overviews the current state-of-the-art and highlights the associated challenges. By leveraging this survey, readers can swiftly gain insights into RL-EA to develop efficient algorithms, thereby fostering further advancements in this emerging field.