
Purpose in the Machine: Do Traffic Simulators Produce Distributionally Equivalent Outcomes for Reinforcement Learning Applications?

arXiv.org Artificial Intelligence

ABSTRACT

Traffic simulators are used to generate data for learning in intelligent transportation systems (ITSs). A key question is to what extent their modelling assumptions affect the capabilities of ITSs to adapt to various scenarios when deployed in the real world. This work focuses on two simulators commonly used to train reinforcement learning (RL) agents for traffic applications, CityFlow and SUMO. A controlled virtual experiment varying driver behavior and simulation scale finds evidence against distributional equivalence in RL-relevant measures from these simulators, with the root mean squared error and KL divergence being significantly greater than 0 for all assessed measures. While granular real-world validation generally remains infeasible, these findings suggest that traffic simulators are not a deus ex machina for RL training: understanding the impacts of inter-simulator differences is necessary to train and deploy RL-based ITSs.

1 INTRODUCTION

Transportation efficiency is becoming an increasingly critical challenge due to continual growth in the volume of people and objects that need to be transported. The 2021 Urban Mobility Report (Schrank et al. 2021) projected that, while the COVID-19 pandemic alleviated congestion, traffic levels in the US will quickly rebound in areas with expanding populations and job markets to produce the most rapid congestion growth since 1982. The increased traffic will stress existing infrastructure and result in social, economic, and environmental costs (Schrank et al. 2021), thus making the development and deployment of intelligent transportation systems (ITSs) a critical priority. At the same time, advances in computational algorithms and roadway infrastructure made in response to these challenges provide opportunities to enhance ITS learning. For example, novel traffic signal control technologies based on reinforcement learning (RL), which learn adaptive signaling policies from simulations generated using real-world traffic data, have already achieved performance on par with and even exceeding traditional control methods (Chen et al. 2020). However, collecting data for ITS learning remains a nontrivial task.
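As a rough illustration of the kind of inter-simulator comparison the abstract describes, the Python sketch below bins hypothetical per-episode measurements from two simulators and computes the KL divergence and RMSE between the resulting empirical distributions. The sample values, the measure, the bin count, and the choice to compare binned distributions are assumptions made for illustration; the paper's actual measures and experimental protocol may differ.

import numpy as np
from scipy.stats import entropy

# Hypothetical per-episode measurements (e.g., mean queue length) from matched
# scenarios in the two simulators; the values here are synthetic placeholders.
rng = np.random.default_rng(0)
cityflow_samples = rng.normal(loc=12.0, scale=3.0, size=5000)
sumo_samples = rng.normal(loc=13.5, scale=3.5, size=5000)

# Bin both sample sets on a shared support to form empirical distributions.
edges = np.histogram_bin_edges(np.concatenate([cityflow_samples, sumo_samples]), bins=50)
p, _ = np.histogram(cityflow_samples, bins=edges)
q, _ = np.histogram(sumo_samples, bins=edges)
p = (p + 1e-12) / (p + 1e-12).sum()    # smooth empty bins, normalise to probabilities
q = (q + 1e-12) / (q + 1e-12).sum()

kl = entropy(p, q)                     # D_KL(P || Q) over the binned measure
rmse = np.sqrt(np.mean((p - q) ** 2))  # RMSE between the two binned distributions
print(f"KL divergence: {kl:.4f}, RMSE: {rmse:.4f}")

Under distributional equivalence both quantities would be close to 0; the paper reports values significantly greater than 0 for all assessed measures.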


UNSAT Solver Synthesis via Monte Carlo Forest Search

arXiv.org Artificial Intelligence

We introduce Monte Carlo Forest Search (MCFS), a class of reinforcement learning (RL) algorithms for learning policies in tree MDPs, for which policy execution involves traversing an exponentially-sized tree. Examples of such problems include proving unsatisfiability of a SAT formula; counting the number of solutions of a satisfiable SAT formula; and finding the optimal solution to a mixed-integer program. MCFS algorithms can be seen as extensions of Monte Carlo Tree Search (MCTS) to cases where, rather than finding a good path (solution) within a tree, the problem is to find a small tree within a forest of candidate trees. We instantiate and evaluate our ideas in an algorithm that we dub Knuth Synthesis, an MCFS algorithm that learns DPLL branching policies for solving the Boolean satisfiability (SAT) problem, with the objective of achieving good average-case performance on a given distribution of unsatisfiable problem instances. Knuth Synthesis leverages two key ideas to avoid the prohibitive costs of policy evaluations in an exponentially-sized tree. First, we estimate tree size by randomly sampling paths and measuring their lengths, drawing on an unbiased approximation due to Knuth (1975). Second, we query a strong solver at a user-defined depth rather than learning a policy across the whole tree, to focus our policy search on early decisions that offer the greatest potential for reducing tree size. We matched or improved performance over a strong baseline on three well-known SAT distributions (R3SAT, sgen, satfc).
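The tree-size estimate referenced above can be sketched in a few lines: a single random root-to-leaf walk weights each level by the running product of branching factors, and averaging many such walks gives an unbiased estimate of the number of nodes (Knuth 1975). The Python sketch below is illustrative only; the children callback and the toy binary-tree example are hypothetical stand-ins for a DPLL search tree, not the paper's implementation.

import random

def knuth_estimate(root, children, rng=random):
    # One sample of Knuth's (1975) tree-size estimator: walk a random
    # root-to-leaf path, adding the running product of branching factors
    # at each level; the expectation equals the total node count.
    estimate, weight, node = 1.0, 1.0, root
    while True:
        kids = children(node)
        if not kids:                 # reached a leaf
            return estimate
        weight *= len(kids)          # expected number of nodes at this depth
        estimate += weight
        node = rng.choice(kids)      # descend along a uniformly random child

def estimate_tree_size(root, children, num_samples=1000):
    # Average independent path samples to reduce variance.
    return sum(knuth_estimate(root, children) for _ in range(num_samples)) / num_samples

# Toy check on a complete binary tree of depth 3 (15 nodes in total).
def toy_children(node):
    depth, idx = node
    return [] if depth >= 3 else [(depth + 1, 2 * idx), (depth + 1, 2 * idx + 1)]

print(estimate_tree_size((0, 0), toy_children))  # prints 15.0

On a balanced tree every walk returns the exact size; on the highly irregular trees produced by DPLL branching, many samples are averaged to control variance.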