We explore the use of deep learning and deep reinforcement learning for optimization problems in transportation. Many transportation system analysis tasks are formulated as optimization problems, such as optimal control problems in intelligent transportation systems and long-term urban planning. The transportation models used to represent the dynamics of a transportation system often involve large data sets with complex input-output interactions and are difficult to use in the context of optimization. Deep learning metamodels can produce a lower-dimensional representation of those relations and make it possible to implement optimization and reinforcement learning algorithms efficiently. In particular, we develop deep learning models for calibrating transportation simulators and for reinforcement learning to solve the problem of optimally scheduling travelers on the network.
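The metamodel-based calibration idea above can be sketched in a minimal form: sample an expensive simulator offline, fit a small neural network to the samples, then search the cheap network instead of the simulator for the parameter that reproduces an observed measurement. The `simulator` function, its demand-scaling parameter, and all numeric choices below are illustrative assumptions, not taken from the original work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an expensive transportation simulator: it maps a
# demand-scaling parameter theta to an average network travel time.
def simulator(theta):
    return 10.0 + 5.0 * theta**2 + 0.1 * np.sin(8.0 * theta)

# 1. Sample the simulator offline to build training data for the metamodel.
thetas = rng.uniform(0.0, 2.0, size=200)
times = simulator(thetas)

# Normalize inputs and outputs so gradient descent is well behaved.
t_mu, t_sd = times.mean(), times.std()
X = (thetas - 1.0)[:, None]
y = ((times - t_mu) / t_sd)[:, None]

# 2. Fit a small one-hidden-layer network (the metamodel) by full-batch
#    gradient descent on mean squared error.
W1 = rng.normal(0.0, 1.0, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.1, (16, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)              # hidden layer
    err = (h @ W2 + b2) - y               # prediction error
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    gh = (err @ W2.T) * (1.0 - h**2)      # backprop through tanh
    gW1 = X.T @ gh / len(X); gb1 = gh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

def metamodel(theta):
    h = np.tanh(np.array([[theta - 1.0]]) @ W1 + b1)
    return float((h @ W2 + b2)[0, 0] * t_sd + t_mu)

# 3. Calibrate: search the cheap metamodel for the parameter whose predicted
#    output best matches an observed travel time, with no further simulator calls.
observed = simulator(1.3)
grid = np.linspace(0.0, 2.0, 401)
best = min(grid, key=lambda t: abs(metamodel(t) - observed))
```

In a real calibration loop the simulator would be far too slow to query inside the inner search; the metamodel makes that search essentially free after a fixed offline sampling budget.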
Inefficient traffic signal control methods can cause numerous problems, such as traffic congestion and wasted energy. Reinforcement learning (RL) is a trending data-driven approach for adaptive traffic signal control in complex urban traffic networks. Although the development of deep neural networks (DNNs) further enhances its learning capability, several challenges remain in applying deep RL to transportation networks with multiple signalized intersections, including non-stationary environments, the exploration-exploitation dilemma, multi-agent training schemes, and continuous action spaces. To address these issues, this paper proposes a multi-agent deep deterministic policy gradient (MADDPG) method that extends actor-critic policy gradient algorithms. MADDPG follows a centralized-learning, decentralized-execution paradigm in which critics use additional information to streamline the training process, while actors act only on their own local observations. The model is evaluated via simulation on the Simulation of Urban MObility (SUMO) platform. Model comparison results show the efficiency of the proposed algorithm in controlling traffic lights.
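The centralized-learning, decentralized-execution structure can be illustrated with a deliberately tiny sketch: two agents share a joint objective, the gradient used during training depends on both agents' actions (the role of the centralized critic), yet each agent owns and executes only its own policy parameter. Everything here is a simplified assumption: policies are scalars, and the known joint reward stands in for a learned critic Q(a1, a2); it is not the paper's MADDPG implementation.

```python
import numpy as np

# Hypothetical shared objective: two intersections must split a joint
# "green-time budget" TARGET between them.
TARGET = 1.0

def joint_reward(a1, a2):
    # Centralized critic stand-in: depends on BOTH agents' actions.
    return -((a1 + a2 - TARGET) ** 2)

def grad_a(a1, a2):
    # Gradient of the critic with respect to each agent's action.
    g = -2.0 * (a1 + a2 - TARGET)
    return g, g

# Decentralized actors: each agent only owns and updates its own parameter.
a1, a2 = 0.0, 0.0
lr = 0.1
for _ in range(200):
    g1, g2 = grad_a(a1, a2)   # centralized training: gradients use joint info
    a1 += lr * g1             # each actor ascends its own action gradient
    a2 += lr * g2

# At execution time each agent acts independently with its learned parameter,
# yet the joint behavior satisfies the shared objective: a1 + a2 = TARGET.
```

The point of the sketch is the information flow: joint information enters only through the training-time gradient, never through the execution-time policies, which is what makes the learned controllers deployable at intersections that cannot observe each other.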
Electric vehicles have been rapidly increasing in usage, but stations to charge them have not always kept up with demand, so efficient routing of vehicles to stations is critical to operating at maximum efficiency. Deciding which stations to recommend to drivers is a complex problem, with a multitude of possible recommendations, volatile usage patterns, and temporally extended consequences of recommendations. Reinforcement learning offers a powerful paradigm for solving sequential decision-making problems, but traditional methods may struggle with sample efficiency due to the high number of possible actions. By developing a model that allows complex representations of actions, we improve outcomes for users of our system by over 30% compared to existing baselines in a simulation. If implemented widely, these better recommendations could globally save over 4 million person-hours of waiting and driving each year.
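The core of the action-representation idea can be shown with a minimal supervised stand-in: instead of learning one value per station ID, describe each station (each "action") by a feature vector and learn a single model over those features, so the value estimate generalizes across many stations. The feature names, the wait-time model, and the least-squares fit below are illustrative assumptions; the original system's features and learning algorithm are not specified here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each candidate charging station (an "action") is described by features
# rather than a bare ID: [distance_km, queue_length, free_chargers].
n_stations, n_samples = 50, 2000
stations = np.column_stack([
    rng.uniform(0.5, 20.0, n_stations),        # distance_km
    rng.integers(0, 10, n_stations),           # queue_length
    rng.integers(0, 8, n_stations),            # free_chargers
]).astype(float)

# Hypothetical ground truth: total time grows with distance and queue
# length, and shrinks with the number of free chargers.
true_w = np.array([2.0, 6.0, -4.0])
def total_time(x):
    return 15.0 + x @ true_w + rng.normal(0.0, 1.0)

# Learn the value of an action from its features (least-squares stand-in
# for the critic of a full RL recommender).
idx = rng.integers(0, n_stations, n_samples)
X = stations[idx]
y = np.array([total_time(x) for x in X])
Xb = np.column_stack([np.ones(len(X)), X])
w_hat, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def recommend(candidates):
    # Score every candidate station and pick the lowest predicted total time.
    scores = np.column_stack([np.ones(len(candidates)), candidates]) @ w_hat
    return int(np.argmin(scores))

best = recommend(stations)
```

Because the model scores feature vectors, it can evaluate a station it has rarely (or never) recommended before, which is precisely what restores sample efficiency when the action set is large.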
A rail company organizes its fleet to accommodate expected demand, maximizing revenue and coverage so that service is provided to customers as fully as possible. From a practical point of view, companies have to make decisions over two different time horizons: offline and online. Offline decisions deal with routing trains in advance, so that the basic path for each train is decided; under normal conditions, these are the routes that will be followed. Such decisions are made only a few times a year, typically once every three to six months. The planned routes and schedules are usually hand-engineered according to regulations, safety measures, and demand requirements. The planned routes are the ones preferred under normal conditions, but normal conditions rarely hold, since disruptions occur daily in the network. A broken train, a malfunctioning switch, delays in preparing a train, and many other real-life problems may affect the overall network. Sometimes the introduced delay is small and the planned schedule can still be used, but on other occasions online rerouting and rescheduling have to be applied. In the literature, this online decision making is called the Train Dispatching (TD) problem, a real-time variant of the Train Timetabling problem (known to be NP-hard).