Model-Based Residual Policy Learning with Applications to Antenna Control
Möllerstedt, Viktor Eriksson, Russo, Alessio, Bouton, Maxime
Non-differentiable controllers and rule-based policies are widely used for controlling real systems such as telecommunication networks and robots. Specifically, parameters of mobile network base station antennas can be dynamically configured by these policies to improve user coverage and quality of service. Motivated by the antenna tilt control problem, we introduce Model-Based Residual Policy Learning (MBRPL), a practical reinforcement learning (RL) method. MBRPL enhances existing policies through a model-based approach, leading to improved sample efficiency and a decreased number of interactions with the actual environment when compared to off-the-shelf RL methods. To the best of our knowledge, this is the first paper that examines a model-based approach for antenna control. Experimental results reveal that our method delivers strong initial performance while improving sample efficiency over previous RL methods, which is one step towards deploying these algorithms in real networks.
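The residual idea can be illustrated compactly. The sketch below is illustrative only: `baseline_policy`, `learned_model`, and the random-search loop are stand-ins, not the paper's architecture. It shows a residual being tuned on top of a fixed baseline entirely inside a learned dynamics model, which is where the reduction in real-environment interactions comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

def baseline_policy(state):
    # Hypothetical rule-based controller standing in for an antenna heuristic.
    return float(np.clip(-0.5 * state.mean(), -1.0, 1.0))

def learned_model(state, action):
    # Stand-in for a dynamics model fit from logged interactions:
    # returns (next_state, reward). Here: a toy linear system.
    next_state = 0.9 * state + action
    return next_state, -float(np.abs(next_state).sum())

def rollout_return(weights, state, horizon=20):
    # Evaluate baseline + linear residual entirely inside the learned
    # model, so no extra interactions with the real system are needed.
    total = 0.0
    for _ in range(horizon):
        action = baseline_policy(state) + float(weights @ state)  # residual
        state, reward = learned_model(state, action)
        total += reward
    return total

# Start from the baseline (zero residual) and keep the best
# perturbation found in simulated rollouts.
state0 = rng.normal(size=4)
best_w = np.zeros(4)
best_ret = rollout_return(best_w, state0)
for _ in range(200):
    cand = best_w + 0.05 * rng.normal(size=4)
    ret = rollout_return(cand, state0)
    if ret > best_ret:
        best_w, best_ret = cand, ret
print("baseline:", rollout_return(np.zeros(4), state0), "-> residual:", best_ret)
```

Starting from a zero residual also gives the strong initial performance the abstract mentions: the agent never does worse than the existing controller at the start of training.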
Multi-agent Reinforcement Learning with Graph Q-Networks for Antenna Tuning
Bouton, Maxime, Jeong, Jaeseong, Outes, Jose, Mendo, Adriano, Nikou, Alexandros
Future generations of mobile networks are expected to contain more and more antennas, with growing complexity and more parameters. Optimizing these parameters is necessary for ensuring the good performance of the network. The scale of mobile networks makes it challenging to optimize antenna parameters using manual intervention or hand-engineered strategies. Reinforcement learning is a promising technique to address this challenge, but existing methods often use local optimizations to scale to large network deployments. We propose a new multi-agent reinforcement learning algorithm to optimize mobile network configurations globally. By using a value decomposition approach, our algorithm can be trained from a global reward function instead of relying on an ad-hoc decomposition of the network performance across the different cells. The algorithm uses a graph neural network architecture which generalizes to different network topologies and learns coordination behaviors. We empirically demonstrate the performance of the algorithm on an antenna tilt tuning problem and a joint tilt and power control problem in a simulated environment.
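A minimal sketch of the value-decomposition idea follows. The mean-pooled `agent_q` is a crude stand-in for the paper's graph neural network, and all names and numbers are hypothetical.

```python
import numpy as np

def agent_q(obs, neigh_obs, w):
    # Per-agent Q-values from local features plus mean-pooled neighbor
    # features -- a crude stand-in for a graph neural network.
    feat = np.concatenate([obs, neigh_obs.mean(axis=0)])
    return w @ feat                        # one value per action

def total_q(all_obs, adjacency, w, actions):
    # Value decomposition: the joint Q-value is the sum of per-agent
    # Q-values, so a single global reward can train every agent.
    q = 0.0
    for i, obs in enumerate(all_obs):
        neigh = np.stack([all_obs[j] for j in adjacency[i]])
        q += agent_q(obs, neigh, w)[actions[i]]
    return q

all_obs = [np.array([0.1, 0.2]), np.array([0.3, 0.1]), np.array([0.2, 0.4])]
adjacency = {0: [1], 1: [0, 2], 2: [1]}   # cell neighborhood graph
w = 0.1 * np.ones((3, 4))                 # 3 actions, 2 local + 2 pooled dims
print(total_q(all_obs, adjacency, w, actions=[0, 1, 2]))
```

Because the global value is a sum, each agent still gets a useful learning signal from the network-wide reward without anyone hand-crafting per-cell rewards.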
A Graph Attention Learning Approach to Antenna Tilt Optimization
Jin, Yifei, Vannella, Filippo, Bouton, Maxime, Jeong, Jaeseong, Al Hakim, Ezeddin
6G will move mobile networks towards increasing levels of complexity. To deal with this complexity, optimization of network parameters is key to ensure high performance and timely adaptivity to dynamic network environments. The optimization of the antenna tilt provides a practical and cost-efficient method to improve coverage and capacity in the network. Previous methods based on Reinforcement Learning (RL) have shown great promise for tilt optimization by learning adaptive policies outperforming traditional tilt optimization methods. However, most existing RL methods are based on single-cell feature representations, which fail to fully characterize the agent state, resulting in suboptimal performance. Most of these methods also lack scalability, due to state-action space explosion, and generalization ability. In this paper, we propose a Graph Attention Q-learning (GAQ) algorithm for tilt optimization. GAQ relies on a graph attention mechanism to select relevant neighbor information, improve the agent state representation, and update the tilt control policy based on a history of observations using a Deep Q-Network (DQN). We show that GAQ efficiently captures important network information and outperforms standard DQN with local information by a large margin. In addition, we demonstrate its ability to generalize to network deployments of different sizes and densities.
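The attention step can be sketched in a few lines (single-head, numpy-only; `Wq`, `Wk`, and `Wv` are hypothetical learned matrices, and in the real method the result feeds a DQN rather than being used directly):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_attention(h_self, h_neigh, Wq, Wk, Wv):
    # Score each neighboring cell against the agent's own embedding,
    # then aggregate neighbor values weighted by relevance.
    q = Wq @ h_self                             # query from the agent's cell
    scores = (h_neigh @ Wk.T) @ q / np.sqrt(len(q))
    alpha = softmax(scores)                     # attention over neighbors
    return alpha @ (h_neigh @ Wv.T)             # aggregated neighbor context

rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
context = graph_attention(rng.normal(size=d), rng.normal(size=(5, d)), Wq, Wk, Wv)
# `context` is concatenated with the local cell state before the Q-value head.
```

Because the weights `alpha` are computed per neighbor, the same parameters apply to any number of neighbors, which is what gives the generalization across deployment sizes and densities.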
Reinforcement Learning with Iterative Reasoning for Merging in Dense Traffic
Bouton, Maxime, Nakhaei, Alireza, Isele, David, Fujimura, Kikuo, Kochenderfer, Mykel J.
In recent years, major progress has been made to deploy autonomous vehicles and improve safety. However, certain common driving situations like merging in dense traffic are still challenging for autonomous vehicles. Situations like the one illustrated in Figure 1 often involve negotiating with human drivers. To avoid the computational requirements of online methods, we can use reinforcement learning (RL) instead. In RL, the agent interacts with a simulation environment many times prior to execution, and at each simulation episode it improves its strategy. The resulting policy can then be deployed online and is often inexpensive to evaluate. RL provides a flexible framework to automatically find good …
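The interaction loop described above, in its simplest tabular form (a toy stand-in, not the paper's merging agent; the environment here is a random placeholder):

```python
import numpy as np

# Tabular Q-learning: interact with a simulator for many episodes
# before deployment, improving the strategy after every step.
n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def env_step(s, a):
    # Toy simulator stand-in (not a traffic model): random next state,
    # reward for choosing action 0 in even-numbered states.
    return int(rng.integers(n_states)), float(a == 0 and s % 2 == 0)

for episode in range(500):
    s = int(rng.integers(n_states))
    for _ in range(20):
        a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[s].argmax())
        s2, r = env_step(s, a)
        Q[s, a] += 0.1 * (r + 0.95 * Q[s2].max() - Q[s, a])  # TD update
        s = s2

policy = Q.argmax(axis=1)  # cheap to evaluate once deployed
```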
Cooperation-Aware Reinforcement Learning for Merging in Dense Traffic
Bouton, Maxime, Nakhaei, Alireza, Fujimura, Kikuo, Kochenderfer, Mykel J.
Decision making in dense traffic can be challenging for autonomous vehicles. An autonomous system only relying on predefined road priorities and considering other drivers as moving objects will cause the vehicle to freeze and fail the maneuver. Human drivers leverage the cooperation of other drivers to avoid such deadlock situations and convince others to change their behavior. Decision making algorithms must reason about the interaction with other drivers and anticipate a broad range of driver behaviors. In this work, we present a reinforcement learning approach to learn how to interact with drivers with different cooperation levels. We enhanced the performance of traditional reinforcement learning algorithms by maintaining a belief over the level of cooperation of other drivers. We show that our agent successfully learns how to navigate a dense merging scenario with fewer deadlocks than with online planning methods.
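A sketch of the belief-maintenance idea appears below. The discrete cooperation levels and the Gaussian observation model are illustrative assumptions, not the paper's driver model.

```python
import numpy as np

COOP_LEVELS = np.array([0.0, 0.5, 1.0])   # hypothetical discretization

def likelihood(observed_gap_change, level):
    # Illustrative observation model: cooperative drivers tend to open
    # the gap, modeled here as a Gaussian centered on their level.
    return np.exp(-0.5 * ((observed_gap_change - level) / 0.3) ** 2)

def update_belief(belief, observed_gap_change):
    # Bayes rule over the discrete cooperation levels.
    posterior = belief * np.array(
        [likelihood(observed_gap_change, l) for l in COOP_LEVELS])
    return posterior / posterior.sum()

belief = np.ones(3) / 3          # uniform prior over cooperation levels
belief = update_belief(belief, observed_gap_change=0.8)
print(belief)                    # mass shifts toward the cooperative hypothesis
# The belief vector is appended to the RL agent's state before acting.
```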
Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments
Bouton, Maxime, Nakhaei, Alireza, Fujimura, Kikuo, Kochenderfer, Mykel J.
Navigating urban environments represents a complex task for automated vehicles. They must reach their goal safely and efficiently while considering a multitude of traffic participants. We propose a modular decision making algorithm to autonomously navigate intersections, addressing challenges of existing rule-based and reinforcement learning (RL) approaches. We first present a safe RL algorithm relying on a model-checker to ensure safety guarantees. To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning-based approach. Finally, we use a scene decomposition approach to scale our algorithm to environments with multiple traffic participants. We empirically demonstrate that our algorithm outperforms rule-based methods and reinforcement learning techniques on a complex intersection scenario.
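The safety-masking and scene-decomposition steps can be sketched together as follows (the `p_safe` probabilities stand in for the model checker's output, and all numbers are illustrative):

```python
import numpy as np

def safe_actions(p_safe, threshold=0.99):
    # Model-checker stand-in: keep only actions whose probability of
    # satisfying the safety property exceeds the threshold.
    return np.flatnonzero(p_safe >= threshold)

def act(q_per_entity, p_safe):
    # Scene decomposition: combine single-participant utilities with a
    # worst-case (min) arbitration, then pick the best safe action.
    q = q_per_entity.min(axis=0)              # (n_actions,)
    allowed = safe_actions(p_safe)
    return int(allowed[np.argmax(q[allowed])])

q_per_entity = np.array([[1.0, 0.2, 0.5],     # utilities vs. participant 1
                         [0.3, 0.9, 0.4]])    # utilities vs. participant 2
p_safe = np.array([0.999, 0.95, 0.995])       # from the model checker
print(act(q_per_entity, p_safe))              # best utility among safe actions
```

The min arbitration is one common choice for pairwise decompositions; it favors the action that is acceptable against the most constraining participant.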
Pedestrian Collision Avoidance System for Scenarios with Occlusions
Schratter, Markus, Bouton, Maxime, Kochenderfer, Mykel J., Watzenig, Daniel
Safe autonomous driving in urban areas requires robust algorithms to avoid collisions with other traffic participants under limited perception ability. Currently deployed approaches relying on Autonomous Emergency Braking (AEB) systems are often overly conservative. In this work, we formulate the problem as a partially observable Markov decision process (POMDP) to derive a policy robust to uncertainty in the pedestrian location. We investigate how to integrate such a policy with an AEB system that operates only when a collision is unavoidable. In addition, we propose a rigorous evaluation methodology on a set of well-defined scenarios. We show that combining the two approaches provides a robust autonomous braking system that reduces unnecessary braking caused by using the AEB system on its own.
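The arbitration between the two systems reduces to a small amount of logic. The sketch below assumes the AEB module exposes a collision-unavoidable flag, as described above; names are hypothetical.

```python
def braking_command(pomdp_action, aeb_action, collision_unavoidable):
    # The POMDP policy handles uncertainty about the pedestrian's
    # location; the AEB overrides only when a collision is otherwise
    # unavoidable, which avoids overly conservative braking.
    if collision_unavoidable:
        return aeb_action      # last-resort full brake
    return pomdp_action        # belief-aware, smoother deceleration
```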
Decomposition Methods with Deep Corrections for Reinforcement Learning
Bouton, Maxime, Julian, Kyle, Nakhaei, Alireza, Fujimura, Kikuo, Kochenderfer, Mykel J.
Decomposition methods have been proposed to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks considering each individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem where multiple boats must coordinate to maximize their catch over time as well as on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods can scale to multiple boats or pedestrians by reusing strategies computed for a single entity. We verify empirically that the proposed correction method significantly improves the decomposition method and outperforms a policy trained on the full scale problem without utility decomposition.
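A sketch of utility decomposition with a learned correction (the sum fusion and the correction vector are illustrative; the paper represents the correction with a neural network trained on the full problem):

```python
import numpy as np

def decomposed_q(local_qs, correction):
    # local_qs: (n_entities, n_actions) utilities from single-entity
    # solutions. The arbitrator fuses them (sum here; min is another
    # common choice), then a learned correction term compensates for
    # what the independence assumption misses.
    return local_qs.sum(axis=0) + correction

local_qs = np.array([[0.4, 0.1],     # e.g., one row per pedestrian
                     [0.2, 0.5]])
correction = np.array([-0.1, 0.3])   # output of the correction network
action = int(np.argmax(decomposed_q(local_qs, correction)))
```

Learning only the correction is a much easier problem than learning the full joint utility, since the fused local utilities already provide a good starting approximation.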
Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving
Bouton, Maxime, Karlsson, Jesper, Nakhaei, Alireza, Fujimura, Kikuo, Kochenderfer, Mykel J., Tumova, Jana
Designing reliable decision strategies for autonomous urban driving is challenging. Reinforcement learning (RL) has been used to automatically derive suitable behavior in uncertain environments, but it does not provide any guarantee on the performance of the resulting policy. We propose a generic approach to enforce probabilistic guarantees on an RL agent. An exploration strategy is derived prior to training that constrains the agent to choose among actions that satisfy a desired probabilistic specification expressed with linear temporal logic (LTL). Reducing the search space to policies satisfying the LTL formula helps training and simplifies reward design. This paper outlines a case study of an intersection scenario involving multiple traffic participants. The resulting policy outperforms a rule-based heuristic approach in terms of efficiency while exhibiting strong guarantees on safety.
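The constrained exploration strategy can be sketched as a masked epsilon-greedy rule. The satisfaction probabilities `p_sat` below stand in for values computed from the LTL specification; all numbers and names are illustrative.

```python
import numpy as np

def constrained_epsilon_greedy(q_values, p_satisfy, threshold, eps, rng):
    # Exploration restricted to actions that meet the probabilistic
    # specification: even random exploration stays within the safe set,
    # which simplifies reward design and speeds up training.
    safe = np.flatnonzero(p_satisfy >= threshold)
    if rng.random() < eps:
        return int(rng.choice(safe))
    return int(safe[np.argmax(q_values[safe])])

rng = np.random.default_rng(0)
q = np.array([0.2, 0.8, 0.5])
p_sat = np.array([0.999, 0.90, 0.995])   # e.g., P(no collision before goal)
a = constrained_epsilon_greedy(q, p_sat, threshold=0.99, eps=0.1, rng=rng)
```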