 Reinforcement Learning


Proximal Deterministic Policy Gradient

arXiv.org Machine Learning

This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the variable of optimization and the value network computes the proximal operator. Second, we exploit the two value functions commonly employed in state-of-the-art off-policy algorithms to provide an improved action value estimate through bootstrapping, with only a limited increase in computational cost. Further, we demonstrate significant performance improvement over state-of-the-art algorithms on standard continuous-control RL benchmarks.
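
To make the proximal point view concrete, the following is a minimal deterministic sketch on a toy least-squares objective, not the paper's algorithm. The names `approx_prox` and `proximal_point`, the step sizes, and the quadratic objective are all assumptions chosen for illustration. The outer variable plays the role the abstract assigns to the target network, and the inner gradient loop stands in for the value network approximating the proximal operator.

```python
import numpy as np

def approx_prox(grad_f, anchor, lam, lr=0.1, steps=200):
    """Approximate prox_{lam*f}(anchor) by gradient descent on
    f(w) + ||w - anchor||^2 / (2 * lam)."""
    w = anchor.copy()
    for _ in range(steps):
        w = w - lr * (grad_f(w) + (w - anchor) / lam)
    return w

def proximal_point(grad_f, w0, lam=1.0, outer_iters=50):
    """Proximal point iteration: repeatedly replace the anchor
    (the 'target' variable) with its approximate proximal update."""
    target = w0.copy()
    for _ in range(outer_iters):
        target = approx_prox(grad_f, target, lam)
    return target

# Toy objective (an assumption): f(w) = 0.5 * ||A w - b||^2, minimized at w = [1, -1].
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, -1.0])
grad_f = lambda w: A.T @ (A @ w - b)

w_star = proximal_point(grad_f, np.zeros(2))
```

In this toy setting each outer step pulls the anchor toward the minimizer while the quadratic penalty keeps the inner problem well conditioned, which is the stabilizing effect a slowly moving target provides.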


The Need for Advanced Intelligence in NFV Management and Orchestration

arXiv.org Artificial Intelligence

With the constant demand for connectivity at an all-time high, Network Service Providers (NSPs) are required to optimize their networks to cope with rising capital and operational expenditures required to meet the growing connectivity demand. A solution to this challenge was presented through Network Function Virtualization (NFV). As network complexity increases and futuristic networks take shape, NSPs are required to incorporate an increasing amount of operational efficiency into their NFV-enabled networks. One such technique is Machine Learning (ML), which has been applied to various entities in NFV-enabled networks, most notably in the NFV Orchestrator. While traditional ML provides tremendous operational efficiencies, including real-time and high-volume data processing, challenges such as privacy, security, scalability, transferability, and concept drift hinder its widespread implementation. Through the adoption of Advanced Intelligence techniques such as Reinforcement Learning and Federated Learning, NSPs can leverage the benefits of traditional ML while simultaneously addressing the major challenges traditionally associated with it. This work presents the benefits of adopting these advanced techniques, provides a list of potential use cases and research topics, and proposes a bottom-up micro-functionality approach to applying these methods of Advanced Intelligence to NFV Management and Orchestration.


A Comparative Analysis of Deep Reinforcement Learning-enabled Freeway Decision-making for Automated Vehicles

arXiv.org Artificial Intelligence

Deep reinforcement learning (DRL) is becoming a prevalent and powerful methodology for addressing artificial intelligence problems. Owing to its tremendous potential for self-learning and self-improvement, DRL is widely applied in many research fields. This article presents a comprehensive comparison of multiple DRL approaches on the freeway decision-making problem for autonomous vehicles. These techniques include the common deep Q learning (DQL), double DQL (DDQL), dueling DQL, and prioritized replay DQL. First, the reinforcement learning (RL) framework is introduced. As an extension, the implementations of the above-mentioned DRL methods are established mathematically. Then, the freeway driving scenario for the automated vehicles is constructed, wherein the decision-making problem is formulated as a control optimization problem. Finally, a series of simulation experiments are conducted to evaluate the control performance of these DRL-enabled decision-making strategies. A comparative analysis is carried out to connect the autonomous driving results with the learning characteristics of these DRL techniques.
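
As a small illustration of how two of the compared methods differ, the sketch below contrasts the standard DQL bootstrap target with the double DQL target. The function names and the Q-values are hypothetical and are not taken from the article's experiments.

```python
import numpy as np

def dql_target(reward, next_q_target, gamma=0.99, done=False):
    """DQL target: bootstrap from the maximum target-network value."""
    if done:
        return reward
    return reward + gamma * np.max(next_q_target)

def ddql_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQL target: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = int(np.argmax(next_q_online))
    return reward + gamma * next_q_target[best_action]

# Hypothetical next-state Q-values for three driving actions.
q_online = np.array([1.0, 2.0, 1.5])
q_target = np.array([1.2, 0.8, 3.0])

y_dql = dql_target(1.0, q_target, gamma=0.9)               # 1 + 0.9 * 3.0
y_ddql = ddql_target(1.0, q_online, q_target, gamma=0.9)   # 1 + 0.9 * 0.8
```

Decoupling action selection from action evaluation is what lets double DQL reduce the overestimation that the single `max` in the DQL target tends to introduce.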


Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision making problems, RL agents often overfit to training environments and struggle to adapt to new, unseen environments. This prevents robust applications of RL in real world situations, where system dynamics may deviate wildly from the training settings. In this work, our primary contribution is to propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks; for the first time, we show that agents can generalize to test parameters more than 10 standard deviations away from the training parameter distribution. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving; it opens doors for the systematic study of generalization from training to extremely different testing settings, focusing on the established connections between information theory and machine learning.


AvE: Assistance via Empowerment

arXiv.org Artificial Intelligence

One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s). Existing methods tend to rely on inferring the human's goal, which is challenging when there are many potential goals or when the set of candidate goals is difficult to identify. We propose a new paradigm for assistance by instead increasing the human's ability to control their environment, and formalize this approach by augmenting reinforcement learning with human empowerment. This task-agnostic objective preserves the person's autonomy and ability to achieve any eventual state. We test our approach against assistance based on goal inference, highlighting scenarios where our method overcomes failure modes stemming from goal ambiguity or misspecification. As existing methods for estimating empowerment in continuous domains are computationally hard, precluding its use in real time learned assistance, we also propose an efficient empowerment-inspired proxy metric. Using this, we are able to successfully demonstrate our method in a shared autonomy user study for a challenging simulated teleoperation task with human-in-the-loop training.


Curriculum Learning with a Progression Function

arXiv.org Machine Learning

Curriculum Learning for Reinforcement Learning is an increasingly popular technique that involves training an agent on a defined sequence of intermediate tasks, called a Curriculum, to increase the agent's performance and learning speed. This paper introduces a novel paradigm for automatic curriculum generation based on a progression of task complexity. Different progression functions are introduced, including an autonomous online task progression based on the performance of the agent. The progression function also determines how long the agent should train on each intermediate task, which is an open problem in other task-based curriculum approaches. The benefits and wide applicability of our approach are shown by empirically comparing its performance to two state-of-the-art Curriculum Learning algorithms on a grid world and on a complex simulated navigation domain.
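
The idea of a progression function can be sketched as follows. This is an illustrative assumption rather than one of the paper's functions: `linear_progression` maps training time to a complexity value in [0, 1], and the hypothetical `grid_size_for` turns that complexity into a task parameter (here, the side length of a grid world).

```python
def linear_progression(step, total_steps):
    """Hypothetical progression function: complexity rises linearly
    from 0 to 1 over total_steps, then stays at 1."""
    return min(1.0, max(0.0, step / total_steps))

def grid_size_for(complexity, min_size=3, max_size=15):
    """Map a complexity in [0, 1] to a grid-world side length
    (the task parameterization is an assumption)."""
    return min_size + round(complexity * (max_size - min_size))

# Task difficulty at a few points during training.
schedule = [grid_size_for(linear_progression(t, 100)) for t in (0, 50, 100, 150)]
```

Because the complexity value alone determines the task, the same function also fixes how long the agent spends at each difficulty level, which is the property the abstract highlights.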


Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

arXiv.org Machine Learning

We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms. While most existing works on actor-critic employ bi-level or two-timescale updates, we focus on the more practical single-timescale setting, where the actor and critic are updated simultaneously. Specifically, in each iteration, the critic update is obtained by applying the Bellman evaluation operator only once while the actor is updated in the policy gradient direction computed using the critic. Moreover, we consider two function approximation settings where both the actor and critic are represented by linear or deep neural networks. For both cases, we prove that the actor sequence converges to a globally optimal policy at a sublinear $O(K^{-1/2})$ rate, where $K$ is the number of iterations. To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time. Moreover, under the broader scope of policy optimization with nonlinear function approximation, we prove that actor-critic with deep neural network finds the globally optimal policy at a sublinear rate for the first time.
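
The single-timescale scheme can be illustrated with a tabular sketch in which each iteration applies the Bellman evaluation operator once to the critic and takes one exact softmax policy gradient step for the actor. This is a simplified illustration under stated assumptions, not the paper's linear or deep neural network setting: the toy MDP, the step size, the uniform initial distribution, and the choice to compute both updates from the previous iterate are all assumptions.

```python
import numpy as np

def softmax(theta):
    """Row-wise softmax with max subtraction for numerical stability."""
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def single_timescale_actor_critic(P, R, gamma=0.9, lr=0.1, iters=2000):
    """P has shape (S, A, S) with transition probabilities; R has shape (S, A)."""
    n_states, n_actions = R.shape
    theta = np.zeros((n_states, n_actions))   # actor parameters
    Q = np.zeros((n_states, n_actions))       # critic estimate
    mu = np.full(n_states, 1.0 / n_states)    # initial state distribution
    for _ in range(iters):
        pi = softmax(theta)
        V = (pi * Q).sum(axis=1)
        # Critic: a single application of the Bellman evaluation operator.
        Q_new = R + gamma * (P @ V)
        # Actor: exact policy gradient for a tabular softmax policy,
        # using the current critic in place of the true Q-function.
        P_pi = np.einsum("sa,sat->st", pi, P)
        d = (1 - gamma) * np.linalg.solve((np.eye(n_states) - gamma * P_pi).T, mu)
        advantage = Q - V[:, None]
        theta = theta + lr * d[:, None] * pi * advantage / (1 - gamma)
        Q = Q_new
    return softmax(theta), Q

# Toy MDP (an assumption): action a moves to state a; action 1 pays reward 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[0.0, 1.0], [0.0, 1.0]])

policy, Q = single_timescale_actor_critic(P, R)
```

Neither update waits for the other to converge, in contrast to two-timescale or bi-level schemes where the critic is solved to (near) completion before each actor step.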


AirCapRL: Autonomous Aerial Human Motion Capture using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

In this letter, we introduce a deep reinforcement learning (RL) based multi-robot formation controller for the task of autonomous aerial human motion capture (MoCap). We focus on vision-based MoCap, where the objective is to estimate the trajectory of body pose and shape of a single moving person using multiple micro aerial vehicles. State-of-the-art solutions to this problem are based on classical control methods, which depend on hand-crafted system and observation models. Such models are difficult to derive and generalize across different systems. Moreover, the non-linearity and non-convexities of these models lead to sub-optimal controls. In our work, we formulate this problem as a sequential decision making task to achieve the vision-based motion capture objectives, and solve it using a deep neural network-based RL method. We leverage proximal policy optimization (PPO) to train a stochastic decentralized control policy for formation control. The neural network is trained in a parallelized setup in synthetic environments. We performed extensive simulation experiments to validate our approach. Finally, real-robot experiments demonstrate that our policies generalize to real world conditions. Video Link: https://bit.ly/38SJfjo Supplementary: https://bit.ly/3evfo1O


V2I Connectivity-Based Dynamic Queue-Jump Lane for Emergency Vehicles: A Deep Reinforcement Learning Approach

arXiv.org Artificial Intelligence

Emergency vehicle (EMV) service is a key function of cities and is exceedingly challenging due to urban traffic congestion. A main reason behind EMV service delay is the lack of communication and cooperation between vehicles blocking EMVs. In this paper, we study the improvement of EMV service under V2I connectivity. We consider the establishment of dynamic queue jump lanes (DQJLs) based on real-time coordination of connected vehicles. We develop a novel Markov decision process formulation for the DQJL problem, which explicitly accounts for the uncertainty of drivers' reaction to approaching EMVs. We propose a deep neural network-based reinforcement learning algorithm that efficiently computes the optimal coordination instructions. We also validate our approach on a micro-simulation testbed using Simulation of Urban Mobility (SUMO). Validation results show that, with our proposed methodology, the centralized control system reduces EMV passing time by approximately 15% compared with the benchmark system.


Service Chain Composition with Failures in NFV Systems: A Game-Theoretic Perspective

arXiv.org Artificial Intelligence

For state-of-the-art network function virtualization (NFV) systems, it remains a key challenge to conduct effective service chain composition for different network services (NSs) with ultra-low request latencies and minimum network congestion. To this end, existing solutions often require full knowledge of the network state, while ignoring the privacy issues and overlooking the non-cooperative behaviors of users. What is more, they may fall short in the face of unexpected failures such as user unavailability and virtual machine breakdown. In this paper, we formulate the problem of service chain composition in NFV systems with failures as a non-cooperative game. By showing that such a game is a weighted potential game and exploiting the unique problem structure, we propose two effective distributed schemes that guide the service chain compositions of different NSs towards the Nash equilibrium (NE) state with both near-optimal latencies and minimum congestion. Besides, we develop two novel learning-aided schemes as comparisons, which are based on deep reinforcement learning (DRL) and Monte Carlo tree search (MCTS) techniques, respectively. Our theoretical analysis and simulation results demonstrate the effectiveness of our proposed schemes, as well as the adaptivity when faced with failures.