
Collaborating Authors: Zheng, Kan


Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part II: Control-Aware Radio Resource Allocation

arXiv.org Artificial Intelligence

In Part I of this two-part paper (Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control), we decomposed the multi-timescale control and communications (MTCC) problem in a Cellular Vehicle-to-Everything (C-V2X) system into a communication-aware Deep Reinforcement Learning (DRL)-based platoon control (PC) sub-problem and a control-aware DRL-based radio resource allocation (RRA) sub-problem. We focused on the PC sub-problem and proposed the MTCC-PC algorithm to learn an optimal PC policy given an RRA policy. In this paper (Part II), we first focus on the RRA sub-problem in MTCC assuming a PC policy is given, and propose the MTCC-RRA algorithm to learn the RRA policy. Specifically, we incorporate the PC advantage function in the RRA reward function, which quantifies the amount of PC performance degradation caused by observation delay. Moreover, we augment the state space of RRA with the PC action history for a better-informed RRA policy. In addition, we utilize reward shaping and reward backpropagation prioritized experience replay (RBPER) techniques to efficiently tackle the multi-agent and sparse reward problems, respectively. Finally, a sample- and computation-efficient training approach is proposed to jointly learn the PC and RRA policies in an iterative process. To verify the effectiveness of the proposed MTCC algorithm, we perform experiments using real driving data for the leading vehicle and compare the performance of MTCC with that of the baseline DRL algorithms.
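
As a rough illustration of the two control-aware ingredients just described, the sketch below combines a PC-advantage-based reward term with an RRA observation augmented by a sliding window of recent PC actions. The function and variable names, the window length, and the penalty weighting are illustrative assumptions, not the paper's exact formulation.

```python
import collections
import numpy as np

def rra_reward(pc_advantage_under_delay, resource_cost, w_cost=0.1):
    # The PC advantage function evaluated under the realized observation
    # delay quantifies how much that delay degrades platoon control; a small
    # penalty discourages spending radio resources needlessly (the weight is
    # an assumed design choice).
    return pc_advantage_under_delay - w_cost * resource_cost

class AugmentedRRAObservation:
    """Append the last K PC actions to the raw RRA observation so the
    RRA agent can anticipate the controller's communication needs."""
    def __init__(self, k=4):
        self.pc_actions = collections.deque([0.0] * k, maxlen=k)

    def observe(self, rra_obs, last_pc_action):
        self.pc_actions.append(float(last_pc_action))
        return np.concatenate([np.asarray(rra_obs, dtype=np.float32),
                               np.asarray(self.pc_actions, dtype=np.float32)])
```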


Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control

arXiv.org Artificial Intelligence

An intelligent decision-making system enabled by Vehicle-to-Everything (V2X) communications is essential to achieve safe and efficient autonomous driving (AD), where two types of decisions have to be made at different timescales, i.e., vehicle control and radio resource allocation (RRA) decisions. The interplay between RRA and vehicle control necessitates their collaborative design. In this two-part paper (Part I and Part II), taking platoon control (PC) as an example use case, we propose a joint optimization framework of multi-timescale control and communications (MTCC) based on Deep Reinforcement Learning (DRL). In this paper (Part I), we first decompose the problem into a communication-aware DRL-based PC sub-problem and a control-aware DRL-based RRA sub-problem. Then, we focus on the PC sub-problem assuming an RRA policy is given, and propose the MTCC-PC algorithm to learn an efficient PC policy. To improve the PC performance under random observation delay, the PC state space is augmented with the observation delay and the PC action history. Moreover, the reward function with respect to the augmented state is defined to construct an augmented state Markov Decision Process (MDP). It is proved that the optimal policy for the augmented state MDP is also optimal for the original PC problem with observation delay. Unlike most existing works on communication-aware control, the MTCC-PC algorithm is trained in a delayed environment generated by a fine-grained embedded simulation of C-V2X communications rather than by a simple stochastic delay model. Finally, experiments are performed to compare the performance of MTCC-PC with that of the baseline DRL algorithms.
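
To make the state augmentation concrete, here is a minimal sketch of an augmented PC state built from the delayed observation, the delay itself, and a short PC action history; the window length K and the flat-vector layout are assumptions for illustration.

```python
import collections
import numpy as np

class DelayAugmentedPCState:
    """Bundle the delayed observation, the observation delay (in control
    steps), and the last K PC actions into one state vector, which is what
    makes the delayed decision process Markovian."""
    def __init__(self, k=4):
        self.actions = collections.deque([0.0] * k, maxlen=k)

    def augment(self, delayed_obs, delay_steps):
        return np.concatenate([np.asarray(delayed_obs, dtype=np.float32),
                               np.asarray([delay_steps], dtype=np.float32),
                               np.asarray(self.actions, dtype=np.float32)])

    def record(self, pc_action):
        # Call after each control step so the action history stays current.
        self.actions.append(float(pc_action))
```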


Optimal Scheduling in IoT-Driven Smart Isolated Microgrids Based on Deep Reinforcement Learning

arXiv.org Artificial Intelligence

In this paper, we investigate the scheduling of diesel generators (DGs) in an Internet of Things (IoT)-driven isolated microgrid (MG) by deep reinforcement learning (DRL). Renewable energy is fully exploited under the uncertainty of renewable generation and load demand. The DRL agent learns an optimal policy from historical renewable and load data of previous days, and the learned policy generates real-time decisions based on observations of the renewable and load data of the past few hours collected by connected sensors. The goal is to reduce the operating cost while ensuring the supply-demand balance. Specifically, a novel finite-horizon partially observable Markov decision process (POMDP) model is conceived that accounts for the spinning reserve. To overcome the challenge of the discrete-continuous hybrid action space arising from the binary DG switching decision and the continuous energy dispatch (ED) decision, a DRL algorithm, namely the hybrid action finite-horizon RDPG (HAFH-RDPG), is proposed. HAFH-RDPG seamlessly integrates two classical DRL algorithms, i.e., deep Q-network (DQN) and recurrent deterministic policy gradient (RDPG), based on a finite-horizon dynamic programming (DP) framework. Extensive experiments are performed with real-world data in an IoT-driven MG to evaluate the capability of the proposed algorithm in handling the uncertainty due to inter-hour and inter-day power fluctuation, and to compare its performance with those of the benchmark algorithms.
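
The hybrid-action idea can be sketched as follows: a recurrent encoder summarizes the observation history (for the POMDP), a Q-head scores each discrete DG switching configuration, and an actor head proposes the continuous dispatch conditioned on the chosen configuration. All layer sizes, names, and the greedy selection are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class HybridActionAgent(nn.Module):
    def __init__(self, obs_dim, n_configs, dispatch_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)  # POMDP history encoder
        self.critic = nn.Linear(hidden, n_configs)             # Q-value per DG on/off config
        self.actor = nn.Linear(hidden + n_configs, dispatch_dim)  # continuous ED head

    def act(self, obs_seq):
        # obs_seq: (batch, time, obs_dim) of past renewable/load observations.
        h, _ = self.rnn(obs_seq)
        h_t = h[:, -1]                           # latest hidden state
        q = self.critic(h_t)                     # score each discrete switching decision
        cfg = q.argmax(dim=-1)                   # greedy DG configuration
        onehot = nn.functional.one_hot(cfg, q.shape[-1]).float()
        dispatch = torch.tanh(self.actor(torch.cat([h_t, onehot], dim=-1)))
        return cfg, dispatch                     # joint discrete-continuous action
```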


Joint Energy Dispatch and Unit Commitment in Microgrids Based on Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Nowadays, the application of microgrids (MGs) with renewable energy is becoming more and more extensive, which creates a strong need for dynamic energy management. In this paper, deep reinforcement learning (DRL) is applied to learn an optimal policy for making joint energy dispatch (ED) and unit commitment (UC) decisions in an isolated MG, with the aim of reducing the total power generation cost while ensuring the supply-demand balance. To overcome the challenge of the discrete-continuous hybrid action space due to joint ED and UC, we propose a DRL algorithm, i.e., the hybrid action finite-horizon DDPG (HAFH-DDPG), which seamlessly integrates two classical DRL algorithms, i.e., deep Q-network (DQN) and deep deterministic policy gradient (DDPG), based on a finite-horizon dynamic programming (DP) framework. Moreover, a diesel generator (DG) selection strategy is presented to support a simplified action space and thereby reduce the computational complexity of the algorithm. Finally, the effectiveness of our proposed algorithm is verified through comparison with several baseline algorithms in experiments with a real-world data set.
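
One common way to realize such a joint discrete-continuous choice is sketched below, under assumed actor/critic interfaces: enumerate the (reduced) set of UC candidates, let the DDPG actor propose a dispatch for each, and execute the pair the critic scores highest.

```python
import torch

def select_joint_action(state, uc_candidates, actor, critic):
    """state: (obs_dim,) tensor; uc_candidates: list of one-hot UC tensors.
    Assumed interfaces: actor(state, uc) -> dispatch tensor,
    critic(state, uc, dispatch) -> scalar Q tensor."""
    best_q, best = -float("inf"), None
    for uc in uc_candidates:
        ed = actor(state, uc)            # continuous dispatch for this commitment
        q = critic(state, uc, ed)        # value of the joint (UC, ED) pair
        if q.item() > best_q:
            best_q, best = q.item(), (uc, ed)
    return best
```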


Federated Reinforcement Learning: Techniques, Applications, and Open Challenges

arXiv.org Artificial Intelligence

This paper presents a comprehensive survey of Federated Reinforcement Learning (FRL), an emerging and promising field in Reinforcement Learning (RL). Starting with a tutorial on Federated Learning (FL) and RL, we then introduce FRL as a new method with great potential, which leverages the basic idea of FL to improve the performance of RL while preserving data privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, i.e., Horizontal Federated Reinforcement Learning (HFRL) and Vertical Federated Reinforcement Learning (VFRL). We provide detailed formal definitions of each category, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application field, including edge computing, communication, control optimization, and attack detection. Finally, we describe and discuss several key research directions that are crucial to solving the open problems in FRL.
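
As a toy illustration of the HFRL setting (agents with the same task but separate environments), the sketch below averages the parameters of the agents' local networks, FedAvg-style; weighting by per-agent sample counts is an assumption, not a definition from the survey.

```python
def federated_average(local_weights, sample_counts):
    """local_weights: list of dicts {layer_name: numpy array}, one per agent.
    Returns the sample-weighted average, to be broadcast back to all agents
    as the new shared model."""
    total = float(sum(sample_counts))
    avg = {}
    for name in local_weights[0]:
        avg[name] = sum(w[name] * (n / total)
                        for w, n in zip(local_weights, sample_counts))
    return avg
```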


Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges

arXiv.org Machine Learning

The Internet of Things (IoT) extends Internet connectivity to billions of IoT devices around the world, which collect and share information reflecting the status of the physical world. The Autonomous Control System (ACS), on the other hand, performs control functions on physical systems without external intervention over an extended period of time. The integration of IoT and ACS results in a new concept: autonomous IoT (AIoT). Sensors collect information on the system status, based on which intelligent agents in IoT devices as well as Edge/Fog/Cloud servers make control decisions for the actuators to execute. To achieve autonomy, a promising approach is for the intelligent agents to leverage techniques from the field of artificial intelligence, especially reinforcement learning (RL) and deep reinforcement learning (DRL), for decision making. In this paper, we first provide a comprehensive survey of the state-of-the-art research, and then propose a general model for the applications of RL/DRL in AIoT. Finally, the challenges and open issues for future research are identified.
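
A minimal sense-decide-actuate loop matching this general model might look as follows; `env` and `agent` here are placeholder interfaces, not an API from the paper.

```python
def run_episode(env, agent, max_steps=1000):
    obs = env.reset()                         # sensors report the system status
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)               # device/edge/cloud agent decides
        obs, reward, done = env.step(action)  # actuators apply the decision
        agent.observe(reward, obs, done)      # agent learns from the transition
        total_reward += reward
        if done:
            break
    return total_reward
```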


Multi-user Resource Control with Deep Reinforcement Learning in IoT Edge Computing

arXiv.org Machine Learning

By leveraging the concept of mobile edge computing (MEC), the massive amount of data generated by a large number of Internet of Things (IoT) devices can be offloaded to an MEC server at the edge of the wireless network for further computation-intensive processing. However, due to the resource constraints of IoT devices and the wireless network, both the communication and computation resources need to be allocated and scheduled efficiently for better system performance. In this paper, we propose a joint computation offloading and multi-user scheduling algorithm for IoT edge computing systems that minimizes the long-term average weighted sum of delay and power consumption under stochastic traffic arrivals. We formulate the dynamic optimization problem as an infinite-horizon average-reward continuous-time Markov decision process (CTMDP) model. One critical challenge in solving this MDP problem for multi-user resource control is the curse of dimensionality, where the state space of the MDP model and the computational complexity increase exponentially with the growing number of users or IoT devices. To overcome this challenge, we use deep reinforcement learning (DRL) techniques and propose a neural network architecture to approximate the value functions for the post-decision system states. The designed algorithm for solving the CTMDP problem supports a semi-distributed auction-based implementation, where the IoT devices submit bids to the base station (BS), which makes the resource control decisions centrally. Simulation results show that the proposed algorithm provides significant performance improvement over the baseline algorithms, and also outperforms RL algorithms based on other neural network architectures.
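
The post-decision-state trick can be sketched as follows: because the controlled part of the transition is deterministic and known, action selection reduces to a one-step lookahead through the learned post-decision value function, with the random traffic arrivals folded into that function. The interfaces below are illustrative assumptions.

```python
import numpy as np

def choose_action(state, actions, deterministic_step, post_decision_value, cost):
    """deterministic_step(s, a) -> the post-decision state reached by the
    known, controllable dynamics; post_decision_value(s_pd) -> learned value
    with the stochastic arrivals already averaged in; cost(s, a) -> the
    immediate weighted delay-plus-power cost."""
    scores = [cost(state, a) + post_decision_value(deterministic_step(state, a))
              for a in actions]
    return actions[int(np.argmin(scores))]  # minimize the long-term objective
```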


Short-term Road Traffic Prediction based on Deep Cluster at Large-scale Networks

arXiv.org Machine Learning

Short-term road traffic prediction (STTP) is one of the most important modules in Intelligent Transportation Systems (ITS). However, network-level STTP remains challenging due to the difficulties both in modeling the diverse traffic patterns and in tackling high-dimensional time series with low latency. Therefore, this paper develops a framework built around a deep clustering (DeepCluster) module for STTP in large-scale networks. The DeepCluster module supervises representation learning on the large unlabeled dataset in a visualized way. More specifically, to fully exploit the traffic periodicity, the raw series is first split into a number of sub-series for triplet generation. Convolutional neural networks (CNNs) with a triplet loss are utilized to extract shape features by transforming the series into visual images. The shape-based representations are then used to cluster the road segments. Thereafter, motivated by the fact that road segments in a group have similar patterns, a model-sharing strategy is further proposed to build recurrent-NN (RNN)-based predictors through a group-based model (GM), instead of an individual-based model (IM) in which one model is built exclusively for each road. Our framework not only significantly reduces the number of models and the cost, but also increases the amount of training data and the diversity of samples. In the end, we evaluate the proposed framework on the network of Liuli Bridge in Beijing. Experimental results show that DeepCluster can effectively cluster the road segments, and the GM achieves performance comparable to the IM with far fewer models.
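
For the shape-based representation learning, a standard triplet objective of the kind described could look like the sketch below: sub-series from the same road and period (anchor, positive) are pulled together in feature space, sub-series from elsewhere (negative) pushed apart. The margin and distance choice are assumptions.

```python
import torch
import torch.nn.functional as F

def triplet_loss(f_anchor, f_pos, f_neg, margin=1.0):
    """f_*: (batch, feat_dim) CNN embeddings of the image-rendered sub-series."""
    d_pos = F.pairwise_distance(f_anchor, f_pos)  # same-pattern distance
    d_neg = F.pairwise_distance(f_anchor, f_neg)  # different-pattern distance
    return F.relu(d_pos - d_neg + margin).mean()  # hinge on the margin
```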


A Driving Intention Prediction Method Based on Hidden Markov Model for Autonomous Driving

arXiv.org Machine Learning

In a mixed-traffic scenario where both autonomous vehicles and human-driven vehicles exist, timely prediction of the driving intentions of nearby human-driven vehicles is essential for the safe and efficient driving of an autonomous vehicle. In this paper, a driving intention prediction method based on the Hidden Markov Model (HMM) is proposed for autonomous vehicles. HMMs representing different driving intentions are trained and tested with field data collected from a flyover. When training the models, either a discrete or a continuous characterization of the mobility features of vehicles is applied. Experimental results show that the HMMs trained with the continuous characterization of mobility features give a higher prediction accuracy when used to predict driving intentions. Moreover, when the surrounding traffic of the vehicle is taken into account, the performance of the proposed prediction method is further improved.
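
A minimal sketch of the train-one-HMM-per-intention, classify-by-likelihood scheme follows, using hmmlearn's GaussianHMM for the continuous mobility features; the library choice, state count, and data layout are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_intention_models(sequences_by_intention, n_states=3):
    """sequences_by_intention: {intention: [array of shape (T_i, n_features)]}."""
    models = {}
    for intention, seqs in sequences_by_intention.items():
        X = np.vstack(seqs)                 # stacked frames of all sequences
        lengths = [len(s) for s in seqs]    # per-sequence lengths for hmmlearn
        models[intention] = GaussianHMM(n_components=n_states).fit(X, lengths)
    return models

def predict_intention(models, seq):
    # Pick the intention whose HMM assigns the observed mobility-feature
    # trajectory the highest log-likelihood.
    return max(models, key=lambda k: models[k].score(seq))
```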


An In-Vehicle KWS System with Multi-Source Fusion for Vehicle Applications

arXiv.org Machine Learning

In order to maximize both the detection precision rate and the recall rate, this paper proposes an in-vehicle multi-source fusion scheme for a Keyword Spotting (KWS) system for vehicle applications. Vehicle information, as a new source for the original system, is collected by an in-vehicle data acquisition platform while the user is driving. A Deep Neural Network (DNN) is trained to extract acoustic features and perform speech classification. Based on the posterior probabilities obtained from the DNN, the vehicle information, including the speed and direction of the vehicle, is used to choose the suitable parameter from a pair of sensitivity values for the KWS system. The experimental results show that the KWS system with the proposed multi-source fusion scheme achieves better performance in terms of precision rate, recall rate, and mean square error compared to the system without it. Keyword Spotting (KWS), also known as wake-word detection, refers to the task of detecting specified keywords in a continuous stream of audio provided by the users [1]. Keyword spotting has been an active research area in speech recognition for decades, and is widely used in numerous applications.
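
The fusion step can be sketched as a posterior threshold selected from the vehicle state; the thresholds, the speed cutoff, and the noisy-cabin heuristic below are illustrative assumptions, not the paper's tuned values.

```python
def keyword_detected(posterior, speed_kmh, turning,
                     thr_quiet=0.80, thr_noisy=0.60, speed_cutoff=60.0):
    """posterior: DNN keyword posterior in [0, 1]; speed_kmh and turning come
    from the in-vehicle data acquisition platform."""
    # Driving fast or turning implies more cabin noise, so switch to the more
    # permissive sensitivity value to protect the recall rate.
    noisy = speed_kmh > speed_cutoff or turning
    threshold = thr_noisy if noisy else thr_quiet
    return posterior >= threshold
```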