Undirected Networks
Deep Reinforcement Learning for Autonomous Driving: A Survey
Kiran, B Ravi, Sobh, Ibrahim, Talpaert, Victor, Mannion, Patrick, Sallab, Ahmad A. Al, Yogamani, Senthil, Pérez, Patrick
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms, provides a taxonomy of automated driving tasks where (D)RL methods have been employed, and highlights the key challenges, both algorithmic and in the deployment of real-world autonomous driving agents. It also covers the role of simulators in training agents and, finally, methods to evaluate, test and robustify existing solutions in RL and imitation learning.
Regret Minimization in Partially Observable Linear Quadratic Control
Lale, Sahin, Azizzadenesheli, Kamyar, Hassibi, Babak, Anandkumar, Anima
Controlling unknown discrete-time systems is a fundamental problem in adaptive control and reinforcement learning. In this problem, an agent interacts with an environment, with unknown dynamics, and aims to minimize the overall average regulating costs. To achieve this goal, the agent is required to explore the environment to gain a better understanding of the environment dynamics, which is often called system identification. The agent then utilizes this understanding to design a set of improved controllers that simultaneously reduces the possible future costs and also enables the agent to explore the important and unknown aspects of the system. In recent decades, this challenging problem has been extensively studied, resulting in a set of foundational steps to study the stability and asymptotic convergence to optimal controllers [Lai et al., 1982, Lai and Wei, 1987]. While asymptotic analyses set the ground for the design of optimal control, understanding the finite-time behavior of adaptive algorithms is critical for real-world applications. In practice, one might prefer an algorithm that guarantees better performance on a much shorter horizon. Recent developments in the fields of statistics and machine learning along with control theory [Van Der Vaart and Wellner, 1996, Peña et al., 2009, Lai et al., 1982] empower us to not only advance the study of the asymptotic efficiency of algorithms but also to analyze their finite-time behavior [Fiechter, 1997, Abbasi-Yadkori and Szepesvári, 2011]. In partially observable linear quadratic control, if the agent, a priori, is handed the system dynamics, the optimal control/policy has a closed form in the presence of Gaussian disturbances.
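To make the closed-form remark concrete, the following is a minimal sketch of the control half of that solution for a scalar, fully observed system x' = a*x + b*u + w with stage cost q*x^2 + r*u^2. It iterates the discrete-time Riccati recursion to a fixed point and returns the feedback gain. The function name `scalar_lqr_gain` and the scalar setting are assumptions made for illustration and are not taken from the paper; the partially observable case additionally needs a state estimator (a Kalman filter), which is not shown here.

```python
def scalar_lqr_gain(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Return (k, p) for the scalar system x' = a*x + b*u + w.

    p is the fixed point of the discrete-time Riccati recursion and
    k is the gain of the linear feedback law u = -k * x.
    Illustrative helper: assumes known dynamics and full state observation.
    """
    p = q  # start the recursion from the state cost
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    k = a * b * p / (r + b * b * p)
    return k, p


k, p = scalar_lqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)
closed_loop = 1.0 - 1.0 * k  # a - b*k, inside the unit interval when stable
```

With a = b = q = r = 1 the fixed point satisfies p^2 - p - 1 = 0, so p is the golden ratio and the closed-loop coefficient a - b*k lies strictly inside (-1, 1).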
Domain-Adversarial and -Conditional State Space Model for Imitation Learning
Okumura, Ryo, Okada, Masashi, Taniguchi, Tadahiro
State representation learning (SRL) in partially observable Markov decision processes has been studied to learn abstract features of data useful for robot control tasks. For SRL, acquiring domain-agnostic states is essential for achieving efficient imitation learning (IL). Without these states, IL is hampered by domain-dependent information useless for control. However, existing methods fail to remove such disturbances from the states when the data from experts and agents show large domain shifts. To overcome this issue, we propose a domain-adversarial and -conditional state space model (DAC-SSM) that enables control systems to obtain domain-agnostic and task- and dynamics-aware states. DAC-SSM jointly optimizes the state inference, observation reconstruction, forward dynamics, and reward models. To remove domain-dependent information from the states, the model is trained with domain discriminators in an adversarial manner, and the reconstruction is conditioned on domain labels. We experimentally evaluated the model predictive control performance via IL for continuous control of sparse reward tasks in simulators and compared it with the performance of the existing SRL method. The agents from DAC-SSM achieved performance comparable to that of experts and more than twice that of the baselines. We conclude that domain-agnostic states are essential for IL with large domain shifts and can be obtained using DAC-SSM.
Automated Deep Abstractions for Stochastic Chemical Reaction Networks
Predicting stochastic cellular dynamics as emerging from the mechanistic models of molecular interactions is a long-standing challenge in systems biology: low-level chemical reaction network (CRN) models give rise to a high-dimensional continuous-time Markov chain (CTMC) which is computationally demanding and often prohibitive to analyse in practice. A recently proposed abstraction method uses deep learning to replace this CTMC with a discrete-time continuous-space process, by training a mixture density deep neural network with traces sampled at regular time intervals (which can be obtained either by simulating a given CRN or as time-series data from experiment). The major advantage of such abstraction is that it produces a computational model that is dramatically cheaper to execute, while preserving the statistical features of the training data. In general, the abstraction accuracy improves with the amount of training data. However, depending on the CRN, the overall quality of the method -- the efficiency gain and abstraction accuracy -- will also depend on the choice of neural network architecture given by hyper-parameters such as the layer types and connections between them. As a consequence, in practice, the modeller would have to find a suitable architecture manually, for each given CRN, through a tedious and time-consuming trial-and-error cycle. In this paper, we propose to further automate deep abstractions for stochastic CRNs, by learning the optimal neural network architecture along with the transition kernel of the abstract process. Automated search of the architecture makes the method applicable directly to any given CRN, which is time-saving for deep learning experts and crucial for non-specialists. We implement the method and demonstrate its performance on a number of representative CRNs with multi-modal emergent phenotypes.
Survey of Deep Reinforcement Learning for Motion Planning of Autonomous Vehicles
Academic research in the field of autonomous vehicles has become highly popular in recent years, covering topics such as sensor technologies, V2X communications, safety, security, decision making, control, and even legal and standardization rules. Besides classic control design approaches, Artificial Intelligence and Machine Learning methods are present in almost all of these fields. Another part of research focuses on different layers of Motion Planning, such as strategic decisions, trajectory planning, and control. A wide range of techniques in Machine Learning itself have been developed, and this article describes one of these fields, Deep Reinforcement Learning (DRL). The paper provides insight into the hierarchical motion planning problem and describes the basics of DRL. The main elements of designing such a system are the modeling of the environment, the modeling abstractions, the description of the state and the perception models, the appropriate rewarding, and the realization of the underlying neural network. The paper describes vehicle models, simulation possibilities and computational requirements. Strategic decisions on different layers and the observation models, e.g., continuous and discrete state representations, grid-based, and camera-based solutions, are presented. The paper surveys the state-of-the-art solutions systematized by the different tasks and levels of autonomous driving, such as car-following, lane-keeping, trajectory following, merging, or driving in dense traffic. Finally, open questions and future challenges are discussed.
Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
We study an extension of the classic stochastic multi-armed bandit problem which involves Markovian rewards and multiple plays. In order to tackle this problem we consider an index-based adaptive allocation rule which at each stage combines calculations of sample means and of upper confidence bounds, using the Kullback-Leibler divergence rate, for the stationary expected reward of Markovian arms. For rewards generated from a one-parameter exponential family of Markov chains, we provide a finite-time upper bound for the regret incurred from this adaptive allocation rule, which reveals the logarithmic dependence of the regret on the time horizon, and which is asymptotically optimal. For our analysis we devise several concentration results for Markov chains, including a maximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish asymptotically optimal finite-time guarantees for the case of multiple plays and IID rewards drawn from a one-parameter exponential family of probability densities.
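As a rough illustration of a Kullback-Leibler upper confidence bound, the sketch below computes the standard KL-UCB index for the simpler case of IID Bernoulli rewards, then picks the arms with the largest indices. This is not the allocation rule analysed in the paper, which handles Markovian arms and combines sample means with confidence bounds. The exploration budget log(t), the bisection tolerance, and the helper names `bernoulli_kl`, `kl_ucb_index` and `top_m_arms` are all assumptions made for the example.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clamp away from 0 and 1 to keep the logarithms finite
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, tol=1e-9):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), via bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo


def top_m_arms(means, pulls, t, m):
    """Hypothetical helper: indices of the m arms with the largest KL-UCB index."""
    indices = [kl_ucb_index(mu, n, t) for mu, n in zip(means, pulls)]
    order = sorted(range(len(means)), key=lambda arm: indices[arm], reverse=True)
    return order[:m]


chosen = top_m_arms(means=[0.9, 0.1, 0.5], pulls=[100, 100, 100], t=1000, m=2)
```

The index shrinks towards the sample mean as an arm is pulled more often, which is what drives the logarithmic exploration cost in this simplified setting.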
Efficient Probabilistic Logic Reasoning with Graph Neural Networks
Zhang, Yuyu, Chen, Xinshi, Yang, Yuan, Ramamurthy, Arun, Li, Bo, Qi, Yuan, Song, Le
Markov Logic Networks (MLNs), which elegantly combine logic rules and probabilistic graphical models, can be used to address many knowledge graph problems. However, inference in MLN is computationally intensive, making the industrial-scale application of MLN very difficult. In recent years, graph neural networks (GNNs) have emerged as efficient and effective tools for large-scale graph problems. Nevertheless, GNNs do not explicitly incorporate prior logic rules into the models, and may require many labeled examples for a target task. In this paper, we explore the combination of MLNs and GNNs, and use graph neural networks for variational inference in MLN. We propose a GNN variant, named ExpressGNN, which strikes a nice balance between the representation power and the simplicity of the model. Our extensive experiments on several benchmark datasets demonstrate that ExpressGNN leads to effective and efficient probabilistic logic reasoning.
The Tensor Brain: Semantic Decoding for Perception and Memory
Tresp, Volker, Sharifzadeh, Sahand, Konopatzki, Dario, Ma, Yunpu
We analyse perception and memory using mathematical models for knowledge graphs and tensors to gain insights into the corresponding functionalities of the human mind. Our discussion is based on the concept of propositional sentences consisting of subject-predicate-object (SPO) triples for expressing elementary facts. SPO sentences are the basis for most natural languages but might also be important for explicit perception and declarative memories, as well as intra-brain communication and the ability to argue and reason. A set of SPO sentences can be described as a knowledge graph, which can be transformed into an adjacency tensor. We introduce tensor models, where concepts have dual representations as indices and associated embeddings, two constructs we believe are essential for the understanding of implicit and explicit perception and memory in the brain. We argue that a biological realization of perception and memory imposes constraints on information processing. In particular, we propose that explicit perception and declarative memories require a semantic decoder, which, in a simple realization, is based on four layers: first, a sensory memory layer as a buffer for sensory input; second, an index layer representing concepts; third, a memoryless representation layer for the broadcasting of information; and fourth, a working memory layer as a processing center and data buffer. In a Bayesian brain interpretation, semantic memory defines the prior for triple statements. We propose that, in evolution and during development, semantic memory, episodic memory and natural language evolved as emergent properties in the agents' process of gaining a deeper understanding of sensory information. We present a concrete model realization and validate some aspects of our proposed model on benchmark data where we demonstrate state-of-the-art performance.
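The step from SPO triples to an adjacency tensor can be sketched as follows. This is only an illustration of the data structure, assuming a binary tensor indexed as [subject][predicate][object]; the function name `build_adjacency_tensor` and the example triples are hypothetical and do not come from the paper.

```python
def build_adjacency_tensor(triples):
    """Turn (subject, predicate, object) triples into a binary 3-way tensor.

    Returns the tensor as nested lists indexed [subject][predicate][object],
    together with the entity and predicate index maps.
    """
    entities = sorted({e for s, _, o in triples for e in (s, o)})
    predicates = sorted({p for _, p, _ in triples})
    entity_index = {e: i for i, e in enumerate(entities)}
    predicate_index = {p: i for i, p in enumerate(predicates)}

    n, k = len(entities), len(predicates)
    tensor = [[[0] * n for _ in range(k)] for _ in range(n)]
    for s, p, o in triples:
        tensor[entity_index[s]][predicate_index[p]][entity_index[o]] = 1
    return tensor, entity_index, predicate_index


# Hypothetical example facts
triples = [("Sparky", "isA", "Dog"), ("Sparky", "chases", "Cat")]
tensor, entity_index, predicate_index = build_adjacency_tensor(triples)
```

Each entity and predicate is represented here only by its index; an embedding-based model would additionally attach a learned vector to every index.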
Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
We consider the problem of off-policy evaluation for reinforcement learning, where the goal is to estimate the expected reward of a target policy $\pi$ using offline data collected by running a logging policy $\mu$. Standard importance-sampling based approaches for this problem suffer from a variance that scales exponentially with time horizon $H$, which motivates a surge of recent interest in alternatives that break the "Curse of Horizon" (Liu et al. 2018, Xie et al. 2019). In particular, it was shown that a marginalized importance sampling (MIS) approach can be used to achieve an estimation error of order $O(H^3/n)$ in mean square error (MSE) under an episodic Markov Decision Process model with finite states and potentially infinite actions. The MSE bound however is still a factor of $H$ away from a Cramér-Rao lower bound of order $\Omega(H^2/n)$. In this paper, we prove that with a simple modification to the MIS estimator, we can asymptotically attain the Cramér-Rao lower bound, provided that the action space is finite. We also provide a general method for constructing MIS estimators with high-probability error bounds.
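For orientation, here is a minimal sketch of the standard trajectory-wise importance sampling estimator that the abstract contrasts with MIS. It is not the MIS estimator or its modification. The data layout (episodes as lists of (state, action, reward) tuples, policies as dictionaries keyed by (state, action)) and the function name are assumptions for the example.

```python
def importance_sampling_estimate(episodes, pi, mu):
    """Trajectory-wise importance sampling estimate of the value of pi.

    episodes: list of episodes, each a list of (state, action, reward)
              collected under the logging policy mu.
    pi, mu:   dicts mapping (state, action) to an action probability.
    """
    total = 0.0
    for episode in episodes:
        weight = 1.0       # product of per-step probability ratios
        episode_return = 0.0
        for state, action, reward in episode:
            weight *= pi[(state, action)] / mu[(state, action)]
            episode_return += reward
        total += weight * episode_return
    return total / len(episodes)


# Hypothetical two-step example with a uniform logging policy
mu = {("s", 0): 0.5, ("s", 1): 0.5}
pi = {("s", 0): 1.0, ("s", 1): 0.0}
episodes = [[("s", 0, 1.0), ("s", 0, 1.0)]]
estimate = importance_sampling_estimate(episodes, pi, mu)
```

In this example the weight is 2 per step, so it grows as 2^H with the horizon; products of this kind are the source of the exponential variance mentioned above.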
Performance Analysis and Comparison of Machine and Deep Learning Algorithms for IoT Data Classification
Vakili, Meysam, Ghamsari, Mohammad, Rezaei, Masoumeh
In recent years, the growth of the Internet of Things (IoT) as an emerging technology has been remarkable. The number of network-enabled devices in IoT domains is increasing dramatically, leading to the massive production of electronic data. These data contain valuable information which can be used in various areas, such as science, industry, business and even social life. To extract and analyze this information and make IoT systems smart, the only choice is to enter the world of artificial intelligence (AI) and leverage the power of machine learning and deep learning techniques. This paper evaluates the performance of 11 popular machine and deep learning algorithms on the classification task using six IoT-related datasets. These algorithms are compared according to several performance evaluation metrics including precision, recall, f1-score, accuracy, execution time, ROC-AUC score and confusion matrix. A specific experiment is also conducted to assess the convergence speed of the developed models. The comprehensive experiments indicated that, considering all performance metrics, Random Forests performed better than other machine learning models, while among deep learning models, ANN and CNN achieved more interesting results.
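Several of the listed metrics follow directly from the confusion-matrix counts. The sketch below computes precision, recall, f1-score and accuracy for a binary task, assuming labels encoded as 0 and 1 with 1 as the positive class; the function name and the toy labels are hypothetical and unrelated to the paper's datasets.

```python
def binary_classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Fall back to 0.0 when a denominator is zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels for illustration only
metrics = binary_classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

For multi-class data, the same counts are typically computed per class and then averaged, and the choice of averaging scheme affects the reported scores.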