Undirected Networks
Dependability Analysis of Deep Reinforcement Learning based Robotics and Autonomous Systems
Dong, Yi, Zhao, Xingyu, Huang, Xiaowei
While Deep Reinforcement Learning (DRL) provides transformational capabilities to the control of Robotics and Autonomous Systems (RAS), the black-box nature of DRL and uncertain deployment-environments of RAS pose new challenges on its dependability. Although there are many existing works imposing constraints on the DRL policy to ensure a successful completion of the mission, it is far from adequate in terms of assessing the DRL-driven RAS in a holistic way considering all dependability properties. In this paper, we formally define a set of dependability properties in temporal logic and construct a Discrete-Time Markov Chain (DTMC) to model the dynamics of risk/failures of a DRL-driven RAS interacting with the stochastic environment. We then do Probabilistic Model Checking based on the designed DTMC to verify those properties. Our experimental results show that the proposed method is effective as a holistic assessment framework, while uncovers conflicts between the properties that may need trade-offs in the training. Moreover, we find the standard DRL training cannot improve dependability properties, thus requiring bespoke optimisation objectives concerning them. Finally, our method offers a novel dependability analysis to the Sim-to-Real challenge of DRL.
Deep hierarchical reinforcement agents for automated penetration testing
Tran, Khuong, Akella, Ashlesha, Standen, Maxwell, Kim, Junae, Bowman, David, Richer, Toby, Lin, Chin-Teng
Penetration testing the organised attack of a computer system in order to test existing defences has been used extensively to evaluate network security. This is a time consuming process and requires in-depth knowledge for the establishment of a strategy that resembles a real cyber-attack. This paper presents a novel deep reinforcement learning architecture with hierarchically structured agents called HA-DRL, which employs an algebraic action decomposition strategy to address the large discrete action space of an autonomous penetration testing simulator where the number of actions is exponentially increased with the complexity of the designed cybersecurity network. The proposed architecture is shown to find the optimal attacking policy faster and more stably than a conventional deep Q-learning agent which is commonly used as a method to apply artificial intelligence in automatic penetration testing.
On Solving a Stochastic Shortest-Path Markov Decision Process as Probabilistic Inference
Baioumy, Mohamed, Lacerda, Bruno, Duckworth, Paul, Hawes, Nick
We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference. Furthermore, we discuss online and offline methods for planning under uncertainty. In an SSP MDP, the horizon is indefinite and unknown a priori. SSP MDPs generalize finite and infinite horizon MDPs and are widely used in the artificial intelligence community. Additionally, we highlight some of the differences between solving an MDP using dynamic programming approaches widely used in the artificial intelligence community and approaches used in the active inference community.
Recommendation Fairness: From Static to Dynamic
Driven by the need to capture users' evolving interests and optimize their long-term experiences, more and more recommender systems have started to model recommendation as a Markov decision process and employ reinforcement learning to address the problem. Shouldn't research on the fairness of recommender systems follow the same trend from static evaluation and one-shot intervention to dynamic monitoring and non-stop control? In this paper, we portray the recent developments in recommender systems first and then discuss how fairness could be baked into the reinforcement learning techniques for recommendation. Moreover, we argue that in order to make further progress in recommendation fairness, we may want to consider multi-agent (game-theoretic) optimization, multi-objective (Pareto) optimization, and simulation-based optimization, in the general framework of stochastic games.
Beginners Guide to Boltzmann Machine
Deep learning implements structured machine learning algorithms by making use of artificial neural networks. These algorithms help the machine to learn by itself and develop the ability to establish new parameters with which help to make and execute decisions. Deep learning is considered to be a subset of machine learning and utilizes multi-layered artificial neural networks to carry out its processes, which enables it to deliver high accuracy in tasks such as speech recognition, object detection, language translation and other such modern use cases being implemented every day. One of the most intriguing implementations in the domain of artificial intelligence for creating deep learning models has been the Boltzmann Machine. In this article, we will try to understand what exactly a Boltzmann Machine is, how it can be implemented and its uses.
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Agarwal, Mridul, Bai, Qinbo, Aggarwal, Vaneet
We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints. Various learning applications with constraints, such as robotics, do not allow for policies that can violate constraints. To this end, we propose a model-based learning algorithm that achieves zero constraint violations. To obtain this result, we assume that the concave objective and the convex constraints have a solution interior to the set of feasible occupation measures. We then solve a tighter optimization problem to ensure that the constraints are never violated despite the imprecise model knowledge and model stochasticity. We also propose a novel Bellman error based analysis for tabular infinite-horizon setups which allows to analyse stochastic policies. Combining the Bellman error based analysis and tighter optimization equation, for $T$ interactions with the environment, we obtain a regret guarantee for objective which grows as $\Tilde{O}(1/\sqrt{T})$, excluding other factors.
Improved Algorithms for Misspecified Linear Markov Decision Processes
Vial, Daniel, Parulekar, Advait, Shakkottai, Sanjay, Srikant, R.
Due to the large (possibly infinite) state spaces of modern reinforcement learning applications, practical algorithms must generalize across states. To understand generalization on a theoretical level, recent work has studied linear Markov decision processes (LMDPs), among other models (see Section 1.2 for related work). The LMDP model assumes the next-state distribution and reward are linear in known d-dimensional features, which enables tractable generalization when d is small. Of course, this linear assumption most likely fails in practice, which motivates the misspecified LMDP (MLMDP) model.
AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations
Ruan, Hongtao, Zheng, Liying, Hu, Peixian, Xu, Liang, Xiao, Jing
State-of-the-art Named Entity Recognition(NER) models rely heavily on large amountsof fully annotated training data. However, ac-cessible data are often incompletely annotatedsince the annotators usually lack comprehen-sive knowledge in the target domain. Normallythe unannotated tokens are regarded as non-entities by default, while we underline thatthese tokens could either be non-entities orpart of any entity. Here, we study NER mod-eling with incomplete annotated data whereonly a fraction of the named entities are la-beled, and the unlabeled tokens are equiva-lently multi-labeled by every possible label.Taking multi-labeled tokens into account, thenumerous possible paths can distract the train-ing model from the gold path (ground truthlabel sequence), and thus hinders the learn-ing ability. In this paper, we propose AdaK-NER, named the adaptive top-Kapproach, tohelp the model focus on a smaller feasible re-gion where the gold path is more likely to belocated. We demonstrate the superiority ofour approach through extensive experimentson both English and Chinese datasets, aver-agely improving 2% in F-score on the CoNLL-2003 and over 10% on two Chinese datasetscompared with the prior state-of-the-art works.
Simultaneous Perception-Action Design via Invariant Finite Belief Sets
Hibbard, Michael, Tanaka, Takashi, Topcu, Ufuk
Although perception is an increasingly dominant portion of the overall computational cost for autonomous systems, only a fraction of the information perceived is likely to be relevant to the current task. To alleviate these perception costs, we develop a novel simultaneous perception-action design framework wherein an agent senses only the task-relevant information. This formulation differs from that of a partially observable Markov decision process, since the agent is free to synthesize not only its policy for action selection but also its belief-dependent observation function. The method enables the agent to balance its perception costs with those incurred by operating in its environment. To obtain a computationally tractable solution, we approximate the value function using a novel method of invariant finite belief sets, wherein the agent acts exclusively on a finite subset of the continuous belief space. We solve the approximate problem through value iteration in which a linear program is solved individually for each belief state in the set, in each iteration. Finally, we prove that the value functions, under an assumption on their structure, converge to their continuous state-space values as the sample density increases.
Event-Based Communication in Multi-Agent Distributed Q-Learning
Ornia, Daniel Jarne, Mazo, Manuel Jr
We present in this work an approach to reduce the communication of information needed on a multi-agent learning system inspired by Event Triggered Control (ETC) techniques. We consider a baseline scenario of a distributed Q-learning problem on a Markov Decision Process (MDP). Following an event-based approach, N agents explore the MDP and communicate experiences to a central learner only when necessary, which performs updates of the actor Q functions. We analyse the convergence guarantees retained with respect to a regular Q-learning algorithm, and present experimental results showing that event-based communication results in a substantial reduction of data transmission rates in such distributed systems. Additionally, we discuss what effects (desired and undesired) these event-based approaches have on the learning processes studied, and how they can be applied to more complex multi-agent learning systems.