Markov Models
Machine Learning for Reliability Engineering and Safety Applications: Review of Current Status and Future Opportunities
Xu, Zhaoyi, Saleh, Joseph Homer
Machine learning (ML) pervades an increasing number of academic disciplines and industries. Its impact is profound, and several fields have been fundamentally altered by it, autonomy and computer vision for example; reliability engineering and safety will undoubtedly follow suit. There is already a large but fragmented literature on ML for reliability and safety applications, and it can be overwhelming to navigate and integrate into a coherent whole. In this work, we facilitate this task by providing a synthesis of, and a roadmap to this ever-expanding analytical landscape and highlighting its major landmarks and pathways. We first provide an overview of the different ML categories and sub-categories or tasks, and we note several of the corresponding models and algorithms. We then look back and review the use of ML in reliability and safety applications. We examine several publications in each category/sub-category, and we include a short discussion on the use of Deep Learning to highlight its growing popularity and distinctive advantages. Finally, we look ahead and outline several promising future opportunities for leveraging ML in service of advancing reliability and safety considerations. Overall, we argue that ML is capable of providing novel insights and opportunities to solve important challenges in reliability and safety applications. It is also capable of teasing out more accurate insights from accident datasets than with traditional analysis tools, and this in turn can lead to better informed decision-making and more effective accident prevention.
ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation
Xia, Fei, Li, Chengshu, Martรญn-Martรญn, Roberto, Litany, Or, Toshev, Alexander, Savarese, Silvio
Many Reinforcement Learning (RL) approaches use joint control signals (positions, velocities, torques) as action space for continuous control tasks. We propose to lift the action space to a higher level in the form of subgoals for a motion generator (a combination of motion planner and trajectory executor). We argue that, by lifting the action space and by leveraging sampling-based motion planners, we can efficiently use RL to solve complex, long-horizon tasks that could not be solved with existing RL methods in the original action space. We propose ReLMoGen -- a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals. To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base. These problems are challenging because they are usually long-horizon, hard to explore during training, and comprise alternating phases of navigation and interaction. Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments. In all settings, ReLMoGen outperforms state-of-the-art Reinforcement Learning and Hierarchical Reinforcement Learning baselines. ReLMoGen also shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
On the Sample Complexity of Reinforcement Learning with Policy Space Generalization
Mou, Wenlong, Wen, Zheng, Chen, Xi
We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization, i.e. the agent has a prior knowledge that the optimal policy lies in a known policy space. Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space, which are intractably large in many practical problems. To avoid such undesirable dependence on the state and action space sizes, this paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP). Using a simulator oracle, we prove a near-optimal sample complexity upper bound that only depends linearly on the eluder dimension. We further prove a similar regret bound in deterministic systems without the simulator.
MIDAS: Multi-agent Interaction-aware Decision-making with Adaptive Strategies for Urban Autonomous Navigation
Chen, Xiaoyi, Chaudhari, Pratik
Autonomous navigation in crowded, complex urban environments requires interacting with other agents on the road. A common solution to this problem is to use a prediction model to guess the likely future actions of other agents. While this is reasonable, it leads to overly conservative plans because it does not explicitly model the mutual influence of the actions of interacting agents. This paper builds a reinforcement learning-based method named MIDAS where an ego-agent learns to affect the control actions of other cars in urban driving scenarios. MIDAS uses an attention-mechanism to handle an arbitrary number of other agents and includes a ''driver-type'' parameter to learn a single policy that works across different planning objectives. We build a simulation environment that enables diverse interaction experiments with a large number of agents and methods for quantitatively studying the safety, efficiency, and interaction among vehicles. MIDAS is validated using extensive experiments and we show that it (i) can work across different road geometries, (ii) results in an adaptive ego policy that can be tuned easily to satisfy performance criteria such as aggressive or cautious driving, (iii) is robust to changes in the driving policies of external agents, and (iv) is more efficient and safer than existing approaches to interaction-aware decision-making.
Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification
Mei, Hongyuan, Qin, Guanghui, Xu, Minjie, Eisner, Jason
Learning how to predict future events from patterns of past events is difficult when the set of possible event types is large. Training an unrestricted neural model might overfit to spurious patterns. To exploit domain-specific knowledge of how past events might affect an event's present probability, we propose using a temporal deductive database to track structured facts over time. Rules serve to prove facts from other facts and from past events. Each fact has a time-varying state---a vector computed by a neural net whose topology is determined by the fact's provenance, including its experience of past events. The possible event types at any time are given by special facts, whose probabilities are neurally modeled alongside their states. In both synthetic and real-world domains, we show that neural probabilistic models derived from concise Datalog programs improve prediction by encoding appropriate domain knowledge in their architecture.
Safe Reinforcement Learning in Constrained Markov Decision Processes
Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications. In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints. Specifically, we take a stepwise approach for optimizing safety and cumulative reward. In our method, the agent first learns safety constraints by expanding the safe region, and then optimizes the cumulative reward in the certified safe region. We provide theoretical guarantees on both the satisfaction of the safety constraint and the near-optimality of the cumulative reward under proper regularity assumptions. In our experiments, we demonstrate the effectiveness of SNO-MDP through two experiments: one uses a synthetic data in a new, openly-available environment named GP-SAFETY-GYM, and the other simulates Mars surface exploration by using real observation data.
Surveillance of COVID-19 Pandemic using Hidden Markov Model
Prabhu, Shreekanth M., Subramaniam, Natarajan
COVID-19 pandemic has brought the whole world to a stand-still over the last few months. In particular the pace at which pandemic has spread has taken everybody off-guard. The Governments across the world have responded by imposing lock-downs, stopping/restricting travel and mandating social distancing. On the positive side there is wide availability of information on active cases, recoveries and deaths collected daily across regions. However, what has been particularly challenging is to track the spread of the disease by asymptomatic carriers termed as super-spreaders. In this paper we look at applying Hidden Markov Model to get a better assessment of extent of spread. The outcome of such analysis can be useful to Governments to design the required interventions/responses in a calibrated manner. The data we have chosen to analyze pertains to Indian scenario.
The Projected Belief Network Classfier : both Generative and Discriminative
The projected belief network (PBN) is a layered generative network with tractable likelihood function, and is based on a feed-forward neural network (FF-NN). It can therefore share an embodiment with a discriminative classifier and can inherit the best qualities of both types of network. In this paper, a convolutional PBN is constructed that is both fully discriminative and fully generative and is tested on spectrograms of spoken commands. It is shown that the network displays excellent qualities from either the discriminative or generative viewpoint. Random data synthesis and visible data reconstruction from low-dimensional hidden variables are shown, while classifier performance approaches that of a regularized discriminative network. Combination with a conventional discriminative CNN is also demonstrated.
Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions
Geng, Sinong, Nassif, Houssam, Manzanares, Carlos A., Reppen, A. Max, Sircar, Ronnie
We propose a reward function estimation framework for inverse reinforcement learning with deep energy-based policies. We name our method PQR, as it sequentially estimates the Policy, the $Q$-function, and the Reward function by deep learning. PQR does not assume that the reward solely depends on the state, instead it allows for a dependency on the choice of action. Moreover, PQR allows for stochastic state transitions. To accomplish this, we assume the existence of one anchor action whose reward is known, typically the action of doing nothing, yielding no reward. We present both estimators and algorithms for the PQR method. When the environment transition is known, we prove that the PQR reward estimator uniquely recovers the true reward. With unknown transitions, we bound the estimation error of PQR. Finally, the performance of PQR is demonstrated by synthetic and real-world datasets.
Deceptive Kernel Function on Observations of Discrete POMDP
This paper studies the deception applied on agent in a partially observable Markov decision process. We introduce deceptive kernel function (the kernel) applied to agent's observations in a discrete POMDP. Based on value iteration, value function approximation and POMCP three characteristic algorithms used by agent, we analyze its belief being misled by falsified observations as the kernel's outputs and anticipate its probable threat on agent's reward and potentially other performance. We validate our expectation and explore more detrimental effects of the deception by experimenting on two POMDP problems. The result shows that the kernel applied on agent's observation can affect its belief and substantially lower its resulting rewards; meantime certain implementation of the kernel could induce other abnormal behaviors by the agent.