Markov Models
Shared Information-Based Safe And Efficient Behavior Planning For Connected Autonomous Vehicles
Han, Songyang, Zhou, Shanglin, Pepin, Lynn, Wang, Jiangwei, Ding, Caiwen, Miao, Fei
The recent advancements in wireless technology enable connected autonomous vehicles (CAVs) to gather data via vehicle-to-vehicle (V2V) communication, such as processed LIDAR and camera data from other vehicles. In this work, we design an integrated information sharing and safe multi-agent reinforcement learning (MARL) framework for CAVs, to take advantage of the extra information when making decisions to improve traffic efficiency and safety. We first use weight pruned convolutional neural networks (CNN) to process the raw image and point cloud LIDAR data locally at each autonomous vehicle, and share CNN-output data with neighboring CAVs. We then design a safe actor-critic algorithm that utilizes both a vehicle's local observation and the information received via V2V communication to explore an efficient behavior planning policy with safety guarantees. Using the CARLA simulator for experiments, we show that our approach improves the CAV system's efficiency in terms of average velocity and comfort under different CAV ratios and different traffic densities. We also show that our approach avoids the execution of unsafe actions and always maintains a safe distance from other vehicles. We construct an obstacle-at-corner scenario to show that the shared vision can help CAVs to observe obstacles earlier and take action to avoid traffic jams.
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Kong, Fang, Zhang, Xiangcheng, Wang, Baoxiang, Li, Shuai
Reinforcement learning (RL) describes the interaction between a learning agent and an unknown environment, where the agent aims to maximize the cumulative reward through trial and error Sutton and Barto [2018]. It has achieved great success in many real applications, such as games [Mnih et al., 2013; Silver et al., 2016], robotics [Kober et al., 2013; Lillicrap et al., 2015], autonomous driving [Kiran et al., 2021] and recommendation systems [Afsar et al., 2022; Lin et al., 2021]. The interaction in RL is commonly portrayed by Markov decision processes (MDP). Most of the works study the stochastic setting, where the reward is sampled from a fixed distribution [Azar et al., 2017; Jin et al., 2018; Simchowitz and Jamieson, 2019; Yang et al., 2021]. RL in real applications is in general more challenging than the stochastic setting, as the environment could be nonstationary and the reward function could be adaptive towards the agent's policy. For example, a scheduling algorithm will be deployed to self-interested parties, and recommendation algorithms will face strategic users. To design robust algorithms that work under non-stationary environments, a line of works focuses on the adversarial setting, where the reward function could be arbitrarily chosen by an adversary [Yu et al., 2009; Rosenberg and Mansour, 2019; Jin et al., 2020a; Chen et al., 2021; Luo et al., 2021a].
A Survey on Active Simultaneous Localization and Mapping: State of the Art and New Frontiers
Placed, Julio A., Strader, Jared, Carrillo, Henry, Atanasov, Nikolay, Indelman, Vadim, Carlone, Luca, Castellanos, Josรฉ A.
Active Simultaneous Localization and Mapping (SLAM) is the problem of planning and controlling the motion of a robot to build the most accurate and complete model of the surrounding environment. Since the first foundational work in active perception appeared, more than three decades ago, this field has received increasing attention across different scientific communities. This has brought about many different approaches and formulations, and makes a review of the current trends necessary and extremely valuable for both new and experienced researchers. In this work, we survey the state-of-the-art in active SLAM and take an in-depth look at the open challenges that still require attention to meet the needs of modern applications. After providing a historical perspective, we present a unified problem formulation and review the well-established modular solution scheme, which decouples the problem into three stages that identify, select, and execute potential navigation actions. We then analyze alternative approaches, including belief-space planning and deep reinforcement learning techniques, and review related work on multi-robot coordination. The manuscript concludes with a discussion of new research directions, addressing reproducible research, active spatial perception, and practical applications, among other topics.
COACH: Cooperative Robot Teaching
Yu, Cunjun, Xu, Yiqing, Li, Linfeng, Hsu, David
Knowledge and skills can transfer from human teachers to human students. However, such direct transfer is often not scalable for physical tasks, as they require one-to-one interaction, and human teachers are not available in sufficient numbers. Machine learning enables robots to become experts and play the role of teachers to help in this situation. In this work, we formalize cooperative robot teaching as a Markov game, consisting of four key elements: the target task, the student model, the teacher model, and the interactive teaching-learning process. Under a moderate assumption, the Markov game reduces to a partially observable Markov decision process, with an efficient approximate solution. We illustrate our approach on two cooperative tasks, one in a simulated video game and one with a real robot.
Simplified Continuous High Dimensional Belief Space Planning with Adaptive Probabilistic Belief-dependent Constraints
Zhitnikov, Andrey, Indelman, Vadim
Online decision making under uncertainty in partially observable domains, also known as Belief Space Planning, is a fundamental problem in robotics and Artificial Intelligence. Due to an abundance of plausible future unravelings, calculating an optimal course of action inflicts an enormous computational burden on the agent. Moreover, in many scenarios, e.g., information gathering, it is required to introduce a belief-dependent constraint. Prompted by this demand, in this paper, we consider a recently introduced probabilistic belief-dependent constrained POMDP. We present a technique to adaptively accept or discard a candidate action sequence with respect to a probabilistic belief-dependent constraint, before expanding a complete set of future observations samples and without any loss in accuracy. Moreover, using our proposed framework, we contribute an adaptive method to find a maximal feasible return (e.g., information gain) in terms of Value at Risk for the candidate action sequence with substantial acceleration. On top of that, we introduce an adaptive simplification technique for a probabilistically constrained setting. Such an approach provably returns an identical-quality solution while dramatically accelerating online decision making. Our universal framework applies to any belief-dependent constrained continuous POMDP with parametric beliefs, as well as nonparametric beliefs represented by particles. In the context of an information-theoretic constraint, our presented framework stochastically quantifies if a cumulative information gain along the planning horizon is sufficiently significant (e.g. for, information gathering, active SLAM). We apply our method to active SLAM, a highly challenging problem of high dimensional Belief Space Planning. Extensive realistic simulations corroborate the superiority of our proposed ideas.
Learning to Act Safely with Limited Exposure and Almost Sure Certainty
Castellano, Agustin, Min, Hancheng, Bazerque, Juan, Mallada, Enrique
This paper puts forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials. This is indeed possible, provided that one is willing to navigate trade-offs between optimality, level of exposure to unsafe events, and the maximum detection time of unsafe actions. We illustrate this concept in two complementary settings. We first focus on the canonical multi-armed bandit problem and study the intrinsic trade-offs of learning safety in the presence of uncertainty. Under mild assumptions on sufficient exploration, we provide an algorithm that provably detects all unsafe machines in an (expected) finite number of rounds. The analysis also unveils a trade-off between the number of rounds needed to secure the environment and the probability of discarding safe machines. We then consider the problem of finding optimal policies for a Markov Decision Process (MDP) with almost sure constraints. We show that the action-value function satisfies a barrier-based decomposition which allows for the identification of feasible policies independently of the reward process. Using this decomposition, we develop a Barrier-learning algorithm, that identifies such unsafe state-action pairs in a finite expected number of steps. Our analysis further highlights a trade-off between the time lag for the underlying MDP necessary to detect unsafe actions, and the level of exposure to unsafe events. Simulations corroborate our theoretical findings, further illustrating the aforementioned trade-offs, and suggesting that safety constraints can speed up the learning process.
Conv-NILM-Net, a causal and multi-appliance model for energy source separation
C., Simo Alami, Decock, Jรฉrรฉmie, Kaddah, Rim, Read, Jesse
Non-Intrusive Load Monitoring (NILM) seeks to save energy by estimating individual appliance power usage from a single aggregate measurement. Deep neural networks have become increasingly popular in attempting to solve NILM problems. However most used models are used for Load Identification rather than online Source Separation. Among source separation models, most use a single-task learning approach in which a neural network is trained exclusively for each appliance. This strategy is computationally expensive and ignores the fact that multiple appliances can be active simultaneously and dependencies between them. The rest of models are not causal, which is important for real-time application. Inspired by Convtas-Net, a model for speech separation, we propose Conv-NILM-net, a fully convolutional framework for end-to-end NILM. Conv-NILM-net is a causal model for multi appliance source separation. Our model is tested on two real datasets REDD and UK-DALE and clearly outperforms the state of the art while keeping a significantly smaller size than the competing models.
Machine Learning: Concepts and Applications
This course gives you a comprehensive introduction to both the theory and practice of machine learning. You will learn to use Python along with industry-standard libraries and tools, including Pandas, Scikit-learn, and Tensorflow, to ingest, explore, and prepare data for modeling and then train and evaluate models using a wide variety of techniques. Those techniques include linear regression with ordinary least squares, logistic regression, support vector machines, decision trees and ensembles, clustering, principal component analysis, hidden Markov models, and deep learning. A key feature of this course is that you not only learn how to apply these techniques, you also learn the conceptual basis underlying them so that you understand how they work, why you are doing what you are doing, and what your results mean. The course also features real-world datasets, drawn primarily from the realm of public policy.
MSDC: Exploiting Multi-State Power Consumption in Non-intrusive Load Monitoring based on A Dual-CNN Model
He, Jialing, Liu, Jiamou, Zhang, Zijian, Chen, Yang, Liu, Yiwei, Khoussainov, Bakh, Zhu, Liehuang
Non-intrusive load monitoring (NILM) aims to decompose aggregated electrical usage signal into appliance-specific power consumption and it amounts to a classical example of blind source separation tasks. Leveraging recent progress on deep learning techniques, we design a new neural NILM model Multi-State Dual CNN (MSDC). Different from previous models, MSDC explicitly extracts information about the appliance's multiple states and state transitions, which in turn regulates the prediction of signals for appliances. More specifically, we employ a dual-CNN architecture: one CNN for outputting state distributions and the other for predicting the power of each state. A new technique is invented that utilizes conditional random fields (CRF) to capture state transitions. Experiments on two real-world datasets REDD and UK-DALE demonstrate that our model significantly outperform state-of-the-art models while having good generalization capacity, achieving 6%-10% MAE gain and 33%-51% SAE gain to unseen appliances.
Online Planning of Uncertain MDPs under Temporal Tasks and Safe-Return Constraints
This paper addresses the online motion planning problem of mobile robots under complex high-level tasks. The robot motion is modeled as an uncertain Markov Decision Process (MDP) due to limited initial knowledge, while the task is specified as Linear Temporal Logic (LTL) formulas. The proposed framework enables the robot to explore and update the system model in a Bayesian way, while simultaneously optimizing the asymptotic costs of satisfying the complex temporal task. Theoretical guarantees are provided for the synthesized outgoing policy and safety policy. More importantly, instead of greedy exploration under the classic ergodicity assumption, a safe-return requirement is enforced such that the robot can always return to home states with a high probability. The overall methods are validated by numerical simulations.