 Markov Models


Cooperative Trajectory Planning in Uncertain Environments with Monte Carlo Tree Search and Risk Metrics

arXiv.org Artificial Intelligence

Automated vehicles require the ability to cooperate with humans for smooth integration into today's traffic. While the concept of cooperation is well known, developing a robust and efficient cooperative trajectory planning method is still a challenge. One aspect of this challenge is the uncertainty surrounding the state of the environment due to limited sensor accuracy. This uncertainty can be represented by a Partially Observable Markov Decision Process. Our work addresses this problem by extending an existing cooperative trajectory planning approach based on Monte Carlo Tree Search for continuous action spaces. It does so by explicitly modeling uncertainties in the form of a root belief state, from which start states for trees are sampled. After the trees have been constructed with Monte Carlo Tree Search, their results are aggregated into return distributions using kernel regression. We apply two risk metrics for the final selection, namely a Lower Confidence Bound and a Conditional Value at Risk. We demonstrate that integrating risk metrics into the final selection policy consistently outperforms a baseline in uncertain environments and generates considerably safer trajectories.
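The final selection step can be illustrated with a short sketch. It assumes each candidate action already has a list of sampled returns (for instance, one per start state drawn from the root belief) and skips the kernel-regression aggregation entirely. The LCB form (mean minus `c` population standard deviations), the CVaR level `alpha`, and the action names are illustrative assumptions, not the paper's exact definitions.

```python
import math
from statistics import mean, pstdev


def lower_confidence_bound(returns, c=1.0):
    """Mean return minus c population standard deviations (assumed LCB form)."""
    return mean(returns) - c * pstdev(returns)


def conditional_value_at_risk(returns, alpha=0.2):
    """Average of the worst alpha-fraction of returns (the lower tail)."""
    k = max(1, math.ceil(alpha * len(returns)))
    return mean(sorted(returns)[:k])


def select_action(return_samples, metric):
    """Return the action whose sampled returns score highest under `metric`."""
    return max(return_samples, key=lambda action: metric(return_samples[action]))


# Hypothetical returns for two candidate actions, e.g. one per sampled start state.
samples = {
    "keep_lane": [10, 10, 9, 11, 10],
    "overtake": [18, 17, 19, -20, 18],  # higher mean, but one rare bad outcome
}
print(select_action(samples, mean))                       # overtake
print(select_action(samples, lower_confidence_bound))     # keep_lane
print(select_action(samples, conditional_value_at_risk))  # keep_lane
```

A plain mean picks the action with the rare bad outcome, while both risk metrics prefer the steadier action, which is the qualitative effect the abstract describes.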


A Generative Approach for Production-Aware Industrial Network Traffic Modeling

arXiv.org Artificial Intelligence

The new wave of digitization induced by Industry 4.0 calls for ubiquitous and reliable connectivity to perform and automate industrial operations. 5G networks can meet the extreme requirements of heterogeneous vertical applications, but the lack of real data and realistic traffic statistics poses many challenges for the optimization and configuration of the network for industrial environments. In this paper, we investigate the network traffic data generated from a laser cutting machine deployed in a Trumpf factory in Germany. We analyze the traffic statistics, capture the dependencies between the internal states of the machine, and model the network traffic as a production-state-dependent stochastic process. We propose a two-step model: first, we model the production process as a multi-state semi-Markov process; then, we learn the conditional distributions of the production-state-dependent packet interarrival time and packet size with generative models. We compare the performance of various generative models, including the variational autoencoder (VAE), the conditional variational autoencoder (CVAE), and the generative adversarial network (GAN). The numerical results show a good approximation of the traffic arrival statistics depending on the production state. Among all generative models, CVAE generally provides the best performance, achieving the smallest Kullback-Leibler divergence.
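The first step of the model, a multi-state semi-Markov process, can be sketched as an embedded jump chain plus a state-dependent holding-time distribution. The machine states, transition probabilities and holding-time distributions below are hypothetical placeholders, not values fitted to the Trumpf data, and the second step (the state-conditioned generative models for interarrival time and packet size) is not shown.

```python
import random


def sample_semi_markov(transition, sojourn, start, horizon, rng):
    """Simulate a semi-Markov process until `horizon` seconds have elapsed.

    transition: state -> {next_state: probability} (the embedded jump chain)
    sojourn:    state -> function(rng) returning a holding time in seconds;
                unlike in a continuous-time Markov chain, it need not be exponential
    Returns a list of (state, entry_time, duration) segments.
    """
    t, state, segments = 0.0, start, []
    while t < horizon:
        duration = sojourn[state](rng)
        segments.append((state, t, duration))
        t += duration
        successors = transition[state]
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
    return segments


# Hypothetical machine states and holding-time distributions.
transition = {
    "idle": {"loading": 1.0},
    "loading": {"cutting": 1.0},
    "cutting": {"idle": 0.3, "loading": 0.7},
}
sojourn = {
    "idle": lambda r: r.uniform(5.0, 20.0),
    "loading": lambda r: r.uniform(10.0, 30.0),
    "cutting": lambda r: r.lognormvariate(4.0, 0.3),
}
segments = sample_semi_markov(transition, sojourn, "idle", 3600.0, random.Random(0))
for state, entry, duration in segments[:5]:
    print(f"{state:8s} t={entry:7.1f}s  stays {duration:6.1f}s")
```

Each segment's state label is what a state-conditioned generator (such as a CVAE) would receive as its condition when producing packets for that interval.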


Mixed Observable RRT: Multi-Agent Mission-Planning in Partially Observable Environments

arXiv.org Artificial Intelligence

This paper considers centralized mission-planning for a heterogeneous multi-agent system with the aim of locating a hidden target. We propose a mixed observable setting, consisting of a fully observable state-space and a partially observable environment, using a hidden Markov model. First, building on rapidly exploring random trees (RRTs), we introduce the mixed observable RRT for finding plausible mission plans that provide waypoints for each agent. Leveraging this construction, we present a path-selection strategy based on a dynamic programming approach, which accounts for the uncertainty from partial observations and minimizes the expected cost. Finally, we combine the high-level plan with model predictive control algorithms to evaluate the approach on an experimental setup consisting of a quadruped robot and a drone. It is shown that the agents are able to make intelligent decisions to explore the area efficiently and to locate the target through collaborative actions.
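The partially observable part of this setting can be illustrated with a single hidden Markov model filtering step over candidate target cells. This is a generic predict-then-correct sketch; the grid of four cells, the static-target transition matrix, the detection probability `p_detect`, and the no-false-alarm sensor model are all hypothetical assumptions rather than the paper's models.

```python
def hmm_belief_update(belief, transition, likelihood):
    """One forward-filtering step of a hidden Markov model.

    belief:     belief[i] = P(target in cell i | past observations)
    transition: transition[i][j] = P(target moves to cell j | it is in cell i)
    likelihood: likelihood[j] = P(new observation | target in cell j)
    """
    n = len(belief)
    # Predict: propagate the belief through the target's motion model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight by the observation likelihood and renormalize (Bayes' rule).
    weighted = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return [w / total for w in weighted]


def no_detection_likelihood(n_cells, scanned_cell, p_detect):
    """Likelihood of 'target not seen' after scanning one cell (no false alarms assumed)."""
    return [1.0 - p_detect if j == scanned_cell else 1.0 for j in range(n_cells)]


# Hypothetical example: a static target hidden in one of four cells.
n = 4
belief = [1.0 / n] * n
static = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
belief = hmm_belief_update(belief, static, no_detection_likelihood(n, 0, p_detect=0.9))
print([round(b, 3) for b in belief])  # [0.032, 0.323, 0.323, 0.323]
```

A planner that minimizes expected cost would evaluate candidate paths against beliefs like this one, since an unsuccessful scan shifts probability mass toward the unexplored cells.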



Risk Verification of Stochastic Systems with Neural Network Controllers

arXiv.org Artificial Intelligence

Motivated by the fragility of neural network (NN) controllers in safety-critical applications, we present a data-driven framework for verifying the risk of stochastic dynamical systems with NN controllers. Given a stochastic control system, an NN controller, and a specification equipped with a notion of trace robustness (e.g., constraint functions or signal temporal logic), we collect trajectories from the system that may or may not satisfy the specification. In particular, each of the trajectories produces a robustness value that indicates how well (severely) the specification is satisfied (violated). We then compute risk metrics over these robustness values to estimate the risk that the NN controller will not satisfy the specification. We are further interested in quantifying the difference in risk between two systems, and we show how the risk estimated from a nominal system can provide an upper bound on the risk of a perturbed version of the system. In particular, the tightness of this bound depends on how close the two systems are, as measured by the closeness of their trajectories. For Lipschitz continuous and incrementally input-to-state stable systems, we show how to exactly quantify system closeness with varying degrees of conservatism, while we estimate system closeness for more general systems from data in our experiments. We demonstrate our risk verification approach on two case studies: an underwater vehicle and an F1/10 autonomous car.
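Computing a risk metric over sampled robustness values, and shifting it to bound a perturbed system, can be sketched as follows. Value-at-risk is used here only as one example of a risk metric, and the bound assumes, hypothetically, that every perturbed trajectory stays within `trajectory_gap` of its nominal counterpart and that robustness is `lipschitz`-Lipschitz in the trajectory. This is a simplified illustration of why trajectory closeness controls the bound's tightness, not the paper's exact theorem.

```python
import math


def empirical_var(losses, beta=0.9):
    """Empirical value-at-risk: the ceil(beta * n)-th smallest loss."""
    ordered = sorted(losses)
    k = max(1, math.ceil(beta * len(ordered)))
    return ordered[k - 1]


def violation_rate(robustness):
    """Fraction of trajectories whose robustness is negative (specification violated)."""
    return sum(r < 0 for r in robustness) / len(robustness)


def perturbed_var_bound(nominal_robustness, lipschitz, trajectory_gap, beta=0.9):
    """Upper bound on the VaR of -robustness for a perturbed system.

    Assumes (hypothetically) that each perturbed trajectory stays within
    `trajectory_gap` of its nominal counterpart and that robustness is
    `lipschitz`-Lipschitz in the trajectory, so every loss grows by at most
    lipschitz * trajectory_gap. Order statistics shift by at most the same
    amount, so the empirical VaR does too.
    """
    losses = [-r for r in nominal_robustness]
    return empirical_var(losses, beta) + lipschitz * trajectory_gap


# Hypothetical robustness values from ten simulated runs of the nominal system.
rho = [0.8, 0.5, 1.2, -0.1, 0.3, 0.9, 0.6, 0.4, 1.0, 0.7]
print(violation_rate(rho))                                           # 0.1
print(empirical_var([-r for r in rho]))                              # -0.3
print(perturbed_var_bound(rho, lipschitz=2.0, trajectory_gap=0.05))  # about -0.2
```

A negative VaR of the loss means that, at the chosen level, the specification is still satisfied with some margin. A larger trajectory gap loosens the bound and erodes that margin.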


Switching Attention in Time-Varying Environments via Bayesian Inference of Abstractions

arXiv.org Artificial Intelligence

Motivated by the goal of endowing robots with a means for focusing attention in order to operate reliably in complex, uncertain, and time-varying environments, we consider how a robot can (i) determine which portions of its environment to pay attention to at any given point in time, (ii) infer changes in context (e.g., task or environment dynamics), and (iii) switch its attention accordingly. In this work, we tackle these questions by modeling context switches in a time-varying Markov decision process (MDP) framework. We utilize the theory of bisimulation-based state abstractions in order to synthesize mechanisms for paying attention to context-relevant information. We then present an algorithm based on Bayesian inference for detecting changes in the robot's context (task or environment dynamics) as it operates online, and use this to trigger switches between different abstraction-based attention mechanisms. Our approach is demonstrated on two examples: (i) an illustrative discrete-state tracking problem, and (ii) a continuous-state tracking problem implemented on a quadrupedal hardware platform. These examples demonstrate the ability of our approach to detect context switches online and robustly ignore task-irrelevant distractors by paying attention to context-relevant information.
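Bayesian detection of a context switch can be illustrated with a recursive posterior over a finite set of contexts, where the most probable context decides which attention mechanism is active. The constant switching `hazard`, the context names, and the likelihood values are hypothetical. In the paper, each context would additionally carry its own bisimulation-based abstraction, which is not modeled here.

```python
def update_context_posterior(posterior, likelihoods, hazard=0.05):
    """One Bayesian filtering step over a finite set of contexts.

    posterior:   context -> P(context | observations so far)
    likelihoods: context -> P(latest observation | context)
    hazard:      assumed per-step probability of a context switch,
                 spread evenly over the other contexts (needs >= 2 contexts)
    """
    contexts = list(posterior)
    m = len(contexts)
    prior = {}
    for c in contexts:
        stay = (1.0 - hazard) * posterior[c]
        arrive = hazard / (m - 1) * sum(posterior[o] for o in contexts if o != c)
        prior[c] = stay + arrive
    weighted = {c: prior[c] * likelihoods[c] for c in contexts}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Hypothetical stream: four observations favour "track_red", then three favour "track_blue".
stream = ([{"track_red": 0.8, "track_blue": 0.2}] * 4
          + [{"track_red": 0.2, "track_blue": 0.8}] * 3)
posterior = {"track_red": 0.5, "track_blue": 0.5}
active = []
for likelihoods in stream:
    posterior = update_context_posterior(posterior, likelihoods)
    # Attend using the abstraction of the most probable context.
    active.append(max(posterior, key=posterior.get))
print(active)
```

With these numbers, the switch is detected on the second "blue" observation. A smaller hazard makes detection more conservative, and a larger one makes it more reactive to noise.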


Deep Reinforcement Learning Microgrid Optimization Strategy Considering Priority Flexible Demand Side

arXiv.org Artificial Intelligence

As an efficient way to integrate multiple distributed energy resources (DERs) and the user side, a microgrid mainly faces the small-scale volatility, uncertainty, and intermittency of DERs, as well as demand-side uncertainty. The traditional microgrid has a single form and cannot provide flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads, and the main grid is proposed. Second, centralized control of microgrid operation facilitates the control of the reactive power and voltage of the distributed power supply and the adjustment of the grid frequency. However, flexible loads tend to aggregate and create peaks during electricity price valleys. Existing research takes into account the power constraints of the microgrid but fails to ensure a sufficient supply of electric energy for each individual flexible load. On the basis of the overall environment operation of the microgrid, this paper considers the response priority of each TCL and ESS unit, so as to ensure the power supply of the microgrid's flexible loads and minimize the power input cost. Finally, the simulation optimization of the environment can be expressed as a Markov decision process. The training process combines two stages: offline and online operation. Relying on multiple threads without learning from historical data leads to low learning efficiency. Therefore, an experience replay pool is added to the asynchronous advantage actor-critic (A3C) algorithm to address the data correlation and non-stationary distribution problems during training.
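An experience replay pool of the kind mentioned above can be sketched as a bounded FIFO buffer with uniform minibatch sampling. How the paper combines it with the A3C worker threads is not specified here; the transition fields, the capacity, and the action label `charge_ess` are hypothetical placeholders.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random minibatch sampling."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # the oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling uniformly from the whole buffer, rather than replaying
        # consecutive steps, decorrelates the transitions in a minibatch.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Hypothetical usage: states are hour indices, actions are dispatch decisions.
buffer = ReplayBuffer(capacity=3, seed=1)
for hour in range(5):
    buffer.push(state=hour, action="charge_ess", reward=-1.0, next_state=hour + 1, done=False)
batch = buffer.sample(2)
print(len(buffer), sorted(t[0] for t in batch))
```

Because the buffer is bounded, old transitions gradually drop out, which limits how stale the replayed data can become as the policy changes.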


Video Vision Transformers for Violence Detection

arXiv.org Artificial Intelligence

Law enforcement and city safety are significantly impacted by detecting violent incidents in surveillance systems. Although modern (smart) cameras are widely available and affordable, such technological solutions are ineffective in most instances. Furthermore, personnel monitoring CCTV recordings frequently react late, potentially causing catastrophic harm to people and property. Thus, automated detection of violence for swift action is crucial. The proposed solution uses a novel end-to-end deep learning-based video vision transformer (ViViT) that can proficiently discern fights, hostile movements, and violent events in video sequences. The study uses a data augmentation strategy to overcome the weak inductive bias of vision transformers when they are trained on smaller datasets. The evaluated results can subsequently be sent to the local concerned authority, and the captured video can be analyzed. Compared with state-of-the-art (SOTA) approaches, the proposed method achieved promising performance on some of the challenging benchmark datasets.
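A video vision transformer first turns a clip into a sequence of tokens. One common scheme in ViViT-style models is tubelet embedding, which cuts the clip into non-overlapping spatio-temporal blocks. The sketch below shows only that tokenization step, assuming a single-channel clip stored as nested lists and a hypothetical 2×2×2 tubelet size. Whether this paper uses tubelets or per-frame patches is not stated in the abstract, and the learned linear projection and transformer layers are omitted.

```python
def extract_tubelets(video, t, h, w):
    """Split a (T, H, W) single-channel video into flattened, non-overlapping tubelets.

    Assumes T, H and W are divisible by t, h and w respectively.
    Tokens are ordered by time block, then row block, then column block.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % t or H % h or W % w:
        raise ValueError("video dimensions must be divisible by the tubelet size")
    tokens = []
    for t0 in range(0, T, t):
        for y0 in range(0, H, h):
            for x0 in range(0, W, w):
                tokens.append([video[ti][yi][xi]
                               for ti in range(t0, t0 + t)
                               for yi in range(y0, y0 + h)
                               for xi in range(x0, x0 + w)])
    return tokens


# Hypothetical 4-frame 4x4 clip whose pixel value encodes (frame, row, column).
video = [[[100 * f + 10 * y + x for x in range(4)] for y in range(4)] for f in range(4)]
tokens = extract_tubelets(video, t=2, h=2, w=2)
print(len(tokens), tokens[0])  # 8 [0, 1, 10, 11, 100, 101, 110, 111]
```

Each flattened tubelet would then be linearly projected and combined with a positional embedding before entering the transformer. Because every token spans several frames, motion cues are available from the first layer on.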


Multi-vehicle Conflict Resolution in Highly Constrained Spaces by Merging Optimal Control and Reinforcement Learning

arXiv.org Artificial Intelligence

We present a novel method to address the problem of multi-vehicle conflict resolution in highly constrained spaces. An optimal control problem is formulated to incorporate nonlinear, non-holonomic vehicle dynamics and exact collision avoidance constraints. A solution to the problem can be obtained by first learning configuration strategies with reinforcement learning (RL) in a simplified discrete environment, and then using these strategies to shape the constraint space of the original problem. Simulation results show that our method can explore efficient actions to resolve conflicts in confined space and generate dexterous maneuvers that are both collision-free and kinematically feasible.

Keywords: Trajectory and Path Planning, Multi-vehicle systems, Autonomous Vehicles, Reinforcement learning control, Control problems under conflict

1. INTRODUCTION

Current autonomous vehicles (AVs) operate reasonably well in environments where traffic rules are well-defined, the surrounding agents are rational, and their actions can be easily predicted.

When conflicts arise in highly constrained spaces such as crowded parking lots, both the optimal control and the RL approaches often fail due to the following reasons: (i) The vehicles need to plan for combinatorial actions in order to create spaces for each other to pass through;
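The "nonlinear, non-holonomic vehicle dynamics" in such an optimal control problem are often modeled with a kinematic bicycle model. The sketch below assumes that model, a rear-axle reference point, explicit Euler integration, and a hypothetical 2.7 m wheelbase. The paper's exact dynamics and discretization may differ. In an optimal control formulation, each call would become an equality constraint linking consecutive states.

```python
import math


def bicycle_step(state, accel, steer, wheelbase=2.7, dt=0.1):
    """One explicit-Euler step of a rear-axle kinematic bicycle model.

    state = (x, y, heading, speed) in metres, radians and m/s;
    steer is the front-wheel steering angle in radians.
    The velocity always points along the heading: the car cannot slide
    sideways, which is the non-holonomic constraint.
    """
    x, y, heading, speed = state
    return (
        x + speed * math.cos(heading) * dt,
        y + speed * math.sin(heading) * dt,
        heading + speed / wheelbase * math.tan(steer) * dt,
        speed + accel * dt,
    )


# Hypothetical parking-lot manoeuvre: creep forward at 1 m/s with a fixed left steer.
state = (0.0, 0.0, 0.0, 1.0)
for _ in range(20):
    state = bicycle_step(state, accel=0.0, steer=0.3)
print(tuple(round(v, 3) for v in state))
```

A stationary vehicle cannot change its heading by steering alone. This is one reason tight conflicts call for the multi-point maneuvers the abstract mentions.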


Leveraging Fully Observable Policies for Learning under Partial Observability

arXiv.org Artificial Intelligence

In contrast, the setting of fully observable (FO) control has seen the success of many powerful reinforcement learning (RL) algorithms (e.g., [8, 9, 10, 11]). Unfortunately, full observability holds for only a small portion of realistic robotics problems. In this work, we attempt to leverage good fully observable policies (state experts), available only during offline training, to help train partially observable (PO) policies that can execute online. We rely on the setting of offline training and online execution, a successful RL framework in which an agent can use "privileged" information, such as the state [12, 13, 14, 15] or the belief about the state [6], during offline training (e.g., from simulators) to efficiently learn PO policies that can later be deployed without access to the privileged information. In this work, the privileged information is not just the state itself but also the state expert. Our setting can be illustrated by a navigation task (Figure 1), which requires an agent to navigate to an unknown goal object on the right, identifiable by an object on the left side. While the optimal behavior under partial observability is to first navigate leftwards to identify the goal object, the state expert is able to move to the goal object directly. Despite being suboptimal from the PO perspective, the state expert can provide experience during training that leads to the goal object, which is potentially useful both for exploration and as part of the policy needed in the PO case after the goal object is identified.

Figure 1: To reach the correct goal object, a state expert takes the red path directly, while a partially observable agent must first take the green path to identify the correct goal object, then take the red path.
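One simple way a privileged state expert can steer exploration during offline training is DAgger-style action mixing. With probability `beta`, the executed action comes from the expert, which sees the true state; otherwise it comes from the PO policy, which sees only its observation. This is a well-known generic technique shown for illustration, not the method proposed in this work. The linear `beta` schedule and the one-dimensional version of the Figure 1 task are hypothetical.

```python
import random


def behavior_action(state, observation, state_expert, po_policy, beta, rng):
    """Pick the action executed in the simulator during offline training.

    With probability beta the privileged state expert acts on the true state;
    otherwise the partially observable (PO) policy acts on the observation only.
    Returns the chosen action and which policy produced it.
    """
    if rng.random() < beta:
        return state_expert(state), "expert"
    return po_policy(observation), "learner"


def linear_beta(step, total_steps, beta_start=1.0):
    """Hypothetical schedule: hand control from the expert to the learner over training."""
    return beta_start * max(0.0, 1.0 - step / total_steps)


def state_expert(state):
    # Knows the goal position, so it heads straight there (the red path).
    return "right" if state["goal"] > state["agent"] else "left"


def po_policy(observation):
    # Cannot see the goal, so it first goes to inspect the identifier (the green path).
    return "left"


rng = random.Random(0)
state, observation = {"agent": 0, "goal": 5}, {"agent": 0}
for step in (0, 500, 1000):
    beta = linear_beta(step, total_steps=1000)
    print(step, beta, behavior_action(state, observation, state_expert, po_policy, beta, rng))
```

At deployment `beta` is zero, so only the PO policy acts, matching the offline-training, online-execution setting in which privileged information is unavailable online.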