Tsourdos, Antonios
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments
Wisniewski, Mariusz, Chatzithanos, Paraskevas, Guo, Weisi, Tsourdos, Antonios
Deep reinforcement learning (DRL) is used to enable autonomous navigation in unknown environments. Most research assumes perfect sensor data, but real-world environments may contain natural and artificial sensor noise and denial. Here, we present a benchmark of both well-established and emerging DRL algorithms on a navigation task with configurable sensor-denial effects. In particular, we are interested in how different DRL methods (e.g. model-free PPO vs. model-based DreamerV3) are affected by sensor denial. We show that DreamerV3 outperforms other methods on the visual end-to-end navigation task with a dynamic goal, a task the other methods are unable to learn. Furthermore, DreamerV3 generally outperforms other methods in sensor-denied environments. To improve robustness, we use adversarial training and demonstrate improved performance in denied environments, although this generally comes at a performance cost on the vanilla environments. We anticipate that this benchmark of DRL methods and the use of adversarial training will serve as a starting point for developing more elaborate navigation strategies capable of dealing with uncertain and denied sensor readings.
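To make the notion of configurable sensor denial concrete, a minimal sketch of an observation wrapper is shown below; the class name, parameters, and use of the Gymnasium API are illustrative assumptions, not the benchmark's actual code.

```python
# A minimal sketch (assumption: Gymnasium-style environment) of a wrapper
# that corrupts observations with Gaussian noise or full sensor denial.
import numpy as np
import gymnasium as gym


class SensorDenialWrapper(gym.ObservationWrapper):
    """Randomly corrupts observations with noise or blanks them entirely."""

    def __init__(self, env, noise_std=0.1, denial_prob=0.05, seed=None):
        super().__init__(env)
        self.noise_std = noise_std        # std of additive Gaussian noise
        self.denial_prob = denial_prob    # probability the sensor is denied
        self.rng = np.random.default_rng(seed)

    def observation(self, obs):
        obs = np.asarray(obs, dtype=np.float32)
        if self.rng.random() < self.denial_prob:
            return np.zeros_like(obs)     # full sensor denial for this step
        return obs + self.rng.normal(0.0, self.noise_std, size=obs.shape)
```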
Explainable Interface for Human-Autonomy Teaming: A Survey
Kong, Xiangqi, Xing, Yang, Tsourdos, Antonios, Wang, Ziyue, Guo, Weisi, Perrusquia, Adolfo, Wikander, Andreas
Large-scale foundation models are increasingly integrated into numerous safety-critical applications, including human-autonomy teaming (HAT) in the transportation, medical, and defence domains. The inherent 'black-box' nature of these sophisticated deep neural networks therefore heightens the importance of fostering mutual understanding and trust between humans and autonomous systems. To tackle the transparency challenges in HAT, this paper studies the underexplored domain of the Explainable Interface (EI) in HAT systems from a human-centric perspective, thereby enriching the existing body of research in Explainable Artificial Intelligence (XAI). We explore the design, development, and evaluation of EI within XAI-enhanced HAT systems. To do so, we first clarify the distinctions between the concepts of EI, explanations, and model explainability, aiming to provide researchers and practitioners with a structured understanding. Second, we contribute a novel framework for EI that addresses the unique challenges in HAT. Last, our evaluation framework for EI offers a holistic perspective, encompassing model performance, human-centered factors, and group task objectives. Based on extensive surveys across XAI, HAT, psychology, and Human-Computer Interaction (HCI), this review offers multiple novel insights into incorporating XAI into HAT systems and outlines future directions.
An Auction-based Coordination Strategy for Task-Constrained Multi-Agent Stochastic Planning with Submodular Rewards
Liu, Ruifan, Shin, Hyo-Sang, Yan, Binbin, Tsourdos, Antonios
In many domains, such as transportation and logistics, search and rescue, or cooperative surveillance, tasks must be allocated while accounting for possible execution uncertainties. Existing task coordination algorithms either ignore the stochastic process or suffer from computational intensity. Taking advantage of the 'weakly coupled' feature of the problem and the opportunity for coordination in advance, we propose a decentralized auction-based coordination strategy using a newly formulated score function, which is generated by formulating the problem as task-constrained Markov decision processes (MDPs). The proposed method guarantees convergence and at least 50% optimality under the premise of a submodular reward function. Furthermore, for implementation in large-scale applications, an approximate variant of the proposed method, namely Deep Auction, is also suggested using neural networks, which avoids the burden of constructing the MDPs. Inspired by the well-known actor-critic architecture, two Transformers are used to map observations to action probabilities and cumulative rewards, respectively. Finally, we demonstrate the performance of the two proposed approaches in the context of drone deliveries, where the stochastic planning for the drone fleet is cast as a stochastic prize-collecting Vehicle Routing Problem (VRP) with time windows. Simulation results are compared with state-of-the-art methods in terms of solution quality, planning efficiency, and scalability.

Cooperative systems of multiple agents, which feature a flexible structure, parallel-processing ability, and scalability, are of great interest, especially those operating on unmanned aerial platforms [1], for applications such as cooperative surveillance, search and rescue [2][3], and border patrolling. Among their various instantiations, there is a specific but widespread category of assigning tasks among team members toward a global goal, followed by independent and possibly stochastic task execution. For example, in parcel delivery [4], tasks are allocated to individual vehicles, and the vehicles deliver their allocated items in sequence, without interference from other executors. However, the delivery may be subject to stochastic travel delays between destinations. Likewise, in multi-target tracking [5], agents decide which targets to track and the best tracking action based on estimates of target manoeuvring.
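The sketch below illustrates the core auction idea in a simplified, sequential-greedy form; the function names and the centralized loop are assumptions for illustration, not the paper's decentralized protocol or its MDP-derived score function.

```python
# A minimal sketch of a greedy auction with marginal-gain bids, assuming a
# submodular score(agent, bundle) function (expected cumulative reward of
# an agent executing a bundle of tasks) is available.
def greedy_auction(agents, tasks, score):
    bundles = {a: [] for a in agents}        # tasks won so far, per agent
    unassigned = set(tasks)
    while unassigned:
        # Each agent bids its best marginal gain over the remaining tasks.
        bids = {}
        for a in agents:
            gains = {t: score(a, bundles[a] + [t]) - score(a, bundles[a])
                     for t in unassigned}
            best_t = max(gains, key=gains.get)
            bids[a] = (gains[best_t], best_t)
        # Highest bid wins; with a submodular reward, greedy assignment of
        # this kind retains at least half of the optimal value.
        winner = max(bids, key=lambda a: bids[a][0])
        gain, task = bids[winner]
        if gain <= 0:
            break
        bundles[winner].append(task)
        unassigned.remove(task)
    return bundles
```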
Dynamic deep-reinforcement-learning algorithm in Partially Observed Markov Decision Processes
Omi, Saki, Shin, Hyo-Sang, Cho, Namhoon, Tsourdos, Antonios
Reinforcement learning has greatly improved in recent studies, and interest in real-world implementation has grown in recent years. In many cases, non-static disturbances make it challenging for the agent to maintain its performance. Such disturbances turn the environment into a Partially Observable Markov Decision Process (POMDP). In common practice, a POMDP is handled by introducing an additional estimator, or a Recurrent Neural Network is utilized in the context of reinforcement learning. Both cases require processing sequential information along the trajectory. However, only a few studies investigate which information to consider and which network structure should handle it. This study shows the benefit of including action sequences when solving POMDPs. Several structures and approaches are proposed to extend one of the latest deep reinforcement learning algorithms with LSTM networks. The developed algorithms show enhanced robustness of controller performance against different types of external disturbances added to the observations.
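A minimal sketch of the action-sequence-inclusion idea is given below: a recurrent policy that consumes both the observation and the previous action at each step. The layer sizes, bounded-action head, and PyTorch framing are illustrative assumptions rather than the paper's architecture.

```python
# A minimal PyTorch sketch of an LSTM policy whose input at each step is the
# concatenation of the observation and the previous action.
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, prev_act_seq, state=None):
        # obs_seq: (batch, time, obs_dim); prev_act_seq: (batch, time, act_dim)
        x = torch.cat([obs_seq, prev_act_seq], dim=-1)
        out, state = self.lstm(x, state)
        return torch.tanh(self.head(out)), state   # bounded continuous action
```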
Computational Impact Time Guidance: A Learning-Based Prediction-Correction Approach
Liu, Zichao, Wang, Jiang, He, Shaoming, Shin, Hyo-Sang, Tsourdos, Antonios
This paper investigates the impact-time-control problem and proposes a learning-based computational guidance algorithm to solve it. The proposed guidance algorithm is developed based on a general prediction-correction concept: the exact time-to-go under proportional navigation guidance with realistic aerodynamic characteristics is estimated by a deep neural network, and a biased command to nullify the impact time error is developed by utilizing emerging reinforcement learning techniques. The deep neural network is augmented into the reinforcement learning block to resolve the sparse-reward issue observed in typical reinforcement learning formulations. Extensive numerical simulations are conducted to support the proposed algorithm.
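The prediction-correction composition can be sketched as below; the state fields, callable names (`ttgo_net`, `bias_policy`), and the simple PN term are assumptions standing in for the paper's learned predictor and RL correction.

```python
# A minimal sketch of the prediction-correction guidance command: a
# proportional-navigation baseline plus a learned bias that drives the
# impact-time error to zero.
def guidance_command(state, t_desired, t_elapsed, ttgo_net, bias_policy,
                     nav_gain=3.0):
    # Baseline proportional navigation acceleration command.
    a_pn = nav_gain * state["closing_speed"] * state["los_rate"]
    # Predict time-to-go under PN, then form the impact-time error that the
    # learned bias command should nullify.
    t_go = ttgo_net(state)
    impact_time_error = t_desired - (t_elapsed + t_go)
    a_bias = bias_policy(state, impact_time_error)
    return a_pn + a_bias
```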
Scalable Partial Explainability in Neural Networks via Flexible Activation Functions
Sun, Schyler C., Li, Chen, Wei, Zhuangkun, Tsourdos, Antonios, Guo, Weisi
Achieving transparency in black-box deep learning algorithms is still an open challenge. The high-dimensional features and decisions produced by deep neural networks (NN) require new algorithms and methods to expose their mechanisms. Current state-of-the-art NN interpretation methods (e.g. saliency maps, DeepLIFT, LIME) focus more on the direct relationship between NN outputs and inputs than on the NN structure and operations themselves. In current deep NN operations, there is uncertainty over the exact role played by neurons with fixed activation functions. In this paper, we achieve a partially explainable learning model by symbolically explaining the role of activation functions (AF) under a scalable topology. This is carried out by modeling the AFs as adaptive Gaussian Processes (GP), which sit within a novel scalable NN topology based on the Kolmogorov-Arnold Superposition Theorem (KST). In this scalable NN architecture, the AFs are generated by GP interpolation between control points and can thus be tuned during the back-propagation procedure via gradient descent. The control points act as the core enabler of both local and global adjustability of the AF, while the GP interpolation constrains the intrinsic autocorrelation to avoid over-fitting. We show that there exists a trade-off between the NN's expressive power and interpretation complexity under linear KST topology scaling. To demonstrate this, we perform a case study on a binary classification dataset of banknote authentication. By quantitatively and qualitatively investigating the mapping relationship between inputs and output, our explainable model can provide an interpretation over each of the one-dimensional attributes. These early results suggest that our model has the potential to act as the final interpretation layer for deep neural networks.
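In the spirit of the description above, the sketch below shows an activation function defined by GP (RBF-kernel) interpolation between trainable control points; the kernel choice, control-point grid, and jitter term are illustrative assumptions, not the paper's exact construction.

```python
# A minimal PyTorch sketch of a GP-interpolated activation function whose
# control-point values are learned by back-propagation.
import torch
import torch.nn as nn


class GPActivation(nn.Module):
    def __init__(self, n_points=10, x_min=-3.0, x_max=3.0, lengthscale=0.5):
        super().__init__()
        self.register_buffer("xc", torch.linspace(x_min, x_max, n_points))
        self.yc = nn.Parameter(torch.tanh(self.xc.clone()))  # trainable values
        self.lengthscale = lengthscale

    def kernel(self, a, b):
        d = a.unsqueeze(-1) - b.unsqueeze(-2)
        return torch.exp(-0.5 * (d / self.lengthscale) ** 2)

    def forward(self, x):
        # GP posterior mean through the control points: k(x, X) K^{-1} y.
        K = self.kernel(self.xc, self.xc) + 1e-4 * torch.eye(len(self.xc))
        weights = torch.linalg.solve(K, self.yc.unsqueeze(-1)).squeeze(-1)
        return self.kernel(x, self.xc) @ weights
```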
Computational Flight Control: A Domain-Knowledge-Aided Deep Reinforcement Learning Approach
Shin, Hyo-Sang, He, Shaoming, Tsourdos, Antonios
This paper aims to examine the potential of emerging deep reinforcement learning techniques in flight control. Instead of learning from scratch, the autopilot structure is fixed as a typical three-loop autopilot, and deep reinforcement learning is utilised to learn the autopilot gains. This domain-knowledge-aided approach is shown to significantly improve learning efficiency. To solve the flight control problem, we formulate a Markov decision process with a proper reward function that enables the application of reinforcement learning theory. The state-of-the-art deep deterministic policy gradient algorithm is utilised to learn an action policy that maps the observed states to the autopilot gains. Extensive numerical simulations are performed to validate the proposed computational control algorithm.
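The sketch below illustrates the domain-knowledge-aided idea: the controller keeps a fixed three-loop structure and a learned actor supplies only the gains. The simplified loop equations, state names, and sign conventions are assumptions for illustration, not the paper's exact autopilot.

```python
# A minimal sketch: a fixed three-loop autopilot whose gains (ka, kw, ki)
# are proposed by a learned DDPG actor rather than tuned by hand.
def three_loop_command(state, gains):
    ka, kw, ki = gains                                   # learned gains
    acc_err = state["acc_cmd"] - state["acc_meas"]       # outer accel loop
    rate_cmd = ka * acc_err                              # synthetic rate cmd
    rate_err = rate_cmd - state["pitch_rate"]            # rate damping loop
    state["integrator"] += ki * rate_err * state["dt"]   # integral action
    return kw * rate_err + state["integrator"]           # fin deflection cmd
```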
Anonymous Hedonic Game for Task Allocation in a Large-Scale Multiple Agent System
Jang, Inmo, Shin, Hyo-Sang, Tsourdos, Antonios
This paper proposes a novel game-theoretical autonomous decision-making framework to address a task allocation problem for a swarm of multiple agents. We consider cooperation among self-interested agents and show that our proposed decentralized algorithm guarantees that agents with social inhibition converge to a Nash stable partition (i.e., a social agreement) within polynomial time. The algorithm is simple and executable based on local interactions with neighbouring agents under a strongly connected communication network, even in asynchronous environments. We analytically present a mathematical formulation for computing the lower bound on the suboptimality of the solution, and additionally show that at least 50% suboptimality can be guaranteed if social utilities are non-decreasing functions of the number of co-working agents. The results of numerical experiments confirm that the proposed framework is scalable, adapts quickly to dynamic environments, and remains robust in realistic situations.
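A minimal sketch of the anonymous-hedonic-game decision loop is given below: each agent repeatedly deviates to the task whose utility (a function only of the task and the number of co-working agents) is highest, until no agent wants to move, i.e. a Nash stable partition. The synchronous iteration, mandatory task membership, and `utility(task, n_members)` signature are simplifying assumptions, not the paper's decentralized, asynchronous algorithm.

```python
# A minimal sketch of iterated best-response deviations toward a Nash
# stable partition in an anonymous hedonic game.
def find_nash_stable_partition(agents, tasks, utility):
    choice = {a: tasks[0] for a in agents}     # start everyone on one task
    changed = True
    while changed:                             # converges for the utility
        changed = False                        # class assumed in the paper
        for a in agents:
            counts = {t: sum(1 for c in choice.values() if c == t)
                      for t in tasks}

            def value(t):
                # Coalition size if agent a works on t (leaving its current task).
                size = counts[t] + (0 if choice[a] == t else 1)
                return utility(t, size)

            best = max(tasks, key=value)
            if value(best) > value(choice[a]):
                choice[a] = best
                changed = True
    return choice
```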