Overview
COLREG-Compliant Collision Avoidance for Unmanned Surface Vehicle using Deep Reinforcement Learning
Meyer, Eivind, Heiberg, Amalie, Rasheed, Adil, San, Omer
Path Following and Collision Avoidance, be it for unmanned surface vessels or other autonomous vehicles, are two fundamental guidance problems in robotics. For many decades, they have been subject to academic study, leading to a vast number of proposed approaches. However, they have mostly been treated as separate problems, and have typically relied on non-linear first-principles models with parameters that can only be determined experimentally. The rise of Deep Reinforcement Learning (DRL) in recent years suggests an alternative approach: end-to-end learning of the optimal guidance policy from scratch by trial and error. In this article, we explore the potential of Proximal Policy Optimization (PPO), a DRL algorithm with demonstrated state-of-the-art performance on Continuous Control tasks, when applied to the dual-objective problem of controlling an underactuated Autonomous Surface Vehicle in a COLREGs-compliant manner such that it follows an a priori known desired path while avoiding collisions with other vessels along the way. Based on high-fidelity elevation and AIS tracking data from the Trondheim Fjord, an inlet of the Norwegian Sea, we evaluate the trained agent's performance in challenging, dynamic real-world scenarios where the ultimate success of the agent rests upon its ability to navigate non-uniform marine terrain while handling challenging but realistic vessel encounters.
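For readers unfamiliar with PPO, the clipped surrogate objective it maximizes can be sketched as below. This is a generic illustration of the published PPO objective, not the authors' training code; the function name, the list-based inputs and the default epsilon of 0.2 are assumptions made for the example.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    epsilon:    clip range (0.2 is an assumed, commonly used default)
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to [1 - epsilon, 1 + epsilon].
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)
```

Taking the minimum means that a policy update gains nothing from pushing the ratio outside the clip range in the direction the advantage favors, which is what keeps each update close to the previous policy.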
A Survey of Constrained Gaussian Process Regression: Approaches and Implementation Challenges
Swiler, Laura, Gulian, Mamikon, Frankel, Ari, Safta, Cosmin, Jakeman, John
Gaussian process regression is a popular Bayesian framework for surrogate modeling of expensive data sources. As part of a broader effort in scientific machine learning, many recent works have incorporated physical constraints or other a priori information within Gaussian process regression to supplement limited data and regularize the behavior of the model. We provide an overview and survey of several classes of Gaussian process constraints, including positivity or bound constraints, monotonicity and convexity constraints, differential equation constraints provided by linear PDEs, and boundary condition constraints. We compare the strategies behind each approach as well as the differences in implementation, concluding with a discussion of the computational challenges introduced by constraints.
Evolution of Group-Theoretic Cryptology Attacks using Hyper-heuristics
Craven, Matthew J., Woodward, John R.
In previous work, we developed a single Evolutionary Algorithm (EA) to solve random instances of the Anshel-Anshel-Goldfeld (AAG) key exchange protocol over polycyclic groups. The EA consisted of six simple heuristics which manipulated strings. The present work extends this by exploring the use of hyper-heuristics in group-theoretic cryptology for the first time. Hyper-heuristics are a way to generate new algorithms from existing algorithm components (in this case the simple heuristics), with the EAs being one example of the type of algorithm which can be generated by our hyper-heuristic framework. We take as a starting point the above EA and allow hyper-heuristics to build on it by making small tweaks to it. This adaptation is through a process of taking the EA and injecting chains of heuristics built from the simple heuristics. We demonstrate we can create novel heuristic chains, which when placed in the EA create algorithms which out-perform the existing EA. The new algorithms solve a markedly greater number of random AAG instances than the EA for harder instances. This suggests the approach could be applied to many of the same kinds of problems, providing a framework for the solution of cryptology problems over groups. The contribution of this paper is thus a framework to automatically build algorithms to attack cryptology problems.
A systematic review and taxonomy of explanations in decision support and recommender systems
Nunes, Ingrid, Jannach, Dietmar
With the recent advances in the field of artificial intelligence, an increasing number of decision-making tasks are delegated to software systems. A key requirement for the success and adoption of such systems is that users must trust system choices or even fully automated decisions. To achieve this, explanation facilities have been widely investigated as a means of establishing trust in these systems since the early years of expert systems. With today's increasingly sophisticated machine learning algorithms, new challenges in the context of explanations, accountability, and trust towards such systems constantly arise. In this work, we systematically review the literature on explanations in advice-giving systems. This is a family of systems that includes recommender systems, which is one of the most successful classes of advice-giving software in practice. We investigate the purposes of explanations as well as how they are generated, presented to users, and evaluated. As a result, we derive a novel comprehensive taxonomy of aspects to be considered when designing explanation facilities for current and future decision support systems. The taxonomy includes a variety of different facets, such as explanation objective, responsiveness, content and presentation. Moreover, we identified several challenges that remain unaddressed so far, for example related to fine-grained issues associated with the presentation of explanations and how explanation facilities are evaluated.
A Survey of Machine Learning Methods and Challenges for Windows Malware Classification
Raff, Edward, Nicholas, Charles
Malware classification is a difficult problem, to which machine learning methods have been applied for decades. Yet progress has often been slow, in part due to a number of unique difficulties with the task that occur throughout all stages of developing a machine learning system: data collection, labeling, feature creation and selection, model selection, and evaluation. In this survey we will review a number of the current methods and challenges related to malware classification, including data collection, feature extraction, model construction, and evaluation. Our discussion will include thoughts on the constraints that must be considered for machine learning based solutions in this domain, and yet-to-be-tackled problems for which machine learning could also provide a solution. This survey aims to be useful both to cybersecurity practitioners who wish to learn more about how machine learning can be applied to the malware problem, and to data scientists who need the necessary background on the challenges in this uniquely complicated space.
CNN Acceleration by Low-rank Approximation with Quantized Factors
Kozyrskiy, Nikolay, Phan, Anh-Huy
Although modern convolutional neural networks achieve great results on complex computer vision tasks, they still cannot be used effectively on mobile and embedded devices because of strict requirements on computational complexity, memory and power consumption. CNNs therefore have to be compressed and accelerated before deployment. To address this problem, we propose a novel approach that combines two known methods: low-rank tensor approximation in Tucker format and quantization of weights and feature maps (activations). We propose greedy one-step and multi-step algorithms for multilinear rank selection, and we develop an approach for restoring quality after Tucker decomposition and quantization have been applied. The efficiency of our method is demonstrated for ResNet18 and ResNet34 on the CIFAR-10, CIFAR-100 and ImageNet classification tasks. In a comparative analysis against other compression and acceleration methods, our approach shows promise.
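The idea of combining a low-rank factorization with quantized factors can be illustrated on a much simpler case than the one in the paper. The sketch below swaps the Tucker decomposition of a convolution kernel for a rank-1 approximation of a small matrix (found by power iteration) and then stores both factors as 8-bit integer codes. All function names are hypothetical, and the code assumes a non-zero input matrix; it is not the authors' algorithm.

```python
import math

def quantize(values, bits=8):
    """Symmetric uniform quantization: returns integer codes and the scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]

def rank1_factors(matrix, iterations=50):
    """Power iteration for the leading singular triplet (sigma, u, v).

    Assumes the matrix is non-zero, so the norms below are never zero.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v

def compress_rank1(matrix, bits=8):
    """Rank-1 approximation rebuilt from quantized factors."""
    sigma, u, v = rank1_factors(matrix)
    u_hat = dequantize(*quantize(u, bits))
    v_hat = dequantize(*quantize(v, bits))
    return [[sigma * ui * vj for vj in v_hat] for ui in u_hat]
```

For a matrix that is exactly rank 1, such as `[[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]`, the only reconstruction error comes from quantizing the two factors, which makes the effect of the bit width easy to inspect in isolation.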
Chest X-rays Pneumonia Detection using Convolutional Neural Network
Convolutional Neural Networks (CNNs) might seem intimidating for a beginner. However, this project provides an overview of how to build a model from scratch to detect pneumonia using TensorFlow and Keras. Pneumonia is a lung inflammation caused by a viral or bacterial infection that can range from mild to severe. This inflammation prevents the patient from breathing in enough oxygen to reach the bloodstream. It happens when an infection makes the air sacs (alveoli) in one or both lungs fill with fluid or pus.
Reinforcement Learning
Buffet, Olivier, Pietquin, Olivier, Weng, Paul
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e.g., board games, video games or autonomous vehicles. In such problems, an agent faces a sequential decision-making problem where, at every time step, it observes its state, performs an action, receives a reward and moves to a new state. An RL agent learns a good policy (or controller) by trial and error, based on observations and numeric reward feedback on the previously performed action. In this chapter, we present the basic framework of RL and recall the two main families of approaches that have been developed to learn a good policy. The first one, which is value-based, consists in estimating the value of an optimal policy, from which a policy can then be recovered, while the other, called policy search, works directly in a policy space. Actor-critic methods can be seen as a policy search technique where the policy value that is learned guides the policy improvement. In addition, we give an overview of some extensions of the standard RL framework, notably when risk-averse behavior needs to be taken into account or when rewards are not available or not known.
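The observe, act, receive-reward, move loop and the value-based family described above can be sketched with tabular Q-learning. The environment here is a hypothetical five-state chain invented for the example (action 0 moves left, action 1 moves right, and reaching the last state pays a reward of 1); the hyperparameters are likewise assumptions, not values from the chapter.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Deterministic transition and reward of the toy chain.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Move the estimate toward the one-step bootstrapped target.
            # The goal state's values stay at 0, so no special case is needed.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Recover a policy from the value estimates: best action per state."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(q_learning())` recovers the policy from the learned values, which is the step the value-based description refers to; on this chain the greedy action in every non-terminal state is to move right.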
XAI for Graphs: Explaining Graph Neural Network Predictions by Identifying Relevant Walks
Schnake, Thomas, Eberle, Oliver, Lederer, Jonas, Nakajima, Shinichi, Schütt, Kristof T., Müller, Klaus-Robert, Montavon, Grégoire
Graph Neural Networks (GNNs) are a popular approach for predicting graph-structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI (XAI) approaches are not applicable. To a large extent, GNNs have remained black boxes for the user so far. In this paper, we contribute by proposing a new XAI approach for GNNs. Our approach is derived from high-order Taylor expansions and is able to generate a decomposition of the GNN prediction as a collection of relevant walks on the input graph. We find that these high-order Taylor expansions can be equivalently (and more simply) computed using multiple backpropagation passes from the top layer of the GNN to the first layer. The explanation can then be further robustified and generalized by using layer-wise relevance propagation (LRP) in place of the standard equations for gradient propagation. Our novel method, which we denote as 'GNN-LRP', is tested on scale-free graphs, sentence parsing trees, molecular graphs, and pixel lattices representing images. In each case, it performs stably and accurately, and delivers interesting and novel application insights.
Learning and Optimization with Seasonal Patterns
Chen, Ningyuan, Wang, Chun, Wang, Longlin
Online learning, or more specifically, the multi-armed bandit (MAB) problem, focuses on the task of learning the reward distributions from an unknown environment while simultaneously optimizing cumulative rewards over a fixed time horizon T. This problem has been studied extensively when the environment (i.e., reward distributions) is stationary over time, with numerous algorithms proposed to tackle the tradeoff between exploration and exploitation when making decisions (see Bubeck et al. 2012 for a comprehensive review). While the stationarity assumption about the reward distributions greatly simplifies the analysis, it does not hold in many decision problems in OR/MS and other fields when the environment is time-varying. For example, a fashion retailer should take into account the seasonal demand shift when setting the prices for apparel, and a hospital needs to consider the variation of the patient arrival rate when scheduling the medical staff. Despite the practical relevance, it is difficult to develop a learning policy for non-stationary rewards, especially when the dynamics can change arbitrarily over time. Recent studies (Besbes et al., 2015) have considered cases in which the environment does not change quickly relative to the length of the time horizon, e.g., when a budget sublinear in T is imposed on the total variation of the underlying reward distribution.
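To make the stationary baseline concrete, the exploration-exploitation tradeoff can be sketched with UCB1, a classic algorithm for stationary bandits. It is shown only as background and is not the method proposed in the paper; the Bernoulli reward model, the function name and the arm means used in the usage note are assumptions for the example.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; returns (pull counts, total reward).

    arm_means are the true success probabilities, which the learner
    never sees; they are used only to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Empirical mean plus a bonus that shrinks as an arm is pulled.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward
```

With two arms of mean 0.2 and 0.8, `ucb1([0.2, 0.8], 2000)` concentrates most pulls on the second arm. Because the index averages over all past rewards, it implicitly assumes the means never change, which is exactly the assumption that seasonal or otherwise time-varying environments violate.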