Goto

Collaborating Authors

 Hellinckx, Peter


Self-attentive Transformer for Fast and Accurate Postprocessing of Temperature and Wind Speed Forecasts

arXiv.org Artificial Intelligence

Current postprocessing techniques often require separate models for each lead time and disregard possible inter-ensemble relationships by either correcting each member separately or by employing distributional approaches. In this work, we tackle these shortcomings with an innovative, fast and accurate Transformer which postprocesses each ensemble member individually while allowing information exchange across variables, spatial dimensions and lead times by means of multi-headed self-attention. Weather foreacasts are postprocessed over 20 lead times simultaneously while including up to twelve meteorological predictors. We use the EUPPBench dataset for training which contains ensemble predictions from the European Center for Medium-range Weather Forecasts' integrated forecasting system alongside corresponding observations. The work presented here is the first to postprocess the ten and one hundred-meter wind speed forecasts within this benchmark dataset, while also correcting the two-meter temperature. Our approach significantly improves the original forecasts, as measured by the CRPS, with 17.5 % for two-meter temperature, nearly 5% for ten-meter wind speed and 5.3 % for one hundred-meter wind speed, outperforming a classical member-by-member approach employed as competitive benchmark. Furthermore, being up to 75 times faster, it fulfills the demand for rapid operational weather forecasts in various downstream applications, including renewable energy forecasting.


A Review of the Deep Sea Treasure problem as a Multi-Objective Reinforcement Learning Benchmark

arXiv.org Artificial Intelligence

In this paper, the authors investigate the Deep Sea Treasure (DST) problem as proposed by Vamplew et al. Through a number of proofs, the authors show the original DST problem to be quite basic, and not always representative of practical Multi-Objective Optimization problems. In an attempt to bring theory closer to practice, the authors propose an alternative, improved version of the DST problem, and prove that some of the properties that simplify the original DST problem no longer hold. The authors also provide a reference implementation and perform a comparison between their implementation, and other existing open-source implementations of the problem. Finally, the authors also provide a complete Pareto-front for their new DST problem.


Autonomous Port Navigation With Ranging Sensors Using Model-Based Reinforcement Learning

arXiv.org Artificial Intelligence

Autonomous shipping has recently gained much interest in the research community. However, little research focuses on inland - and port navigation, even though this is identified by countries such as Belgium and the Netherlands as an essential step towards a sustainable future. These environments pose unique challenges, since they can contain dynamic obstacles that do not broadcast their location, such as small vessels, kayaks or buoys. Therefore, this research proposes a navigational algorithm which can navigate an inland vessel in a wide variety of complex port scenarios using ranging sensors to observe the environment. The proposed methodology is based on a machine learning approach that has recently set benchmark results in various domains: model-based reinforcement learning. By randomizing the port environments during training, the trained model can navigate in scenarios that it never encountered during training. Furthermore, results show that our approach outperforms the commonly used dynamic window approach and a benchmark model-free reinforcement learning algorithm. This work is therefore a significant step towards vessels that can navigate autonomously in complex port scenarios.


Safety Aware Autonomous Path Planning Using Model Predictive Reinforcement Learning for Inland Waterways

arXiv.org Artificial Intelligence

In recent years, interest in autonomous shipping in urban waterways has increased significantly due to the trend of keeping cars and trucks out of city centers. Classical approaches such as Frenet frame based planning and potential field navigation often require tuning of many configuration parameters and sometimes even require a different configuration depending on the situation. In this paper, we propose a novel path planning approach based on reinforcement learning called Model Predictive Reinforcement Learning (MPRL). MPRL calculates a series of waypoints for the vessel to follow. The environment is represented as an occupancy grid map, allowing us to deal with any shape of waterway and any number and shape of obstacles. We demonstrate our approach on two scenarios and compare the resulting path with path planning using a Frenet frame and path planning based on a proximal policy optimization (PPO) agent. Our results show that MPRL outperforms both baselines in both test scenarios. The PPO based approach was not able to reach the goal in either scenario while the Frenet frame approach failed in the scenario consisting of a corner with obstacles. MPRL was able to safely (collision free) navigate to the goal in both of the test scenarios.


An In-Depth Analysis of Discretization Methods for Communication Learning using Backpropagation with Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size, since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as a novel approach. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. In addition, we present COMA-DIAL, a communication learning approach based on DIAL and COMA extended with learning rate scaling and adapted exploration. Using COMA-DIAL allows us to perform experiments on more complex environments. Our results show that the novel ST-DRU method, proposed in this paper, achieves the best results out of all discretization methods across the different environments. It achieves the best or close to the best performance in each of the experiments and is the only method that does not fail on any of the tested environments.


Scalability of Message Encoding Techniques for Continuous Communication Learned with Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Many multi-agent systems require inter-agent communication to properly achieve their goal. By learning the communication protocol alongside the action protocol using multi-agent reinforcement learning techniques, the agents gain the flexibility to determine which information should be shared. However, when the number of agents increases we need to create an encoding of the information contained in these messages. In this paper, we investigate the effect of increasing the amount of information that should be contained in a message and increasing the number of agents. We evaluate these effects on two different message encoding methods, the mean message encoder and the attention message encoder. We perform our experiments on a matrix environment. Surprisingly, our results show that the mean message encoder consistently outperforms the attention message encoder. Therefore, we analyse the communication protocol used by the agents that use the mean message encoder and can conclude that the agents use a combination of an exponential and a logarithmic function in their communication policy to avoid the loss of important information after applying the mean message encoder.


Exploiting non-i.i.d. data towards more robust machine learning algorithms

arXiv.org Artificial Intelligence

In the field of machine learning there is a growing interest towards more robust and generalizable algorithms. This is for example important to bridge the gap between the environment in which the training data was collected and the environment where the algorithm is deployed. Machine learning algorithms have increasingly been shown to excel in finding patterns and correlations from data. Determining the consistency of these patterns and for example the distinction between causal correlations and nonsensical spurious relations has proven to be much more difficult. In this paper a regularization scheme is introduced that prefers universal causal correlations. This approach is based on 1) the robustness of causal correlations and 2) the data not being independently and identically distribute (i.i.d.). The scheme is demonstrated with a classification task by clustering the (non-i.i.d.) training set in subpopulations.


Learning to Communicate Using Counterfactual Reasoning

arXiv.org Machine Learning

This paper introduces a new approach for multi-agent communication learning called multi-agent counterfactual communication (MACC) learning. Many real-world problems are currently tackled using multi-agent techniques. However, in many of these tasks the agents do not observe the full state of the environment but only a limited observation. This absence of knowledge about the full state makes completing the objectives significantly more complex or even impossible. The key to this problem lies in sharing observation information between agents or learning how to communicate the essential data. In this paper we present a novel multi-agent communication learning approach called MACC. It addresses the partial observability problem of the agents. MACC lets the agent learn the action policy and the communication policy simultaneously. We focus on decentralized Markov Decision Processes (Dec-MDP), where the agents have joint observability. This means that the full state of the environment can be determined using the observations of all agents. MACC uses counterfactual reasoning to train both the action and the communication policy. This allows the agents to anticipate on how other agents will react to certain messages and on how the environment will react to certain actions, allowing them to learn more effective policies. MACC uses actor-critic with a centralized critic and decentralized actors. The critic is used to calculate an advantage for both the action and communication policy. We demonstrate our method by applying it on the Simple Reference Particle environment of OpenAI and a MNIST game. Our results are compared with a communication and non-communication baseline. These experiments demonstrate that MACC is able to train agents for each of these problems with effective communication policies.