 Energy


NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

arXiv.org Artificial Intelligence

Offline reinforcement learning (RL) aims at learning a good policy from a batch of collected data, without extra interactions with the environment during training. However, current offline RL benchmarks commonly have a large reality gap, because they involve large datasets collected by highly exploratory policies, and the trained policy is directly evaluated in the environment. In real-world situations, running a highly exploratory policy is prohibited to ensure system safety, the data is commonly very limited, and a trained policy should be well validated before deployment. In this paper, we present a near real-world offline RL benchmark, named NeoRL, which contains datasets from various domains with controlled sizes, and extra test datasets for policy validation. We evaluate existing offline RL algorithms on NeoRL and argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward. The empirical results demonstrate that the tested offline RL algorithms become less competitive against the deterministic policy on many datasets, and that offline policy evaluation hardly helps. The NeoRL suite can be found at http://polixir.ai/research/neorl. We hope this work will shed some light on future research and draw more attention to deploying RL in real-world systems.


State-Aware Variational Thompson Sampling for Deep Q-Networks

arXiv.org Artificial Intelligence

Thompson sampling is a well-known approach for balancing exploration and exploitation in reinforcement learning. It requires the posterior distribution of value-action functions to be maintained; this is generally intractable for tasks that have a high-dimensional state-action space. We derive a variational Thompson sampling approximation for DQNs which uses a deep network whose parameters are perturbed by a learned variational noise distribution. We interpret the successful NoisyNets method (Fortunato et al., 2018) as an approximation to the variational Thompson sampling method that we derive. Further, we propose State Aware Noisy Exploration (SANE), which seeks to improve on NoisyNets by allowing a non-uniform perturbation, where the amount of parameter perturbation is conditioned on the state of the agent. This is done with the help of an auxiliary perturbation module, whose output is state-dependent and is learnt end-to-end with gradient descent. We hypothesize that such state-aware noisy exploration is particularly useful in problems where exploration in certain high-risk states may result in the agent failing badly. We demonstrate the effectiveness of the state-aware exploration method in the off-policy setting by augmenting DQNs with the auxiliary perturbation module.
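
The idea of state-dependent parameter noise can be illustrated with a minimal sketch. It is not the authors' architecture: the function names, the independent per-weight Gaussian noise, and the sigmoid form of the auxiliary module are all assumptions made for illustration. The layer perturbs each weight around a learned mean, and a small state-dependent scalar decides how strong the perturbation is in the current state.

```python
import math
import random


def noisy_linear(x, mu_w, sigma_w, noise_scale, rng):
    """Linear layer whose weights are sampled around a mean.

    Each weight is mu + noise_scale * sigma * eps with eps ~ N(0, 1).
    noise_scale = 1 gives a fixed, state-independent perturbation;
    a state-dependent noise_scale gives a state-aware perturbation.
    """
    outputs = []
    for mu_row, sigma_row in zip(mu_w, sigma_w):
        total = 0.0
        for xi, mu, sigma in zip(x, mu_row, sigma_row):
            eps = rng.gauss(0.0, 1.0)
            total += (mu + noise_scale * sigma * eps) * xi
        outputs.append(total)
    return outputs


def state_noise_scale(x, v, b):
    """Hypothetical auxiliary perturbation module (an assumption):
    a sigmoid of a linear function of the state, so the scale lies in (0, 1)."""
    z = sum(vi * xi for vi, xi in zip(v, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(0)
    state = [1.0, -2.0]
    mu_w = [[0.5, 0.1], [0.0, 1.0]]
    sigma_w = [[0.3, 0.3], [0.3, 0.3]]
    scale = state_noise_scale(state, v=[0.2, 0.1], b=0.0)
    print(noisy_linear(state, mu_w, sigma_w, scale, rng))
```

In a trained agent the means, the standard deviations, and the parameters of the auxiliary module would all be learned by gradient descent; here they are fixed numbers so the sketch stays self-contained.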


Latent Map Gaussian Processes for Mixed Variable Metamodeling

arXiv.org Artificial Intelligence

Gaussian processes (GPs) are ubiquitously used in sciences and engineering as metamodels. Standard GPs, however, can only handle numerical or quantitative variables. In this paper, we introduce latent map Gaussian processes (LMGPs) that inherit the attractive properties of GPs but are also applicable to mixed data that have both quantitative and qualitative inputs. The core idea behind LMGPs is to learn a low-dimensional manifold where all qualitative inputs are represented by some quantitative features. To learn this manifold, we first assign a unique prior vector representation to each combination of qualitative inputs. We then use a linear map to project these priors on a manifold that characterizes the posterior representations. As the posteriors are quantitative, they can be straightforwardly used in any standard correlation function such as the Gaussian. Hence, the optimal map and the corresponding manifold can be efficiently learned by maximizing the Gaussian likelihood function. Through a wide range of analytical and real-world examples, we demonstrate the advantages of LMGPs over state-of-the-art methods in terms of accuracy and versatility. In particular, we show that LMGPs can handle variable-length inputs and provide insights into how qualitative inputs affect the response or interact with each other. We also provide a neural network interpretation of LMGPs and study the effect of prior latent representations on their performance.
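
The mapping from qualitative inputs to a latent space can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: one-hot vectors are assumed as the prior representation, the helper names are hypothetical, and the correlation function uses a simple weighted squared distance. Each combination of qualitative levels gets a unique prior vector, a linear map sends it to a low-dimensional latent point, and that point enters a Gaussian correlation function alongside the quantitative inputs.

```python
import math
from itertools import product


def one_hot_priors(levels_per_variable):
    """Assign a unique one-hot prior vector to each combination of levels."""
    combos = list(product(*[range(n) for n in levels_per_variable]))
    priors = {}
    for idx, combo in enumerate(combos):
        vec = [0.0] * len(combos)
        vec[idx] = 1.0
        priors[combo] = vec
    return priors


def project(prior, linear_map):
    """Map a prior vector to latent coordinates: z = prior @ linear_map.

    linear_map has one row per prior dimension and one column per latent dimension.
    """
    latent_dim = len(linear_map[0])
    return [sum(p * row[d] for p, row in zip(prior, linear_map))
            for d in range(latent_dim)]


def gaussian_correlation(x1, z1, x2, z2, omega):
    """Gaussian correlation over quantitative inputs x and latent points z.

    omega holds one non-negative weight per quantitative input (an assumed
    parametrisation chosen for simplicity).
    """
    quant = sum(w * (a - b) ** 2 for w, a, b in zip(omega, x1, x2))
    latent = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-(quant + latent))
```

In a full model the entries of `linear_map` and `omega` would be estimated by maximizing the Gaussian likelihood; the sketch only shows how a candidate map turns qualitative combinations into numbers a standard correlation function can use.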


Soft robots for ocean exploration and offshore operations: A perspective

Robohub

Most of the ocean is unknown. Yet we know that the most challenging environments on the planet reside in it. Understanding the ocean in its totality is a key component for the sustainable development of human activities and for the mitigation of climate change, as proclaimed by the United Nations. We are glad to share our perspective about the role of soft robots in ocean exploration and offshore operations at the outset of the ocean decade (2021-2030). In this study of the Soft Systems Group (part of The School of Engineering at The University of Edinburgh), we focus on the two ends of the water column: the abyss and the surface.


Council Post: We Need To Talk About An Energy Label For AI

#artificialintelligence

Artificial intelligence (AI) can distinguish a dog from a cat, but the billions of calculations needed to do so demand quite a lot of energy. The human brain can do the same thing while using only a small fraction of this energy. Could this phenomenon inspire us to develop more energy-efficient AI systems? Our computational power has risen exponentially, enabling the widespread use of artificial intelligence, a technology that relies on processing huge amounts of data to recognize patterns. When we use the recommendation algorithm of our favorite streaming service, we usually don't realize the gigantic energy consumption behind it.


Congratulations to the #AAAI2021 best paper winners

AIHub

The AAAI-21 best paper awards were announced on Thursday 4th February during the opening ceremony of AAAI 2021. There were three best papers, three best paper runners-up, and six distinguished papers. Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity.


Reinforcement Learning for Decision-Making and Control in Power Systems: Tutorial, Review, and Vision

arXiv.org Artificial Intelligence

With large-scale integration of renewable generation and ubiquitous distributed energy resources (DERs), modern power systems confront a series of new challenges in operation and control, such as growing complexity, increasing uncertainty, and aggravating volatility. On the upside, more and more data are available owing to widely deployed smart meters, smart sensors, and upgraded communication networks. As a result, data-driven control techniques, especially reinforcement learning (RL), have attracted surging attention in recent years. In this paper, we focus on RL and aim to provide a tutorial on various RL techniques and how they can be applied to decision-making and control in power systems. In particular, we select three key applications, namely frequency regulation, voltage control, and energy management, for illustration, and present the typical ways to model and tackle them with RL methods. We conclude by emphasizing two critical issues in the application of RL, i.e., safety and scalability. Several potential future directions are discussed as well.


Uncertainty quantification and exploration-exploitation trade-off in humans

arXiv.org Artificial Intelligence

The main objective of this paper is to outline a theoretical framework to analyse how humans' decision-making strategies under uncertainty manage the trade-off between information gathering (exploration) and reward seeking (exploitation). A key observation motivating this line of research is that human learners are remarkably fast and effective at adapting to unfamiliar environments and incorporating new knowledge: this is an intriguing behaviour for cognitive sciences as well as an important challenge for Machine Learning. The target problem considered is active learning in a black-box optimization task, and more specifically how the exploration/exploitation dilemma can be modelled within a Gaussian Process based Bayesian Optimization framework, which is in turn based on uncertainty quantification. The main contribution is to analyse humans' decisions with respect to Pareto rationality, where the two objectives are expected improvement and uncertainty quantification. According to this Pareto rationality model, if a decision set contains a Pareto efficient (dominant) strategy, a rational decision maker should always select the dominant strategy over its dominated alternatives. The distance from the Pareto frontier determines whether a choice is (Pareto) rational (i.e., lies on the frontier) or is associated with "exasperate" exploration. However, since uncertainty is one of the two objectives defining the Pareto frontier, we have investigated three different uncertainty quantification measures and selected the one most compliant with the proposed Pareto rationality model. The key result is an analytical framework to characterize how deviations from "rationality" depend on uncertainty quantification and the evolution of the reward seeking process.
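
The Pareto rationality test described above can be illustrated with a short sketch. It assumes that each candidate decision is scored by a pair (expected improvement, uncertainty), that both objectives are to be maximised, and that distance from the frontier is plain Euclidean distance to the nearest frontier point; the paper may define these quantities differently, and all function names are hypothetical.

```python
import math


def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def distance_to_front(point, front):
    """Euclidean distance from a choice to the nearest frontier point.

    Zero means the choice lies on the frontier (Pareto rational here).
    """
    return min(math.dist(point, f) for f in front)


if __name__ == "__main__":
    # (expected improvement, uncertainty) for four candidate decisions
    candidates = [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0), (0.5, 0.5)]
    front = pareto_front(candidates)
    for c in candidates:
        print(c, round(distance_to_front(c, front), 4))
```

With these made-up scores the first three candidates form the frontier, and the fourth is dominated by (1.0, 1.0), so it has a positive distance.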


Advanced Stationary and Non-Stationary Kernel Designs for Domain-Aware Gaussian Processes

arXiv.org Machine Learning

Gaussian process regression is a widely applied method for function approximation and uncertainty quantification. The technique has gained popularity recently in the machine learning community due to its robustness and interpretability. The mathematical methods we discuss in this paper are an extension of the Gaussian-process framework. We propose advanced kernel designs that only allow for functions with certain desirable characteristics to be elements of the reproducing kernel Hilbert space (RKHS) that underlies all kernel methods and serves as the sample space for Gaussian process regression. These desirable characteristics reflect the underlying physics; two obvious examples are symmetry and periodicity constraints. In addition, non-stationary kernel designs can be defined in the same framework to yield flexible multi-task Gaussian processes. We show the impact of advanced kernel designs on Gaussian processes using several synthetic and two scientific data sets. The results show that including domain knowledge, communicated through advanced kernel designs, has a significant impact on the accuracy and relevance of the function approximation. Gaussian processes (GPs) [14] provide a powerful mathematical framework for function approximation from data. The associated technique is generally referred to as Gaussian process regression (GPR). GPs are flexible, robust, non-parametric and naturally include uncertainty quantification.
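
Symmetry and periodicity constraints of the kind mentioned above can be built into a kernel directly. The sketch below uses two standard textbook constructions for one-dimensional inputs, which are not necessarily the designs proposed in the paper: averaging a base kernel over reflections so that every sampled function satisfies f(x) = f(-x), and the usual sine-based periodic kernel. The function names and default length scales are assumptions.

```python
import math


def rbf(x1, x2, length_scale=1.0):
    """Squared-exponential base kernel for scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def symmetric_kernel(x1, x2, base=rbf):
    """Kernel whose functions satisfy f(x) = f(-x).

    Averages the base kernel over the reflections of both arguments,
    i.e. the covariance of (f(x) + f(-x)) / 2 under the base kernel.
    """
    return 0.25 * (base(x1, x2) + base(-x1, x2)
                   + base(x1, -x2) + base(-x1, -x2))


def periodic_kernel(x1, x2, period, length_scale=1.0):
    """Standard periodic kernel: functions repeat with the given period."""
    s = math.sin(math.pi * (x1 - x2) / period)
    return math.exp(-2.0 * s ** 2 / length_scale ** 2)


if __name__ == "__main__":
    # Reflecting either argument leaves the symmetric kernel unchanged.
    print(symmetric_kernel(1.3, 0.4), symmetric_kernel(-1.3, 0.4))
    # Shifting an input by whole periods leaves the periodic kernel unchanged.
    print(periodic_kernel(0.2, 0.7, period=2.0),
          periodic_kernel(0.2, 0.7 + 3 * 2.0, period=2.0))
```

Because the constraint lives in the kernel, a Gaussian process using it cannot produce a posterior mean that violates the symmetry or the period, regardless of where the training points fall.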


Machine Learning-Based Automated Design Space Exploration for Autonomous Aerial Robots

arXiv.org Artificial Intelligence

Building domain-specific architectures for autonomous aerial robots is challenging due to a lack of systematic methodology for designing onboard compute. We introduce a novel performance model called the F-1 roofline to help architects understand how to build a balanced computing system for autonomous aerial robots, considering both the cyber components (sensor rate, compute performance) and the physical components (body dynamics) that affect the performance of the machine. We use F-1 to characterize commonly used learning-based autonomy algorithms with onboard platforms to demonstrate the need for cyber-physical co-design. To navigate the cyber-physical design space automatically, we subsequently introduce AutoPilot. This push-button framework automates the co-design of cyber-physical components for aerial robots from a high-level specification guided by the F-1 model. AutoPilot uses Bayesian optimization to automatically co-design the autonomy algorithm and hardware accelerator while considering various cyber-physical parameters to generate an optimal design under different task-level complexities for different robots and sensor framerates. As a result, designs generated by AutoPilot, on average, lower mission time by up to 2x over baseline approaches, conserving battery energy.