Energy
Seismic waves reveal giant structures deep beneath Earth's surface
Seismic wave data has revealed giant structures 2900 kilometres beneath the surface of Earth, at the boundary between Earth's molten core and solid mantle. The structure, known as an ultra-low velocity (ULV) zone, is about 1000 kilometres in diameter and 25 kilometres thick, says Kim. These structures are called ULV zones because seismic waves pass through them at slower velocities, but what they are made of is still a mystery. They might be chemically distinct from Earth's iron–nickel alloy core and silicate rock mantle, or have different thermal properties. The researchers discovered the structure while analysing 7000 records of seismic activity from earthquakes that occurred around the Pacific Ocean basin between 1990 and 2018.
Recurrent Neural Networks for Stochastic Control in Real-Time Bidding
Grislain, Nicolas, Perrin, Nicolas, Thabault, Antoine
Bidding in real-time auctions can be a difficult stochastic control task, especially if underdelivery incurs strong penalties and the market is very uncertain. Most current works and implementations focus on optimally delivering a campaign given a reasonable forecast of the market. Practical implementations have a feedback loop to adjust and be robust to forecasting errors, but no implementation, to the best of our knowledge, uses a model of market risk and actively anticipates market shifts. Solving such stochastic control problems in practice is very challenging. This paper proposes an approximate solution based on a Recurrent Neural Network (RNN) architecture that is both effective and practical for implementation in a production environment. The RNN bidder provisions everything it needs to avoid missing its goal. It also deliberately falls short of its goal when buying the missing impressions would cost more than the penalty for not reaching it.
SAMBA: Safe Model-Based & Active Reinforcement Learning
Cowen-Rivers, Alexander I., Palenicek, Daniel, Moens, Vincent, Abdullah, Mohammed, Sootla, Aivar, Wang, Jun, Ammar, Haitham
In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders-of-magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
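As a rough illustration of what a conditional-value-at-risk (CVaR) constraint checks, the sketch below estimates CVaR from sampled costs as the mean of the worst (1 - alpha) fraction of outcomes. This is a generic empirical estimator, not SAMBA's formulation; the function names, the `alpha` level, and the `threshold` parameter are assumptions made for the example.

```python
import numpy as np


def empirical_cvar(costs, alpha=0.95):
    """Empirical CVaR: mean of the sampled costs at or above the alpha-quantile.

    Assumption: `costs` are sampled trajectory costs (higher is worse).
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)   # value-at-risk at level alpha
    tail = costs[costs >= var]        # worst-case tail of the sample
    return float(tail.mean())


def satisfies_cvar_constraint(costs, threshold, alpha=0.95):
    """True if the tail-average cost stays within the (hypothetical) safety threshold."""
    return empirical_cvar(costs, alpha) <= threshold
```

Unlike a constraint on the expected cost, this check can fail even when the average cost is low, because it looks only at the tail of the cost distribution.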
Inductive Graph Neural Networks for Spatiotemporal Kriging
Wu, Yuankai, Zhuang, Dingyi, Labbe, Aurelie, Sun, Lijun
Time series forecasting and spatiotemporal kriging are the two most important tasks in spatiotemporal data analysis. Recent research on graph neural networks has made substantial progress in time series forecasting, while little attention has been paid to the kriging problem: recovering signals for unsampled locations/sensors. Most existing scalable kriging methods (e.g., matrix/tensor completion) are transductive, and thus full retraining is required when we have a new sensor to interpolate. In this paper, we develop an Inductive Graph Neural Network Kriging (IGNNK) model to recover data for unsampled sensors on a network/graph structure. To generalize the effect of distance and reachability, we generate random subgraphs as samples and reconstruct the corresponding adjacency matrix for each sample. By reconstructing all signals on each sample subgraph, IGNNK can effectively learn the spatial message passing mechanism. Empirical results on several real-world spatiotemporal datasets demonstrate the effectiveness of our model. We also find that the learned model can be successfully transferred to the same type of kriging tasks on an unseen dataset. Our results show that: 1) GNN is an efficient and effective tool for spatial kriging; 2) inductive GNNs can be trained using dynamic adjacency matrices; and 3) a trained model can be transferred to new graph structures.
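The random-subgraph sampling step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a dense adjacency matrix `adj` of shape (N, N) and signals of shape (T, N), and it assumes that some sampled nodes are hidden by zeroing their inputs so that a model must reconstruct them. The function name and all parameters are hypothetical.

```python
import numpy as np


def sample_training_subgraph(adj, signals, n_sub, n_masked, rng):
    """Draw one training sample: a random subgraph with some nodes hidden.

    Returns the sub-adjacency matrix, the masked inputs, the full
    reconstruction target, and the 0/1 mask (0 marks a hidden node).
    """
    n_nodes = adj.shape[0]
    # Pick a random subset of nodes and rebuild the adjacency matrix for it.
    nodes = rng.choice(n_nodes, size=n_sub, replace=False)
    sub_adj = adj[np.ix_(nodes, nodes)]
    target = signals[:, nodes]
    # Hide a few of the sampled nodes (assumed masking scheme).
    mask = np.ones(n_sub)
    hidden = rng.choice(n_sub, size=n_masked, replace=False)
    mask[hidden] = 0.0
    inputs = target * mask
    return sub_adj, inputs, target, mask
```

Because each sample has its own node subset and adjacency matrix, a model trained this way never relies on a fixed graph, which is what would allow it to be applied to sensors unseen during training.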
TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search
Gogineni, Tarun, Xu, Ziping, Punzalan, Exequiel, Jiang, Runxuan, Kammeraad, Joshua, Tewari, Ambuj, Zimmerman, Paul
Molecular geometry prediction of flexible molecules, or conformer search, is a long-standing challenge in computational chemistry. This task is of great importance for predicting structure-activity relationships for a wide variety of substances ranging from biomolecules to ubiquitous materials. Substantial computational resources are invested in Monte Carlo and Molecular Dynamics methods to generate diverse and representative conformer sets for medium to large molecules, which remain intractable for chemoinformatic conformer search methods. We present TorsionNet, an efficient sequential conformer search technique based on reinforcement learning under the rigid rotor approximation. The model is trained via curriculum learning, whose theoretical benefit is explored in detail, to maximize a novel metric grounded in thermodynamics called the Gibbs Score. Our experimental results show that TorsionNet outperforms the highest scoring chemoinformatics method by 4x on large branched alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin, with applications in renewable energy.
From proprioception to long-horizon planning in novel environments: A hierarchical RL model
Gothoskar, Nishad, Lázaro-Gredilla, Miguel, George, Dileep
For an intelligent agent to flexibly and efficiently operate in complex environments, it must be able to reason at multiple levels of temporal, spatial, and conceptual abstraction. At the lower levels, the agent must interpret its proprioceptive inputs and control its muscles, and at the higher levels, the agent must select goals and plan how it will achieve those goals. It is clear that each of these types of reasoning is amenable to different types of representations, algorithms, and inputs. In this work, we introduce a simple, three-level hierarchical architecture that reflects these distinctions. The low-level controller operates on the continuous proprioceptive inputs, using model-free learning to acquire useful behaviors. These in turn induce a set of mid-level dynamics, which are learned by the mid-level controller and used for model-predictive control, to select a behavior to activate at each timestep. The high-level controller leverages a discrete, graph representation for goal selection and path planning to specify targets for the mid-level controller. We apply our method to a series of navigation tasks in the MuJoCo Ant environment, consistently demonstrating significant improvements in sample efficiency compared to prior model-free, model-based, and hierarchical RL methods. Finally, as an illustrative example of the advantages of our architecture, we apply our method to a complex maze environment that requires efficient exploration and long-horizon planning.
Learning and Optimization with Seasonal Patterns
Chen, Ningyuan, Wang, Chun, Wang, Longlin
Online learning, or more specifically, the multi-armed bandit (MAB) problem, focuses on the task of learning the reward distributions from an unknown environment while simultaneously optimizing cumulative rewards over a fixed time horizon T. This problem has been studied extensively when the environment (i.e., reward distributions) is stationary over time, with numerous algorithms proposed to tackle the tradeoff between exploration and exploitation when making decisions (see Bubeck et al. 2012 for a comprehensive review). While the stationarity assumption about the reward distributions greatly simplifies the analysis, it does not hold in many decision problems in OR/MS and other fields when the environment is time-varying. For example, a fashion retailer should take into account the seasonal demand shift when setting prices for apparel, and a hospital needs to consider the variation of the patient arrival rate when scheduling medical staff. Despite the practical relevance, it is difficult to develop a learning policy for non-stationary rewards, especially when the dynamics can change arbitrarily over time. Recent studies (Besbes et al., 2015) have considered cases in which the environment does not change quickly with respect to the length of the time horizon, e.g., when a budget sublinear in T is imposed on the total variation of the underlying reward distribution.
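To make the exploration-exploitation tradeoff in the stationary setting concrete, here is a minimal sketch of the classical UCB1 rule. It is a standard baseline for stationary rewards, not the seasonal method this paper develops; the `pull` callback and the other names are assumptions made for the example.

```python
import math


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; `pull(arm)` returns a reward in [0, 1]."""
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # running average reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # play every arm once to initialise the estimates
        else:
            # Exploit the high estimate, explore via the confidence bonus.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

The confidence bonus shrinks as an arm is played more often, so a rule of this kind eventually stops revisiting arms that looked poor early on. That is reasonable when rewards are stationary, but it becomes a liability when the reward distributions shift over time.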
Model-Size Reduction for Reservoir Computing by Concatenating Internal States Through Time
Sakemi, Yusuke, Morino, Kai, Leleu, Timothée, Aihara, Kazuyuki
Reservoir computing (RC) is a machine learning algorithm that can learn complex time series from data very rapidly based on the use of high-dimensional dynamical systems, such as random networks of neurons, called "reservoirs." To implement RC in edge computing, it is highly important to reduce the amount of computational resources that RC requires. In this study, we propose methods that reduce the size of the reservoir by inputting the past or drifting states of the reservoir to the output layer at the current time step. These proposed methods are analyzed based on information processing capacity, which is a performance measure of RC proposed by Dambre et al. (2012). In addition, we evaluate the effectiveness of the proposed methods on time-series prediction tasks: the generalized Henon-map and NARMA. On these tasks, we found that the proposed methods were able to reduce the size of the reservoir to as little as one tenth without a substantial increase in regression error. Because the applications of the proposed methods are not limited to a specific network structure of the reservoir, the proposed methods could further improve the energy efficiency of RC-based systems, such as FPGAs and photonic systems.
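The idea of feeding past reservoir states to the output layer can be sketched with a small echo state network. This is an illustrative setup under assumed choices (tanh units, a random Gaussian recurrent matrix rescaled to a chosen spectral radius, a ridge-regression readout) and covers only the "past states" variant, not the drifting-state one; all names and default values are hypothetical.

```python
import numpy as np


def run_reservoir(inputs, n_res=20, spectral_radius=0.9, seed=0):
    """Drive a small random tanh reservoir with a scalar input sequence."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=n_res)
    w = rng.normal(size=(n_res, n_res))
    # Rescale the recurrent weights to the requested spectral radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    states = np.zeros((len(inputs), n_res))
    x = np.zeros(n_res)
    for t, u in enumerate(inputs):
        x = np.tanh(w @ x + w_in * u)
        states[t] = x
    return states


def concat_past_states(states, delays):
    """Stack x(t - d) for each d in `delays`; the first max(delays) steps are dropped."""
    start = max(delays)
    n_steps = len(states)
    return np.hstack([states[start - d:n_steps - d] for d in delays])


def fit_readout(features, targets, ridge=1e-6):
    """Ridge-regression readout trained on the concatenated features."""
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)
```

With `delays=(0, 1, 2)`, a 5-unit reservoir gives the readout 15 features per time step, so the readout grows while the recurrent network stays small.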
Ultra-fast Deep Mixtures of Gaussian Process Experts
Etienam, Clement, Law, Kody, Wade, Sara
Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, and sparse Gaussian processes (GP) have shown promise as a leading candidate for the experts in such models. In the present article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). This combination provides a flexible, robust, and efficient model which is able to significantly outperform competing models. We furthermore consider efficient approaches to computing maximum a posteriori (MAP) estimators of these models by iteratively maximizing the distribution of experts given allocations and allocations given experts. We also show that a recently introduced method called Cluster-Classify-Regress (CCR) is capable of providing a good approximation of the optimal solution extremely quickly. This approximation can then be further refined with the iterative algorithm.