AITopics

Grislain, Nicolas, Perrin, Nicolas, Thabault, Antoine

Recurrent Neural Networks for Stochastic Control in Real-Time Bidding

arXiv.org Artificial IntelligenceJun-12-2020

Bidding in real-time auctions can be a difficult stochastic control task; especially if underdelivery incurs strong penalties and the market is very uncertain. Most current works and implementations focus on optimally delivering a campaign given a reasonable forecast of the market. Practical implementations have a feedback loop to adjust and be robust to forecasting errors, but no implementation, to the best of our knowledge, uses a model of market risk and actively anticipates market shifts. Solving such stochastic control problems in practice is actually very challenging. This paper proposes an approximate solution based on a Recurrent Neural Network (RNN) architecture that is both effective and practical for implementation in a production environment. The RNN bidder provisions everything it needs to avoid missing its goal. It also deliberately falls short of its goal when buying the missing impressions would cost more than the penalty for not reaching it.

deep learning, impression, neural network, (21 more...)

doi: 10.1145/3292500.3330749

2006.07042

Country: North America > United States > Alaska (0.16)

Genre: Research Report (0.50)

Industry:

Marketing (0.94)
Information Technology > Services (0.69)
Banking & Finance > Trading (0.68)
Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cowen-Rivers, Alexander I., Palenicek, Daniel, Moens, Vincent, Abdullah, Mohammed, Sootla, Aivar, Wang, Jun, Ammar, Haitham

SAMBA: Safe Model-Based & Active Reinforcement Learning

arXiv.org Artificial IntelligenceJun-12-2020

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel(semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.

artificial intelligence, null, reinforcement learning, (16 more...)

2006.09436

Country:

Europe (0.14)
North America > United States > Wisconsin (0.14)

Genre: Research Report > New Finding (0.86)

Industry: Energy > Oil & Gas (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Machine LearningJun-12-2020

Inductive Graph Neural Networks for Spatiotemporal Kriging

Wu, Yuankai, Zhuang, Dingyi, Labbe, Aurelie, Sun, Lijun

Time series forecasting and spatiotemporal kriging are the two most important tasks in spatiotemporal data analysis. Recent research on graph neural networks has made substantial progress in time series forecasting, while little attention is paid to the kriging problem---recovering signals for unsampled locations/sensors. Most existing scalable kriging methods (e.g., matrix/tensor completion) are transductive, and thus full retraining is required when we have a new sensor to interpolate. In this paper, we develop an Inductive Graph Neural Network Kriging (IGNNK) model to recover data for unsampled sensors on a network/graph structure. To generalize the effect of distance and reachability, we generate random subgraphs as samples and reconstruct the corresponding adjacency matrix for each sample. By reconstructing all signals on each sample subgraph, IGNNK can effectively learn the spatial message passing mechanism. Empirical results on several real-world spatiotemporal datasets demonstrate the effectiveness of our model. In addition, we also find that the learned model can be successfully transferred to the same type of kriging tasks on an unseen dataset. Our results show that: 1) GNN is an efficient and effective tool for spatial kriging; 2) inductive GNNs can be trained using dynamic adjacency matrices; and 3) a trained model can be transferred to new graph structures.

artificial intelligence, deep learning, machine learning, (17 more...)

2006.07527

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Alabama (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Renewable > Solar (0.47)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine LearningJun-12-2020

TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search

Gogineni, Tarun, Xu, Ziping, Punzalan, Exequiel, Jiang, Runxuan, Kammeraad, Joshua, Tewari, Ambuj, Zimmerman, Paul

Molecular geometry prediction of flexible molecules, or conformer search, is a long-standing challenge in computational chemistry. This task is of great importance for predicting structure-activity relationships for a wide variety of substances ranging from biomolecules to ubiquitous materials. Substantial computational resources are invested in Monte Carlo and Molecular Dynamics methods to generate diverse and representative conformer sets for medium to large molecules, which are yet intractable to chemoinformatic conformer search methods. We present TorsionNet, an efficient sequential conformer search technique based on reinforcement learning under the rigid rotor approximation. The model is trained via curriculum learning, whose theoretical benefit is explored in detail, to maximize a novel metric grounded in thermodynamics called the Gibbs Score. Our experimental results show that TorsionNet outperforms the highest scoring chemoinformatics method by 4x on large branched alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin, with applications in renewable energy.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2006.07078

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Michigan (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zugarová, Eliška, Guy, Tatiana V.

Similarity-based transfer learning of decision policies

arXiv.org Artificial IntelligenceJun-12-2020

A problem of learning decision policy from past experience is considered. Using the Fully Probabilistic Design (FPD) formalism, we propose a new general approach for finding a stochastic policy from the past data.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.08768

Country:

Europe > Czechia > Prague (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Gothoskar, Nishad, Lázaro-Gredilla, Miguel, George, Dileep

From proprioception to long-horizon planning in novel environments: A hierarchical RL model

arXiv.org Artificial IntelligenceJun-11-2020

For an intelligent agent to flexibly and efficiently operate in complex environments, they must be able to reason at multiple levels of temporal, spatial, and conceptual abstraction. At the lower levels, the agent must interpret their proprioceptive inputs and control their muscles, and at the higher levels, the agent must select goals and plan how they will achieve those goals. It is clear that each of these types of reasoning is amenable to different types of representations, algorithms, and inputs. In this work, we introduce a simple, three-level hierarchical architecture that reflects these distinctions. The low-level controller operates on the continuous proprioceptive inputs, using model-free learning to acquire useful behaviors. These in turn induce a set of mid-level dynamics, which are learned by the mid-level controller and used for model-predictive control, to select a behavior to activate at each timestep. The high-level controller leverages a discrete, graph representation for goal selection and path planning to specify targets for the mid-level controller. We apply our method to a series of navigation tasks in the Mujoco Ant environment, consistently demonstrating significant improvements in sample-efficiency compared to prior model-free, model-based, and hierarchical RL methods. Finally, as an illustrative example of the advantages of our architecture, we apply our method to a complex maze environment that requires efficient exploration and long-horizon planning.

artificial intelligence, planning & scheduling, upstream oil & gas, (18 more...)

2006.0662

Country: Europe (0.46)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Chen, Ningyuan, Wang, Chun, Wang, Longlin

Learning and Optimization with Seasonal Patterns

arXiv.org Machine LearningJun-11-2020

Online learning, or more specifically, the multi-armed bandit (MAB) problem, focuses on the task of learning the reward distributions from an unknown environment while simultaneously optimizing cumulative rewards over a fixed time horizon T. This problem has been studied extensively when the environment (i.e., reward distributions) is stationary over time, with numerous algorithms proposed to tackle the tradeoff between exploration and exploitation when making decisions (see Bubeck et al. 2012 for a comprehensive review). While the stationarity assumption about the reward distributions greatly simplifies the analysis, it does not hold in many decision problems in OR/MS and other fields when the environment is time-varying. For example, a fashion retailer should take into account the seasonal demand shift when setting the prices for apparels, and a hospital needs to consider the variation of the patient arrival rate when scheduling the medical staff. Despite the practical relevance, it is difficult to develop a learning policy for non-stationary rewards, especially when the dynamics can change arbitrarily over time. Recent studies (Besbes et al., 2015) have considered cases in which the environment does not change fast with respect to the length of the time horizon, e.g., when a budget sublinear in T is imposed on the total variation of the underlying reward distribution.

frequency, survey article, upstream oil & gas, (21 more...)

2005.08088

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China (0.14)

Genre:

Research Report (0.70)
Overview (0.66)

Industry:

Education (0.48)
Energy > Oil & Gas > Upstream (0.48)
Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.66)

Sakemi, Yusuke, Morino, Kai, Leleu, Timothée, Aihara, Kazuyuki

Model-Size Reduction for Reservoir Computing by Concatenating Internal States Through Time

arXiv.org Machine LearningJun-11-2020

Reservoir computing (RC) is a machine learning algorithm that can learn complex time series from data very rapidly based on the use of high-dimensional dynamical systems, such as random networks of neurons, called "reservoirs." To implement RC in edge computing, it is highly important to reduce the amount of computational resources that RC requires. In this study, we propose methods that reduce the size of the reservoir by inputting the past or drifting states of the reservoir to the output layer at the current time step. These proposed methods are analyzed based on information processing capacity, which is a performance measure of RC proposed by Dambre et al. (2012). In addition, we evaluate the effectiveness of the proposed methods on time-series prediction tasks: the generalized Henon-map and NARMA. On these tasks, we found that the proposed methods were able to reduce the size of the reservoir up to one tenth without a substantial increase in regression error. Because the applications of the proposed methods are not limited to a specific network structure of the reservoir, the proposed methods could further improve the energy efficiency of RC-based systems, such as FPGAs and photonic systems.

concatenation, delay-state concatenation, reservoir, (15 more...)

2006.06218

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry:

Energy (0.94)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningJun-11-2020

Ultra-fast Deep Mixtures of Gaussian Process Experts

Etienam, Clement, Law, Kody, Wade, Sara

Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, and sparse Gaussian processes (GP) have shown promise as a leading candidate for the experts in such models. In the present article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). This combination provides a flexible, robust, and efficient model which is able to significantly outperform competing models. We furthermore consider efficient approaches to computing maximum a posteriori (MAP) estimators of these models by iteratively maximizing the distribution of experts given allocations and allocations given experts. We also show that a recently introduced method called Cluster-Classify- Regress (CCR) is capable of providing a good approximation of the optimal solution extremely quickly. This approximation can then be further refined with the iterative algorithm.

algorithm, artificial intelligence, machine learning, (15 more...)

2006.13309

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Industry:

Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)