AITopics

Lu, Xiuyuan, Van Roy, Benjamin

Information-Theoretic Confidence Bounds for Reinforcement Learning

arXiv.org Artificial Intelligence, Nov-21-2019

We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling. By making a connection between information-theoretic quantities and confidence bounds, we obtain results that relate the per-period performance of the agent with its information gain about the environment, thus explicitly characterizing the exploration-exploitation tradeoff. The resulting cumulative regret bound depends on the agent's uncertainty over the environment and quantifies the value of prior information. We show applicability of this approach to several environments, including linear bandits, tabular MDPs, and factored MDPs. These examples demonstrate the potential of a general information-theoretic approach for the design and analysis of reinforcement learning algorithms.

artificial intelligence, upstream oil & gas, (18 more...)

1911.09724

Country: North America (0.14)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Estimating uncertainty of earthquake rupture using Bayesian neural network

Ahamed, Sabber

Bayesian neural networks (BNNs) are probabilistic models that combine the strengths of neural networks (NNs) and stochastic processes. As a result, BNNs can combat overfitting and perform well in applications where data is limited. Earthquake rupture study is such a problem: data is insufficient, and scientists have to rely on many trial-and-error numerical or physical models. Because of limited resources and computational expense, it is often hard to determine the reasons behind earthquake rupture. In this work, a BNN is used (1) to combat the small-data problem, (2) to find the parameter combinations responsible for earthquake rupture, and (3) to estimate the uncertainty associated with earthquake rupture. Two thousand rupture simulations are used to train and test the model. A simple 2D rupture geometry is considered where the fault has a Gaussian geometric heterogeneity at the center, and eight parameters vary in each simulation. The test F1-score of the BNN is 0.8334, which is 2.34% higher than the plain NN score. Results show that the parameters of rupture propagation have higher uncertainty than those of rupture arrest. Normal stresses play a vital role in determining rupture propagation and are also the highest source of uncertainty, followed by the dynamic friction coefficient. Shear stress has a moderate role, whereas geometric features such as the width and height of the fault are least significant and uncertain.

deep learning, earthquake rupture, upstream oil & gas, (21 more...)

1911.0966

Country: North America > Haiti (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.93)
Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Robust Learning-based Predictive Control for Constrained Nonlinear Systems

Zhang, Xinglong, Liu, Jiahang, Xu, Xin, Chen, Hong

The integration of machine learning methods and Model Predictive Control (MPC) has received increasing attention in recent years. In general, learning-based predictive control (LPC) is promising for building data-driven models and solving the online optimization problem at lower computational cost. However, the robustness of LPC is difficult to guarantee because the function approximation used in machine learning algorithms introduces uncertainties. In this paper, a novel robust learning-based predictive control (r-LPC) scheme is proposed for constrained nonlinear systems with unknown dynamics. In r-LPC, the Koopman operator is used to form a global linear representation of the unknown dynamics, and an incremental actor-critic algorithm is presented for receding horizon optimization. To satisfy the system constraints, soft logarithmic barrier functions are designed within the learning predictive framework. The recursive feasibility and stability of the closed-loop system are discussed under the convergence arguments of the approximation algorithms adopted. The robustness of r-LPC is also analyzed theoretically by taking into account perturbations on the controller due to possible approximation errors. Simulation results are reported for the data-driven regulation of a Van der Pol oscillator system, including comparisons with a classic MPC and an infinite-horizon Dual Heuristic Programming (DHP) algorithm. The results show that r-LPC significantly outperforms the DHP algorithm in terms of control performance and is comparable to the MPC in terms of regulating control as well as energy consumption. Moreover, its average computational cost is much smaller than that of the MPC in the adopted environment.

artificial intelligence, predictive control, upstream oil & gas, (16 more...)

1911.09827

Country: Asia > Taiwan (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Differentiable Algorithm for Marginalising Changepoints

Lim, Hyoungjin, Che, Gwonsoo, Lee, Wonyeol, Yang, Hongseok

We present an algorithm for marginalising changepoints in time-series models that assume a fixed number of unknown changepoints. Our algorithm is differentiable with respect to its inputs, which are the values of latent random variables other than changepoints. Also, it runs in time O(mn) where n is the number of time steps and m the number of changepoints, an improvement over a naive marginalisation method with O(n^m) time complexity. We derive the algorithm by identifying quantities related to this marginalisation problem, showing that these quantities satisfy recursive relationships, and transforming the relationships to an algorithm via dynamic programming. Since our algorithm is differentiable, it can be applied to convert a model non-differentiable due to changepoints to a differentiable one, so that the resulting models can be analysed using gradient-based inference or learning techniques. We empirically show the effectiveness of our algorithm in this application by tackling the posterior inference problem on synthetic and real-world data.
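The abstract does not give the recursion itself. As a rough illustration of how dynamic programming can bring the cost down from O(n^m) to O(mn), the sketch below marginalises changepoint positions with a forward recursion in log space. The function names, the `loglik[k][t]` input layout, and the uniform (unnormalised) weighting over changepoint placements are assumptions made for this example; it is not the authors' algorithm.

```python
import math
from itertools import combinations


def logsumexp2(a, b):
    """Stable log(exp(a) + exp(b)); tolerates -inf on either side."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi = max(a, b)
    return hi + math.log(math.exp(a - hi) + math.exp(b - hi))


def log_marginal_dp(loglik):
    """Sum over all placements of m changepoints in O(m * n) time.

    loglik[k][t] is the (hypothetical) log-likelihood of observation t
    under the parameters of segment k, for k = 0..m and t = 0..n-1.
    Every segment must be non-empty, so n > m is required.
    """
    m, n = len(loglik) - 1, len(loglik[0])
    # alpha[k]: log-weight of all prefixes whose latest point lies in segment k.
    alpha = [-math.inf] * (m + 1)
    alpha[0] = loglik[0][0]
    for t in range(1, n):
        new = [-math.inf] * (m + 1)
        for k in range(m + 1):
            stay = alpha[k]                                   # same segment
            switch = alpha[k - 1] if k > 0 else -math.inf     # changepoint at t
            new[k] = logsumexp2(stay, switch) + loglik[k][t]
        alpha = new
    return alpha[m]


def log_marginal_brute(loglik):
    """Reference O(n^m) enumeration, used only to check the recursion."""
    m, n = len(loglik) - 1, len(loglik[0])
    total = -math.inf
    for cps in combinations(range(1, n), m):
        bounds = (0,) + cps + (n,)
        lp = sum(loglik[k][t]
                 for k in range(m + 1)
                 for t in range(bounds[k], bounds[k + 1]))
        total = logsumexp2(total, lp)
    return total
```

Because the recursion uses only additions and log-sum-exp, the same structure can be written in an automatic-differentiation framework to obtain gradients with respect to `loglik`, which is the property the abstract emphasises.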

algorithm, neural network, upstream oil & gas, (20 more...)

1911.09839

Country: North America > Canada (0.14)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial Intelligence, Nov-21-2019

Scalable methods for computing state similarity in deterministic Markov Decision Processes

Castro, Pablo Samuel

We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that captures behavioral equivalence between states and provides strong theoretical guarantees on differences in optimal behaviour. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss which allows us to learn an approximation even for continuous-state MDPs, which had not been possible prior to this work.
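To make the object concrete, here is a minimal sketch of the classic tabular computation for a small deterministic MDP. It assumes the standard bisimulation-metric fixed point, in which deterministic transitions reduce the transition term to the distance between successor states: d(s, t) = max_a |R(s, a) - R(t, a)| + gamma * d(N(s, a), N(t, a)). The function name and the `rewards` / `next_state` table layout are hypothetical, and this is the expensive tabular baseline rather than the sampling-based or learned approximations the abstract proposes.

```python
def bisimulation_metric(rewards, next_state, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Fixed-point iteration for a deterministic, tabular MDP.

    rewards[s][a]    : reward for taking action a in state s
    next_state[s][a] : index of the unique successor state
    Returns an n x n list of pairwise state distances.
    """
    n = len(rewards)
    n_actions = len(rewards[0])
    d = [[0.0] * n for _ in range(n)]
    for _ in range(max_iters):
        new = [
            [
                max(
                    abs(rewards[s][a] - rewards[t][a])
                    + gamma * d[next_state[s][a]][next_state[t][a]]
                    for a in range(n_actions)
                )
                for t in range(n)
            ]
            for s in range(n)
        ]
        # Largest change across all state pairs in this sweep.
        delta = max(abs(new[s][t] - d[s][t]) for s in range(n) for t in range(n))
        d = new
        if delta < tol:
            break
    return d


# Toy MDP: states 0 and 1 behave identically; state 2 is an absorbing zero-reward state.
rewards = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
next_state = [[0, 2], [1, 2], [2, 2]]
distances = bisimulation_metric(rewards, next_state)
```

Each sweep touches every pair of states, which is the quadratic tabular cost that motivates the scalable approximations described in the abstract.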

bisimulation metric, machine learning, reinforcement learning, (18 more...)

1911.09291

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Qian, Weizhu, Lauri, Fabrice, Gechter, Franck

A Probabilistic Approach for Discovering Daily Human Mobility Patterns with Mobile Data

Discovering human mobility patterns with geo-location data collected from smartphone users has been a hot research topic in recent years. In this paper, we attempt to discover daily mobility patterns based on GPS data. We view this problem from a probabilistic perspective in order to extract more information from the original GPS data than conventional methods do. A nonparametric Bayesian modeling method, the Infinite Gaussian Mixture Model (IGMM), is used to estimate the probability density of the daily mobility. Then, we use the Kullback-Leibler divergence as the metric to measure the similarity of different probability distributions. By combining the Infinite Gaussian Mixture Model and the Kullback-Leibler divergence, we derive an automatic clustering algorithm that discovers mobility patterns for each individual user without setting the number of clusters in advance. In the experiments, the effectiveness of our method is validated on real data collected from different users. The results show that the IGMM-based algorithm outperforms the GMM-based algorithm. We also test our methods on datasets of different lengths to find the minimum data length needed for discovering mobility patterns. Introduction. Smartphone devices are equipped with multiple sensors that can record user behavior on the handsets. With the help of large-scale smartphone usage data, researchers are able to study human behavior in the real world.
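As a small illustration of using the Kullback-Leibler divergence to compare two distributions, the sketch below works on discrete histograms (for example, a day's visits binned over a fixed set of places). This is a simplifying assumption: the abstract compares densities estimated by an Infinite Gaussian Mixture Model, for which the divergence generally has to be approximated. The function names and the `eps` floor are hypothetical choices for this example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps so that
    an empty bin in q gives a large finite penalty instead of a division by zero.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / max(qi, eps))
    return total


def symmetric_kl(p, q):
    """Symmetrised divergence, convenient as a clustering dissimilarity."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Two hypothetical daily histograms over the same two location bins.
day_a = [0.5, 0.5]
day_b = [0.25, 0.75]
dissimilarity = symmetric_kl(day_a, day_b)
```

The plain divergence is asymmetric, so a symmetrised form is one common way to obtain a dissimilarity that does not depend on the order of the two days being compared.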

data mining, machine learning, trajectory, (18 more...)

1911.09355

Country:

Europe > France (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Lima, Guilherme, Costa, Rodrigo, Moreno, Marcio Ferreira

An Introduction to Artificial Intelligence Applied to Multimedia

arXiv.org Artificial Intelligence, Nov-21-2019

In this chapter, we give an introduction to symbolic artificial intelligence (AI) and discuss its relation and application to multimedia. We begin by defining what symbolic AI is, what distinguishes it from non-symbolic approaches, such as machine learning, and how it can be used in the construction of advanced multimedia applications. We then introduce description logic (DL) and use it to discuss symbolic representation and reasoning. DL is the logical underpinning of OWL, the most successful family of ontology languages. After discussing DL, we present OWL and related Semantic Web technologies, such as RDF and SPARQL. We conclude the chapter by discussing a hybrid model for multimedia representation, called Hyperknowledge. Throughout the text, we make references to technologies and extensions specifically designed to solve the kinds of problems that arise in multimedia representation.

artificial intelligence, owl, semantic web, (18 more...)

1911.09606

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Ireland > Connaught > County Galway > Galway (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.95)
Media > Music (0.67)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Jun, Kwang-Sung, Orabona, Francesco

Parameter-Free Locally Differentially Private Stochastic Subgradient Descent

We consider the problem of minimizing a convex risk with stochastic subgradients while guaranteeing ε-local differential privacy (ε-LDP). While it has been shown that stochastic optimization is possible with ε-LDP via standard SGD (Song et al., 2013), its convergence rate largely depends on the learning rate, which must be tuned via repeated runs. Further, tuning is detrimental to privacy since it significantly increases the number of gradient requests. In this work, we propose BANCO (Betting Algorithm for Noisy COins), the first ε-LDP SGD algorithm that essentially matches the convergence rate of tuned SGD without any learning rate parameter, reducing privacy loss and saving privacy budget. 1 Introduction. In this paper, we consider the problem of minimizing the convex risk of a machine learning predictor while guaranteeing local differential privacy. Instead of going through the empirical risk minimization route, we directly optimize the stochastic objective of the risk via stochastic subgradients appropriately sanitized to guarantee local differential privacy.

algorithm, artificial intelligence, machine learning, (14 more...)

1911.09564

Country:

North America > United States > Arizona (0.05)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Sarrico, Marta, Arulkumaran, Kai, Agostinelli, Andrea, Richemond, Pierre, Bharath, Anil Anthony

Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control

Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms which use non-/semi-parametric models that predict values based on storing and retrieving previously experienced transitions. One way to further improve the sample efficiency of these approaches is to use more principled exploration strategies. In this work, we therefore propose maximum entropy mellowmax episodic control (MEMEC), which samples actions according to a Boltzmann policy with a state-dependent temperature. We demonstrate that MEMEC outperforms other uncertainty- and softmax-based exploration methods on classic reinforcement learning environments and Atari games, achieving both more rapid learning and higher final rewards.
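The abstract does not spell out how the state-dependent temperature is chosen. A minimal sketch, assuming the maximum entropy mellowmax construction known from the mellowmax literature: compute the mellowmax of the action values, then pick the Boltzmann inverse temperature `beta` at which the policy's expected value equals that mellowmax value. The function names, the bisection solver, and the example values are assumptions for illustration, not the MEMEC implementation.

```python
import math


def mellowmax(values, omega):
    """mm_omega(x) = log(mean(exp(omega * x))) / omega, computed stably (omega > 0)."""
    m = max(values)
    mean_exp = sum(math.exp(omega * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / omega


def boltzmann(values, beta):
    """Softmax policy with inverse temperature beta >= 0."""
    m = max(values)
    weights = [math.exp(beta * (v - m)) for v in values]
    z = sum(weights)
    return [w / z for w in weights]


def max_entropy_mellowmax_policy(values, omega, iters=100):
    """Find beta >= 0 such that the Boltzmann policy's expected value equals mellowmax."""
    target = mellowmax(values, omega)

    def gap(beta):
        probs = boltzmann(values, beta)
        return sum(p * v for p, v in zip(probs, values)) - target

    # The expected value is non-decreasing in beta, so bracket the root and bisect.
    lo, hi = 0.0, 1.0
    while gap(hi) < 0.0 and hi < 1e6:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, boltzmann(values, beta)


# Hypothetical action values for a single state.
q_values = [1.0, 2.0, 3.0]
beta, policy = max_entropy_mellowmax_policy(q_values, omega=5.0)
```

Because `beta` is solved separately for each vector of action values, the resulting temperature varies from state to state, which matches the "state-dependent temperature" wording in the abstract.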

artificial intelligence, machine learning, reinforcement learning, (12 more...)

1911.09615

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (0.91)
Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.62)