AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.78)

#artificialintelligence Jan-8-2019, 02:53:34 GMT

Large-Scale Study of Curiosity-Driven Learning

Reinforcement learning algorithms rely on carefully engineered environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function that uses prediction error as the reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments.
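To make "prediction error as reward" concrete, here is a minimal sketch, assuming a toy linear forward-dynamics model trained by gradient descent. The class `LinearForwardModel`, the function `intrinsic_reward`, and the learning rate are illustrative assumptions and are not taken from the paper, which works with learned feature spaces and deep networks.

```python
import numpy as np


def intrinsic_reward(predicted_next, actual_next):
    """Curiosity reward: squared error between predicted and observed next state."""
    diff = np.asarray(predicted_next, dtype=float) - np.asarray(actual_next, dtype=float)
    return float(np.sum(diff ** 2))


class LinearForwardModel:
    """Hypothetical linear dynamics model mapping (state, action) to next state."""

    def __init__(self, state_dim, action_dim, lr=0.1):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, state, action):
        return self.W @ np.concatenate([state, action])

    def update(self, state, action, next_state):
        x = np.concatenate([state, action])
        error = self.W @ x - next_state
        # Gradient of 0.5 * ||W x - s'||^2 with respect to W is outer(error, x).
        self.W -= self.lr * np.outer(error, x)


# Revisit the same transition repeatedly: the reward shrinks as it becomes predictable.
model = LinearForwardModel(state_dim=2, action_dim=1)
state = np.array([1.0, 0.0])
action = np.array([1.0])
next_state = np.array([0.0, 1.0])

rewards = []
for _ in range(20):
    rewards.append(intrinsic_reward(model.predict(state, action), next_state))
    model.update(state, action, next_state)
```

In this sketch a transition that has been seen many times yields almost no reward, so an agent maximizing it is pushed toward transitions its model cannot yet predict.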

curiosity-driven learning, extrinsic reward, large-scale study, (3 more...)

Genre: Research Report > New Finding (0.65)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Grau-Moya, Jordi, Leibfried, Felix, Bou-Ammar, Haitham

Balancing Two-Player Stochastic Games with Soft Q-Learning

arXiv.org Artificial Intelligence Jan-8-2019

Within the context of video games, the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning to stochastic games, where more than one agent interacts strategically. We contribute both theoretically and empirically. On the theory side, we show that games with soft Q-learning exhibit a unique value and generalise team games and zero-sum games far beyond these two extremes to cover a continuous spectrum of gaming behaviour. Experimentally, we show how tuning agents' constraints affects performance and demonstrate, through a neural network architecture, how to reliably balance games with high-dimensional representations.
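The "continuous spectrum" between team games and zero-sum games can be illustrated with a soft (log-sum-exp) value operator. The sketch below assumes uniform action priors and a nested form in which the opponent's temperature `beta_opponent` is applied inside the player's; the function names and this exact nesting are assumptions for illustration, and the paper's operator may differ in detail.

```python
import numpy as np


def soft_value(q_values, beta, prior=None):
    """Soft value (1/beta) * log sum_a prior(a) * exp(beta * Q(a)).

    beta -> +inf approaches max, beta -> -inf approaches min,
    and beta == 0 gives the expectation under the prior.
    """
    q = np.asarray(q_values, dtype=float)
    if prior is None:
        prior = np.full(q.shape, 1.0 / q.size)  # assumed uniform prior
    if beta == 0.0:
        return float(np.dot(prior, q))
    z = beta * q
    m = np.max(z)  # subtract the maximum for numerical stability
    return float((m + np.log(np.sum(prior * np.exp(z - m)))) / beta)


def two_player_soft_value(q_matrix, beta_player, beta_opponent):
    """Nested soft value: rows index the player's actions, columns the opponent's."""
    q = np.asarray(q_matrix, dtype=float)
    inner = [soft_value(row, beta_opponent) for row in q]
    return soft_value(inner, beta_player)


q = [[1.0, -1.0],
     [0.0, 0.5]]
cooperative = two_player_soft_value(q, beta_player=50.0, beta_opponent=50.0)
adversarial = two_player_soft_value(q, beta_player=50.0, beta_opponent=-50.0)
indifferent = two_player_soft_value(q, beta_player=0.0, beta_opponent=0.0)
```

With a large positive `beta_opponent` the opponent effectively helps the player (team game), with a large negative value it opposes the player (zero-sum), and intermediate values give the tuneable behaviour the abstract describes.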

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1802.03216

Country: North America > United States > Virginia > Arlington County > Arlington (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Sedlmeier, Andreas, Gabor, Thomas, Phan, Thomy, Belzner, Lenz, Linnhoff-Popien, Claudia

Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning

arXiv.org Machine Learning Jan-8-2019

We consider the problem of detecting out-of-distribution (OOD) samples in deep reinforcement learning. In a value-based reinforcement learning setting, we propose to use uncertainty estimation techniques directly on the agent's value-estimating neural network to detect OOD samples. The focus of our work lies in analyzing the suitability of approximate Bayesian inference methods and related ensembling techniques that generate uncertainty estimates. Although prior work has shown that dropout-based variational inference techniques and bootstrap-based approaches can be used to model epistemic uncertainty, their suitability for detecting OOD samples in deep reinforcement learning remains an open question. Our results show that uncertainty estimation can be used to differentiate in- from out-of-distribution samples. Over the complete training process of the reinforcement learning agents, bootstrap-based approaches tend to produce more reliable epistemic uncertainty estimates than dropout-based approaches.
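A minimal sketch of the bootstrap-style idea, assuming the disagreement between ensemble members' Q-value estimates serves as the epistemic uncertainty and a fixed threshold flags OOD states. The function names, the use of the standard deviation, and the threshold value are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np


def epistemic_uncertainty(q_estimates):
    """Mean over actions of the standard deviation across ensemble members.

    q_estimates has shape (n_members, n_actions): one row of Q-values per member.
    """
    q = np.asarray(q_estimates, dtype=float)
    return float(np.mean(np.std(q, axis=0)))


def is_out_of_distribution(q_estimates, threshold):
    """Flag a state as OOD when the ensemble disagrees more than the threshold."""
    return epistemic_uncertainty(q_estimates) > threshold


# Members agree on a familiar state and disagree on an unfamiliar one (made-up values).
familiar = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
unfamiliar = [[1.0, 2.0], [3.0, -1.0], [-2.0, 4.0]]
threshold = 0.5  # hypothetical; in practice this would be calibrated on in-distribution data
```

The same interface would work with Monte Carlo dropout by treating each stochastic forward pass as one "member".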

arxiv e-print, neural network, training process, (9 more...)

1901.02219

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.07)

Genre: Research Report > New Finding (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)

Brown, Daniel S., Cui, Yuchen, Niekum, Scott

Risk-Aware Active Inverse Reinforcement Learning

arXiv.org Machine Learning Jan-8-2019

Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning. Existing work has explored a variety of active query strategies; however, to our knowledge, none of these strategies directly minimize the performance risk of the policy the robot is learning. Utilizing recent advances in performance bounds for inverse reinforcement learning, we propose a risk-aware active inverse reinforcement learning algorithm that focuses active queries on areas of the state space with the potential for large generalization error. We show that risk-aware active learning outperforms standard active IRL approaches on gridworld, simulated driving, and table setting tasks, while also providing a performance-based stopping criterion that allows a robot to know when it has received enough demonstrations to safely perform a task.
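One way to picture a risk-aware query rule with a performance-based stopping criterion is sketched below. It assumes we already have, for each state, samples of the policy's loss under a posterior over reward functions, and it uses an upper quantile (value at risk) as the risk measure. The quantile level `alpha`, the tolerance `epsilon`, and all function names are hypothetical choices for illustration, not details confirmed by the abstract.

```python
import numpy as np


def value_at_risk(loss_samples, alpha=0.95):
    """Per-state alpha-quantile of policy loss.

    loss_samples has shape (n_posterior_samples, n_states).
    """
    return np.quantile(np.asarray(loss_samples, dtype=float), alpha, axis=0)


def select_query_state(loss_samples, alpha=0.95):
    """Query the human at the state whose loss has the highest value at risk."""
    return int(np.argmax(value_at_risk(loss_samples, alpha)))


def enough_demonstrations(loss_samples, alpha=0.95, epsilon=0.1):
    """Stop querying once the worst-case risk over states falls below epsilon."""
    return bool(np.max(value_at_risk(loss_samples, alpha)) < epsilon)


# Made-up loss samples for three states; state 1 is the riskiest.
losses = np.array([
    [0.00, 0.5, 0.1],
    [0.00, 0.9, 0.2],
    [0.05, 0.1, 0.1],
    [0.00, 0.8, 0.0],
])
next_query = select_query_state(losses)
done = enough_demonstrations(losses)
```

Here the robot would ask for a demonstration at state 1 and keep querying, since the risk there is still above the tolerance.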

demonstration, query, reward function, (13 more...)

1901.02161

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Jan-7-2019, 06:26:47 GMT

Move 37 Explained

Why was AlphaGo's Move 37 against Lee Sedol so significant? Why was it so important that I named my 10-week course on deep reinforcement learning after it? In this final video of my course, I'll explain what Move 37 symbolized for humanity and detail 3 examples of how it will affect healthcare, design, and decision-making. We'll go through a code example of a Generative Adversarial Network and even discuss China's ambitious 2030 AI initiative. There's a lot that I cover in this video, and I hope that it helps connect the dots.

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Country: Asia > China (0.28)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Weber, Théophane, Heess, Nicolas, Buesing, Lars, Silver, David

Credit Assignment Techniques in Stochastic Computation Graphs

arXiv.org Machine Learning Jan-7-2019

Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
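The variance problem and the effect of a baseline can be seen in the smallest possible stochastic graph: one Bernoulli node feeding a loss. The sketch below assumes a score-function (likelihood-ratio) estimator and uses the expected loss as the baseline; the function names, the toy loss, and the sample counts are illustrative assumptions, not the paper's general construction.

```python
import numpy as np


def loss(x):
    """Toy downstream loss with a large constant offset to inflate variance."""
    return (x + 10.0) ** 2


def score_function_gradients(theta, baseline, n_samples, rng):
    """Per-sample estimates of d/dtheta E[loss(x)] for x ~ Bernoulli(theta)."""
    x = (rng.random(n_samples) < theta).astype(float)
    # d/dtheta log p(x; theta) for a Bernoulli variable.
    score = x / theta - (1.0 - x) / (1.0 - theta)
    # Subtracting a constant baseline keeps the estimator unbiased because E[score] = 0.
    return (loss(x) - baseline) * score


rng = np.random.default_rng(0)
theta = 0.3
true_gradient = loss(1.0) - loss(0.0)  # exact gradient for a Bernoulli node

plain = score_function_gradients(theta, 0.0, 100_000, rng)
expected_loss = theta * loss(1.0) + (1.0 - theta) * loss(0.0)
with_baseline = score_function_gradients(theta, expected_loss, 100_000, rng)
```

Both estimators average to the same gradient, but the baseline version has far lower variance, which is the effect that value functions and critics generalize to larger graphs.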

estimator, gradient, value function, (15 more...)

1901.01761

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.84)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
(3 more...)

Sun, Qingnan, Jankovic, Marko V., Budzinski, João, Moore, Brett, Diem, Peter, Stettler, Christoph, Mougiakakou, Stavroula G.

A dual mode adaptive basal-bolus advisor based on reinforcement learning

arXiv.org Artificial Intelligence Jan-7-2019

Self-monitoring of blood glucose (SMBG) and continuous glucose monitoring (CGM) are commonly used by type 1 diabetes (T1D) patients to measure glucose concentrations. The proposed adaptive basal-bolus algorithm (ABBA) supports inputs from either SMBG or CGM devices to provide personalised suggestions for the daily basal rate and prandial insulin doses on the basis of the patients' glucose level on the previous day. The ABBA is based on reinforcement learning (RL), a type of artificial intelligence, and was validated in silico with an FDA-accepted population of 100 adults under different realistic scenarios lasting three simulated months. The scenarios involve three main meals and one bedtime snack per day, along with different variabilities and uncertainties for insulin sensitivity, mealtime, carbohydrate amount, and glucose measurement time. The results indicate that the proposed approach achieves comparable performance with CGM or SMBG as input signals, without influencing the total daily insulin dose. The results are a promising indication that AI algorithmic approaches can provide personalised adaptive insulin optimisation and achieve glucose control independently of the type of glucose monitoring technology. Manuscript received August 30, 2018. This research was carried out within the framework of the MyTreat research and development project, supported by the Swiss Commission of Technology and Innovation (CTI) under Grant 18172.1 PFLS-LS.

abba, cir, variability, (15 more...)

doi: 10.1109/JBHI.2018.2887067

1901.01816

Country:

Europe > Switzerland > Bern > Bern (0.05)
North America > United States > Texas > Bexar County > San Antonio (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Jan-6-2019, 10:04:49 GMT

Using Deep Q-Learning to Control Optimization Hyperparameters

artificial intelligence, machine learning, reinforcement learning, (4 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning Jan-5-2019

Deep Reinforcement Learning for Imbalanced Classification

Lin, Enlu, Chen, Qiong, Qi, Xiaoming

Data in real-world applications often exhibit skewed class distributions, which pose an intense challenge for machine learning. Conventional classification algorithms are not effective in the case of imbalanced data distribution, and may fail when the data distribution is highly imbalanced. To address this issue, we propose a general imbalanced classification model based on deep reinforcement learning. We formulate the classification problem as a sequential decision-making process and solve it with a deep Q-learning network. The agent performs a classification action on one sample at each time step, and the environment evaluates the classification action and returns a reward to the agent. The reward from a minority-class sample is larger, so the agent is more sensitive to the minority class. The agent finally finds an optimal classification policy in imbalanced data under the guidance of a specific reward function and a beneficial learning environment. Experiments show that our proposed model outperforms the other imbalanced classification algorithms, identifies more minority samples, and achieves great classification performance.
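The class-sensitive reward can be sketched as a small function. The abstract only states that minority-class samples yield a larger reward; the symmetric penalty for mistakes, the parameter `majority_scale`, its default value, and the label convention below are assumptions made for illustration.

```python
def classification_reward(action, label, minority_label=1, majority_scale=0.1):
    """Reward for classifying one sample, with larger magnitude for the minority class.

    action: class predicted by the agent at this time step.
    label: true class of the sample.
    majority_scale: hypothetical factor (< 1) shrinking majority-class rewards.
    """
    magnitude = 1.0 if label == minority_label else majority_scale
    # Assumed convention: correct predictions are rewarded, mistakes penalized equally.
    return magnitude if action == label else -magnitude


# One pass over a tiny imbalanced episode (made-up labels and predictions).
labels = [0, 0, 0, 0, 1]
predictions = [0, 0, 1, 0, 0]
episode_rewards = [classification_reward(a, y) for a, y in zip(predictions, labels)]
```

Missing the single minority sample costs more than any individual majority-class outcome contributes, which is what makes the agent more sensitive to the minority class.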

classification, imbalanced data, minority class, (17 more...)

1901.01379

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)