AITopics

1806.02426

Country: Europe > United Kingdom (0.28)

Genre:

Instructional Material (0.68)
Research Report (0.64)

Industry:

Leisure & Entertainment (0.47)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

arXiv.org Machine Learning, Jun-6-2018

Randomized Value Functions via Multiplicative Normalizing Flows

Touati, Ahmed, Satija, Harsh, Romoff, Joshua, Pineau, Joelle, Vincent, Pascal

Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high dimensional state and action spaces. Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values. This prevents the agent's behavior policy from prematurely exploiting early estimates and falling into local optima. In this work, we leverage recent advances in variational Bayesian neural networks and combine these with traditional Deep Q-Networks (DQN) to achieve randomized value functions for high-dimensional domains. In particular, we augment DQN with multiplicative normalizing flows in order to track an approximate posterior distribution over its parameters. This allows the agent to perform approximate Thompson sampling in a computationally efficient manner via stochastic gradient methods. We demonstrate the benefits of our approach through an empirical comparison in high dimensional environments.
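
The action-selection step behind approximate Thompson sampling can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a linear Q-function and a diagonal Gaussian approximate posterior as a stand-in for the multiplicative-normalizing-flow posterior, and the names `sample_weights`, `q_values` and `thompson_action` are hypothetical.

```python
import random


def sample_weights(mean, std, rng):
    """Draw one weight vector from a diagonal Gaussian approximate posterior."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def q_values(weights_per_action, features):
    """Linear Q-values: one dot product between weights and features per action."""
    return [sum(w * x for w, x in zip(ws, features)) for ws in weights_per_action]


def thompson_action(post_mean, post_std, features, rng):
    """Sample one Q-function from the posterior, then act greedily on that sample.

    post_mean and post_std hold one weight vector per action (assumed layout).
    """
    sampled = [sample_weights(m, s, rng) for m, s in zip(post_mean, post_std)]
    qs = q_values(sampled, features)
    return max(range(len(qs)), key=qs.__getitem__)
```

When the posterior is wide, different calls pick different actions, which is what drives exploration; as the posterior standard deviations shrink, the choice settles on the greedy action under the posterior mean.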

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1806.02315

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

#artificialintelligence, Jun-5-2018, 04:50:48 GMT

Top 5 Reinforcement Learning Books

Reinforcement learning: over the last decade we have seen a lot of progress in the use of reinforcement learning algorithms in settings where labeled data doesn't exist and a supervised learning approach is not possible. The state-of-the-art approach to tackling RL problems is policy gradients, which in combination with Monte Carlo Tree Search were employed by Google DeepMind's AlphaGo system to famously beat the Go world champion Lee Sedol. Readers will love our list because it is data-driven and objective. Artificial Intelligence: A Modern Approach, 3e offers the most comprehensive, up-to-date introduction to the theory and practice of artificial intelligence. Number one in its field, this textbook is ideal for one- or two-semester, undergraduate or graduate-level courses in Artificial Intelligence. Dr. Peter Norvig, contributing Artificial Intelligence author, and Professor Sebastian Thrun, a Pearson author, are offering a free online course at Stanford University on artificial intelligence.

artificial intelligence, machine learning, reinforcement learning book, (4 more...)

#artificialintelligence

Genre:

Summary/Review (0.76)
Overview (0.56)

Industry:

Education > Educational Setting > Online (0.80)
Leisure & Entertainment > Games > Go (0.58)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

Armstrong, Stuart, O'Rorke, Xavier

Good and safe uses of AI Oracles

arXiv.org Artificial Intelligence, Jun-5-2018

It is possible that powerful and potentially dangerous artificial intelligence (AI) might be developed in the future (Russell et al., 2016; Grace et al., 2017). An Oracle is a design which aims to restrain the impact of a potentially dangerous AI by restricting the agent to no actions besides answering questions (Babcock et al., 2016). Unfortunately, most Oracles will be motivated to gain more control over the world by manipulating users through the content of their answers, and Oracles of potentially high intelligence might be very successful at this (Alfonseca et al., 2016). In this paper we present two designs for Oracles which, even under pessimistic assumptions, will not manipulate their users into releasing them and yet will still be incentivised to provide their users with helpful answers. The first design is the counterfactual Oracle, which chooses its answer as if it expected nobody to ever read it. The second design is the low-bandwidth Oracle, which is limited by the quantity of information it can transmit.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1711.05541

Country: North America > United States (0.28)

Genre: Research Report (0.42)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

arXiv.org Artificial Intelligence, Jun-5-2018

Learning to Follow Language Instructions with Adversarial Reward Induction

Bahdanau, Dzmitry, Hill, Felix, Leike, Jan, Hughes, Edward, Kohli, Pushmeet, Grefenstette, Edward

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, for many real-world natural language commands that involve a degree of underspecification or ambiguity, such as "tidy the room", it would be challenging or impossible to program an appropriate reward function. To overcome this, we present a method for learning to follow commands from a training set of instructions and corresponding example goal-states, rather than an explicit reward function. Importantly, the example goal-states are not seen at test time. The approach effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, the method enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new training examples.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1806.01946

Country: North America (0.28)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Armstrong, Stuart, O'Rourke, Xavier

'Indifference' methods for managing agent rewards

arXiv.org Artificial Intelligence, Jun-5-2018

'Indifference' refers to a class of methods used to control reward based agents. Indifference techniques aim to achieve one or more of three distinct goals: rewards dependent on certain events (without the agent being motivated to manipulate the probability of those events), effective disbelief (where agents behave as if particular events could never happen), and seamless transition from one reward function to another (with the agent acting as if this change is unanticipated). This paper presents several methods for achieving these goals in the POMDP setting, establishing their uses, strengths, and requirements. These methods of control work even when the implications of the agent's reward are otherwise not fully understood.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

1712.06365

Country: Europe (0.46)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

arXiv.org Machine Learning, Jun-5-2018

Relational Deep Reinforcement Learning

Zambaldi, Vinicius, Raposo, David, Santoro, Adam, Bapst, Victor, Li, Yujia, Babuschkin, Igor, Tuyls, Karl, Reichert, David, Lillicrap, Timothy, Lockhart, Edward, Shanahan, Murray, Langston, Victoria, Pascanu, Razvan, Botvinick, Matthew, Vinyals, Oriol, Battaglia, Peter

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
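
The core operation of self-attention over a set of entities can be sketched as below. This is a simplified illustration under stated assumptions, not the architecture from the paper: it uses a single head, skips the learned query, key and value projections (each entity vector plays all three roles), and the names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(entities):
    """Scaled dot-product self-attention over a list of entity vectors.

    Returns the updated entity vectors and the attention weight matrix,
    where weights[i][j] says how much entity i attends to entity j.
    """
    d = len(entities[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for query in entities:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in entities]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all entity vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, entities)) for i in range(d)])
    return outputs, weights
```

Applying the function repeatedly to its own output gives a rough picture of iterative relational reasoning: each pass lets every entity mix in information from the entities it attends to most. The weight matrix is also the part that can be inspected when interpreting which relations the model relies on.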

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1806.0183

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Czarnecki, Wojciech Marian, Jayakumar, Siddhant M., Jaderberg, Max, Hasenclever, Leonard, Teh, Yee Whye, Osindero, Simon, Heess, Nicolas, Pascanu, Razvan

Mix&Match - Agent Curricula for Reinforcement Learning

arXiv.org Machine Learning, Jun-5-2018

We introduce Mix & Match (M&M) - a training framework designed to facilitate rapid and effective learning in RL agents, especially those that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1806.0178

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Dietterich, Thomas G., Trimponias, George, Chen, Zhitang

Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning

arXiv.org Machine Learning, Jun-5-2018

Exogenous state variables and rewards can slow down reinforcement learning by injecting uncontrolled variation into the reward signal. We formalize exogenous state variables and rewards and identify conditions under which an MDP with exogenous state can be decomposed into an exogenous Markov Reward Process involving only the exogenous state reward and an endogenous Markov Decision Process defined with respect to only the endogenous rewards. We also derive a variance-covariance condition under which Monte Carlo policy evaluation on the endogenous MDP is accelerated compared to using the full MDP. Similar speedups are likely to carry over to all RL algorithms. We develop two algorithms for discovering the exogenous variables and test them on several MDPs. Results show that the algorithms are practical and can significantly speed up reinforcement learning.
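
Why removing an exogenous reward can speed up Monte Carlo evaluation can be illustrated with a small simulation. This sketch rests on the standard identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) and is not the paper's exact condition or either of its discovery algorithms. The reward model (Bernoulli endogenous reward, independent Gaussian exogenous reward) and the name `simulate_returns` are assumptions made for illustration.

```python
import random
import statistics


def simulate_returns(n_episodes, horizon, exo_std, seed=0):
    """Return (full_returns, endogenous_returns) for a toy episodic process.

    The exogenous reward is noise the agent cannot influence, drawn
    independently of the endogenous reward (assumed, so the covariance is ~0).
    """
    rng = random.Random(seed)
    full, endo = [], []
    for _ in range(n_episodes):
        # Endogenous part: depends on what happens under the agent's policy.
        r_endo = sum(1.0 if rng.random() < 0.7 else 0.0 for _ in range(horizon))
        # Exogenous part: zero-mean noise added to every step's reward.
        r_exo = sum(rng.gauss(0.0, exo_std) for _ in range(horizon))
        endo.append(r_endo)
        full.append(r_endo + r_exo)
    return full, endo


def variance_report(full, endo):
    """Sample variances of the full return and the endogenous-only return."""
    return statistics.variance(full), statistics.variance(endo)
```

With independent exogenous noise, both return estimates have nearly the same mean, but the endogenous-only estimate has a much smaller variance, so fewer episodes are needed for the same accuracy. If the exogenous and endogenous returns were strongly negatively correlated, the covariance term could cancel this benefit.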

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1806.01584

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.48)

Industry:

Telecommunications (0.68)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Jun-4-2018

Nearly optimal exploration-exploitation decision thresholds

Dimitrakakis, Christos

While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. In this paper, we first derive upper bounds for the utility of selecting different actions in the multi-armed bandit setting. Unlike the common statistical upper confidence bounds, these make the link between the planning horizon, uncertainty and the need for exploration explicit. The resulting algorithm can be seen as a generalisation of the classical Thompson sampling algorithm. We experimentally test these algorithms, as well as $\epsilon$-greedy and the value of perfect information heuristics. Finally, we also introduce the idea of bagging for reinforcement learning. By employing a version of online bootstrapping, we can efficiently sample from an approximate posterior distribution.
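
Sampling from an approximate posterior through online bootstrapping can be sketched for a multi-armed bandit as below. This is one possible reading, not the paper's algorithm: it assumes a "double-or-nothing" online bootstrap (each replicate keeps each observation with probability 1/2) and a uniform draw for replicates that have no data yet. The names `BootstrapArm` and `run_bandit` are hypothetical.

```python
import random


class BootstrapArm:
    """Bootstrap replicates of one arm's mean reward, updated online."""

    def __init__(self, n_replicates, rng):
        self.rng = rng
        self.sums = [0.0] * n_replicates
        self.counts = [0] * n_replicates

    def update(self, reward):
        # Each replicate independently keeps the observation with probability 1/2.
        for k in range(len(self.sums)):
            if self.rng.random() < 0.5:
                self.sums[k] += reward
                self.counts[k] += 1

    def sample_mean(self):
        # Picking one replicate at random approximates a posterior draw.
        k = self.rng.randrange(len(self.sums))
        if self.counts[k] == 0:
            return self.rng.random()  # no data yet: assumed uniform guess
        return self.sums[k] / self.counts[k]


def run_bandit(true_means, steps, n_replicates=20, seed=0):
    """Play a Bernoulli bandit, acting greedily on one bootstrap draw per arm."""
    rng = random.Random(seed)
    arms = [BootstrapArm(n_replicates, rng) for _ in true_means]
    pulls = [0] * len(true_means)
    for _ in range(steps):
        draws = [arm.sample_mean() for arm in arms]
        a = max(range(len(arms)), key=draws.__getitem__)
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        arms[a].update(reward)
        pulls[a] += 1
    return pulls
```

Calling `run_bandit([0.2, 0.8], steps=1000)` returns the number of pulls per arm. The disagreement among replicates plays the role of posterior uncertainty: arms with few observations produce widely varying draws and so keep getting tried, without any explicit prior or likelihood model.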

artificial intelligence, exploration, upstream oil & gas, (17 more...)

cs/0604010

Country:

North America > United States > Virginia (0.14)
North America > Canada (0.14)
Europe > Switzerland (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Data Science > Data Mining > Big Data (0.51)