AITopics

arXiv.org Machine Learning, Apr-10-2018

Universal Successor Representations for Transfer Reinforcement Learning

Ma, Chen, Wen, Junfeng, Bengio, Yoshua

The objective of transfer reinforcement learning is to generalize from a set of previous tasks to unseen new tasks. In this work, we focus on the transfer scenario where the dynamics among tasks are the same, but their goals differ. Although general value functions (Sutton et al., 2011) have been shown to be useful for knowledge transfer, learning a universal value function can be challenging in practice. To address this, we propose (1) to use universal successor representations (USR) to represent the transferable knowledge and (2) a USR approximator (USRA) that can be trained by interacting with the environment. Our experiments show that USR can be effectively applied to new tasks, and that an agent initialized with the trained USRA can achieve the goal considerably faster than one initialized randomly.
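To illustrate why successor representations transfer across goals when the dynamics are shared, here is a minimal tabular sketch. It is not the paper's USRA: it uses the classic tabular successor representation for a fixed policy, and the four-state ring, the discount factor, and all function names are assumptions made for illustration. The matrix M depends only on the dynamics, so the values for a new goal follow from a single matrix-vector product with the new reward vector.

```python
def successor_representation(P, gamma, sweeps=500):
    """Iterate M <- I + gamma * P @ M, which converges to (I - gamma * P)^-1."""
    n = len(P)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        M = [[(1.0 if i == j else 0.0)
              + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M


def values(M, reward):
    """State values under the fixed policy: V = M @ reward."""
    return [sum(m * r for m, r in zip(row, reward)) for row in M]


# Hypothetical toy dynamics: a 4-state ring where the policy always moves clockwise.
P = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]
GAMMA = 0.9

M = successor_representation(P, GAMMA)   # learned once, reused for every goal
v_goal3 = values(M, [0.0, 0.0, 0.0, 1.0])  # reward for being in state 3
v_goal1 = values(M, [0.0, 1.0, 0.0, 0.0])  # new goal, same dynamics, same M
```

Only the reward vector changes between the two calls to `values`, which is the sense in which the representation carries over to a task with a different goal.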

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1804.03758

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kearney, Alex, Veeriah, Vivek, Travnik, Jaden B., Sutton, Richard S., Pilarski, Patrick M.

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

arXiv.org Machine Learning, Apr-10-2018

In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well-chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system. A vector step-size enables greater optimization by specifying parameters on a per-feature basis. Furthermore, adapting parameters at different rates has the added benefit of being a simple form of representation learning. We generalize Incremental Delta Bar Delta (IDBD), a vectorized adaptive step-size method for supervised learning, to TD learning, and name the result TIDBD. We demonstrate that TIDBD is able to find appropriate step-sizes in both stationary and non-stationary prediction tasks, outperforming ordinary TD methods and TD methods with scalar step-size adaptation; we demonstrate that it can differentiate between features that are relevant and irrelevant for a given task, performing representation learning; and we show on a real-world robot prediction task that TIDBD is able to outperform ordinary TD methods and TD methods augmented with AlphaBound and RMSprop.
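The per-feature step-size idea can be sketched with the supervised IDBD update that the abstract names as its starting point. This is not TIDBD itself (the TD generalization is not spelled out in this listing); the toy regression problem, the meta step-size `theta`, the initial step-size, and the function names are all assumptions for illustration.

```python
import math
import random


def idbd_update(w, beta, h, x, y, theta):
    """One supervised IDBD step with a separate step-size exp(beta[i]) per feature."""
    delta = y - sum(wi * xi for wi, xi in zip(w, x))  # prediction error
    for i, xi in enumerate(x):
        beta[i] += theta * delta * xi * h[i]           # meta-gradient step on log step-size
        alpha = math.exp(beta[i])
        w[i] += alpha * delta * xi                     # LMS update with per-feature step-size
        # h[i] is a decaying trace of recent weight changes for feature i
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


def run(steps=2000, theta=0.01, init_alpha=0.05, seed=0):
    """Hypothetical toy task: the target depends on feature 0 only; feature 1 is irrelevant."""
    rng = random.Random(seed)
    n = 2
    w = [0.0] * n
    beta = [math.log(init_alpha)] * n
    h = [0.0] * n
    for _ in range(steps):
        x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        y = 2.0 * x[0]
        idbd_update(w, beta, h, x, y, theta)
    return w, [math.exp(b) for b in beta]


weights, step_sizes = run()
```

In this toy setting the step-size of the relevant feature grows while the error is being reduced, whereas the irrelevant feature's step-size stays near its initial value, which gives a small picture of how per-feature adaptation can separate useful from useless inputs.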

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1804.03334

Country:

North America > United States (0.46)
North America > Canada > Alberta (0.30)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

@machinelearnbot, Apr-9-2018, 05:20:10 GMT

Advanced Neural Networks with TensorFlow (Udemy)

Neural Networks are at the forefront of almost all recent major technology breakthroughs. The intersection of big data, parallel programming, and AI generated a new wave of Neural Network research. In this course, you will be taken through some of the best uses of Neural Networks using TensorFlow. You'll explore Generative Networks and Deep Reinforcement Learning algorithms such as Deep Q Learning. You will learn to implement more complex models such as Deep Q Learning with OpenAI Gym, autoencoders, and Siamese neural networks.

advanced neural network, neural network, tensorflow udemy, (3 more...)

Country: Europe > Netherlands > South Holland > Delft (0.07)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Haarnoja, Tuomas, Hartikainen, Kristian, Abbeel, Pieter, Levine, Sergey

Latent Space Policies for Hierarchical Reinforcement Learning

arXiv.org Machine Learning, Apr-9-2018

We address the problem of learning hierarchical deep neural network policies for reinforcement learning. Our aim is to design a hierarchical reinforcement learning algorithm that can construct hierarchical representations in a bottom-up, layerwise fashion. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher-level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1804.02808

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

@machinelearnbot, Apr-8-2018, 16:39:14 GMT

Artificial Intelligence IV - Reinforcement Learning in Java

This course is about Reinforcement Learning. The first step is to cover the mathematical background: we can use a Markov Decision Process as a model for reinforcement learning. We can solve the problem in three ways: value iteration, policy iteration, and Q-learning. Q-learning is a model-free approach: it learns the optimal policy by interacting with the environment, without needing a model of the transition probabilities.
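As a rough illustration of the model-free idea, here is a minimal tabular Q-learning sketch. The course itself uses Java; this sketch is in Python, and the five-state chain environment, the hyperparameters, and every name in it are assumptions chosen for brevity rather than material from the course.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1.0 only on entering the goal state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a uniformly random behaviour policy suffices here.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


Q = q_learning()
# Greedy policy for the non-terminal states, read off the learned table.
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

The agent never reads the transition function directly; it only observes sampled transitions from `step`, which is what distinguishes Q-learning from value iteration and policy iteration, both of which require the model.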

artificial intelligence iv, q-learning, reinforcement learning, (1 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.83)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence, Apr-8-2018, 01:12:56 GMT

Simple Reinforcement Learning with Tensorflow: Part 2 - Policy-based Agents

After a weeklong break, I am back again with Part 2 of my Reinforcement Learning tutorial series. In Part 1, I showed how to put together a basic agent that learns to choose the more rewarding of two possible options. In this post, I am going to describe how we get from that simple agent to one that is capable of taking in an observation of the world and taking actions that provide the optimal reward not just in the present, but over the long run. With these additions, we will have a full reinforcement learning agent. Environments that pose the full problem to an agent are referred to as Markov Decision Processes (MDPs).

agent, policy-based agent, simple reinforcement learning, (7 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.56)

@machinelearnbot, Apr-8-2018, 01:05:15 GMT

University of Warsaw researchers and deepsense.ai launch reinforcement learning project powered by Google's TensorFlow Research Cloud (deepsense.ai Press Center)

Researchers from the University of Warsaw, Google AI and deepsense.ai ... The goal of the experiment is to train an artificial intelligence end-to-end to play video games fully inside a computation graph. A team from the University of Warsaw, made up of Piotr Miłoś, Błażej Osiński and Henryk Michalewski, has started a collaboration on reinforcement learning research with Łukasz Kaiser from the Google Brain team and with researchers from deepsense.ai. This project is connected to a research program on RL that deepsense.ai ...

deepsense, experiment, reinforcement, (12 more...)

Country: Europe > Poland > Masovia Province > Warsaw (0.87)

Industry: Information Technology > Services (0.39)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Jaleel, Hassan, Shamma, Jeff S.

Path to Stochastic Stability: Comparative Analysis of Stochastic Learning Dynamics in Games

arXiv.org Machine Learning, Apr-8-2018

Stochastic stability is a popular solution concept for stochastic learning dynamics in games. However, a critical limitation of this solution concept is its inability to distinguish between different learning rules that lead to the same steady-state behavior. We address this limitation for the first time and develop a framework for the comparative analysis of stochastic learning dynamics with different update rules but the same steady-state behavior. We present the framework in the context of two learning dynamics: Log-Linear Learning (LLL) and Metropolis Learning (ML). Although both of these dynamics have the same stochastically stable states, LLL and ML correspond to different behavioral models for decision making. Moreover, we demonstrate through an example setup of a sensor coverage game that, for each of these dynamics, the paths to stochastically stable states exhibit distinctive behaviors. Therefore, we propose multiple criteria to analyze and quantify the differences in the short- and medium-run behavior of stochastic learning dynamics. We derive and compare upper bounds on the expected hitting time to the set of Nash equilibria for both LLL and ML. For the medium- to long-run behavior, we identify a set of tools from the theory of perturbed Markov chains that result in a hierarchical decomposition of the state space into collections of states called cycles. We compare LLL and ML based on the proposed criteria and develop insights into the comparative behavior of the two dynamics.
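To make the difference between the two update rules concrete, the sketch below uses the commonly cited textbook forms of each rule for a single revising player. The abstract does not state the exact formulation used in the paper, so the function names, the temperature parameter, and the uniform proposal implied for the Metropolis rule are assumptions.

```python
import math


def log_linear_probs(utilities, temperature):
    """Log-linear learning: pick each action with probability proportional to exp(u / T)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp((u - m) / temperature) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def metropolis_accept_prob(u_current, u_proposed, temperature):
    """Metropolis learning: a randomly proposed action is accepted with this probability."""
    if u_proposed >= u_current:
        return 1.0  # improvements are always accepted
    return math.exp((u_proposed - u_current) / temperature)


probs = log_linear_probs([1.0, 2.0], temperature=1.0)
accept_worse = metropolis_accept_prob(2.0, 1.0, temperature=1.0)
accept_better = metropolis_accept_prob(1.0, 2.0, temperature=1.0)
```

The log-linear rule compares all available actions at once, while the Metropolis rule compares only the current action with one proposal. That is one way two rules with the same long-run behavior can still follow different paths in the short run.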

lll, machine learning, reinforcement learning, (18 more...)

1804.02693

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Ichimura, Takumi, Igaue, Daisuke

Hierarchical Modular Reinforcement Learning Method and Knowledge Acquisition of State-Action Rule for Multi-target Problem

arXiv.org Machine Learning, Apr-8-2018

Hierarchical Modular Reinforcement Learning (HMRL) consists of two layers of learning: Profit Sharing plans a prey position in the higher layer, and Q-learning trains the state-actions toward the target in the lower layer. In this paper, we extend HMRL to the multi-target problem by taking the distance between targets into consideration. A function called the 'AT field' estimates the interests of an agent according to the distance between two agents and the advantage or disadvantage of the other agent. Moreover, the knowledge related to state-action rules is extracted by C4.5, and the action in a given situation is decided using the acquired knowledge. To verify the effectiveness of the proposed method, some experimental results are reported.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

doi: 10.1109/IWCIA.2013.6624799

1804.02698

Country: Asia > Japan (0.15)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)