AITopics

1812.00285

Country:

North America > Canada (0.15)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Leisure & Entertainment > Games > Computer Games (0.94)
Education > Curriculum (0.91)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Deshpande, Ameet, K, Harshavardhan P, Ravindran, Balaraman

Discovering hierarchies using Imitation Learning from hierarchy aware policies

arXiv.org Artificial Intelligence, Dec-1-2018

Learning options that allow agents to exhibit temporally higher-order behavior has proven useful for increasing exploration, reducing sample complexity, and enabling various transfer scenarios. Deep Discovery of Options (DDO) is a generative algorithm that learns a hierarchical policy along with options directly from expert trajectories. We perform a qualitative and quantitative analysis of the options inferred by DDO in different domains. To this end, we suggest different value metrics, such as the option termination condition, a hinge value function error, and a KL-divergence-based distance metric, to compare different methods. Analyzing the termination condition of the options and the number of time steps for which the options ran revealed that the options were terminating prematurely. We suggest modifications that can be incorporated easily and that alleviate the problems of short options and the collapse of options to the same mode.

machine learning, reinforcement learning, trajectory, (16 more...)

1812.00225

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

#artificialintelligence, Nov-30-2018, 20:46:04 GMT

Amazon tempts developers to machine learning with toy race car

The DeepRacer includes a built-in compute section featuring an Intel Atom processor, 4 GB of RAM, and 32 GB of internal storage, and it comes loaded with Ubuntu OS, the Intel OpenVINO computer vision toolkit, and ROS Kinetic (Robot Operating System). It is designed to get developers into reinforcement learning, a form of machine learning that uses trial and error to achieve goals and successful outcomes.

amazon tempt developer, machine learning, reinforcement learning, (2 more...)

Industry:

Leisure & Entertainment > Sports > Motorsports (0.40)
Information Technology > Services (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.39)

Control with Distributed Deep Reinforcement Learning: Learn a Better Policy

Liu, Qihao, Liu, Xiaofeng, Cai, Guoping

Abstract: A distributed approach is a very effective way to improve the training efficiency of reinforcement learning. In this paper, we propose a new heuristic distributed architecture for deep reinforcement learning (DRL) algorithms, in which, besides using multiple agents for parallel training, a network update mechanism based on particle swarm optimization (PSO) is adopted to speed up learning an optimal policy. In this mechanism, the update of each agent's neural network depends not only on that agent's own training result but also on the optimal neural network among all agents. To verify the effectiveness of the proposed method, the architecture is implemented on the Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) algorithms and used to train on several typical control problems. The training results show that the proposed method is effective. Reinforcement learning is about an agent interacting with the environment, learning an optimal policy by trial and error.
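The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea, assuming a PSO-style "social" term that pulls every agent's parameters part of the way toward the currently best-scoring agent. The function names, the `strength` coefficient, and the use of flat parameter lists are illustrative assumptions, not the authors' implementation.

```python
import random

def pull_toward_best(params, best_params, strength=0.5, rng=random):
    """Move one agent's parameters part of the way toward the best agent's.

    Assumption: a single random factor per agent, as in PSO's social term.
    """
    r = rng.random()  # random factor in [0, 1)
    return [p + strength * r * (b - p) for p, b in zip(params, best_params)]

def shared_update(agent_params, agent_scores, strength=0.5, rng=random):
    """Apply the social pull to every agent, targeting the highest-scoring one."""
    best_index = max(range(len(agent_scores)), key=agent_scores.__getitem__)
    best = agent_params[best_index]
    return [pull_toward_best(p, best, strength, rng) for p in agent_params]

# Hypothetical example: three agents with two parameters each.
agents = [[0.0, 0.0], [1.0, 2.0], [4.0, -2.0]]
scores = [10.0, 50.0, 30.0]  # agent 1 currently performs best
updated = shared_update(agents, scores, strength=0.5, rng=random.Random(0))
```

In a full training loop, each agent would presumably first take its own gradient step (DQN or DDPG) and then receive a pull of this kind; the best agent itself is left unchanged by the pull.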

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1811.10264

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Khomtchouk, Bohdan, Sudhakaran, Shyam

Modeling natural language emergence with integral transform theory and reinforcement learning

Zipf's law predicts a power-law relationship between word rank and frequency in language communication systems and has been widely reported in a variety of natural language processing applications. However, the emergence of natural language is often modeled as a function of bias between speaker and listener interests, which lacks a direct way of relating information-theoretic bias to Zipfian rank. A function of bias also serves as an unintuitive interpretation of the communicative effort exchanged between a speaker and a listener. We counter these shortcomings by proposing a novel integral transform and kernel for mapping communicative bias functions to corresponding word frequency-rank representations at any arbitrary phase transition point, resulting in a direct way to link communicative effort (modeled by speaker/listener bias) to specific vocabulary used (represented by word rank). We demonstrate the practical utility of our integral transform by showing how a change from bias to rank results in greater accuracy and performance at an image classification task for assigning word labels to images randomly subsampled from CIFAR10. We model this task as a reinforcement learning game between a speaker and listener and compare the relative impact of bias and Zipfian word rank on communicative performance (and accuracy) between the two agents.
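As a small illustration of the power-law relationship mentioned above (and only of that; it does not implement the paper's integral transform), the sketch below builds a normalized Zipfian frequency table in which frequency is proportional to 1 / rank^s. The function name and the choice of exponent s = 1 are assumptions made for the example.

```python
def zipf_frequencies(vocab_size, exponent=1.0):
    """Return normalized frequencies for ranks 1..vocab_size under Zipf's law."""
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

freqs = zipf_frequencies(vocab_size=10, exponent=1.0)

# With exponent 1, the rank-1 word is twice as frequent as the rank-2 word
# and five times as frequent as the rank-5 word.
ratio_1_to_2 = freqs[0] / freqs[1]
ratio_1_to_5 = freqs[0] / freqs[4]
```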

machine learning, natural language, reinforcement learning, (17 more...)

1812.01431

Country: North America > United States > California > Santa Clara County (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

BlockPuzzle - A Challenge in Physical Reasoning and Generalization for Robot Learning

Zhao, Yixiu, Liu, Ziyin

In this work we propose a novel task framework under which a variety of physical reasoning puzzles can be constructed using very simple rules. Under sparse reward settings, most of these tasks can be very challenging for a reinforcement learning agent to learn. We build several simple environments with this task framework in MuJoCo and OpenAI Gym and attempt to solve them. We are able to solve the environments by designing curricula to guide the agent in learning and by using imitation learning methods to transfer knowledge from a simpler environment. This is only a first step for the task framework, and further research is needed on how to solve the harder tasks and transfer knowledge between tasks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1812.00091

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Kartal, Bilal, Hernandez-Leal, Pablo, Taylor, Matthew E.

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power. However, there are still several challenges to be addressed, such as convergence to locally optimal policies and long training times. In this paper, firstly, we augment the Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, Terminal Prediction, which measures temporal closeness to terminal states; we call the resulting method A3C-TP. Secondly, we propose a new framework in which planning algorithms such as Monte Carlo tree search, or other sources of (simulated) demonstrators, can be integrated into asynchronous distributed DRL methods. Compared to vanilla A3C, both of our proposed methods learn faster and converge to better policies on a two-player mini version of the Pommerman game.
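The abstract describes Terminal Prediction only as "measuring temporal closeness to terminal states". One plausible reading, assumed here purely for illustration, is that the target at step t of a finished episode of length T is t / T, with a squared-error loss on the network's prediction. The function names and the choice of loss are hypothetical.

```python
def terminal_prediction_targets(episode_length):
    """Return one target per step: 1/T, 2/T, ..., 1.0 for an episode of T steps.

    Assumption: closeness to the terminal state grows linearly with time.
    """
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    return [(step + 1) / episode_length for step in range(episode_length)]

def terminal_prediction_loss(predictions, targets):
    """Mean squared error between predicted and target closeness values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# The targets need no labels: they are computed from the episode itself,
# which is what makes the auxiliary task self-supervised.
targets = terminal_prediction_targets(4)
loss = terminal_prediction_loss([0.25, 0.5, 0.5, 1.0], targets)
```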

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1812.00045

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Lee, Xian Yeow, Balu, Aditya, Stoecklein, Daniel, Ganapathysubramanian, Baskar, Sarkar, Soumik

Flow Shape Design for Microfluidic Devices Using Deep Reinforcement Learning

arXiv.org Machine Learning, Nov-29-2018

Microfluidic devices are utilized to control and direct flow behavior in a wide variety of applications, particularly in medical diagnostics. A particularly popular form of microfluidics, called inertial microfluidic flow sculpting, involves placing a sequence of pillars to controllably deform an initial flow field into a desired one. Inertial flow sculpting can be formally defined as an inverse problem, where one identifies a sequence of pillars (chosen, with replacement, from a finite set of pillars, each of which produces a specific transformation) whose composite transformation results in a user-defined desired transformation. As with most such problems in engineering, inverse problems are usually computationally intractable, and most traditional approaches are based on search and optimization strategies. In this paper, we pose this inverse problem as a Reinforcement Learning (RL) problem. We train a DoubleDQN agent to learn from this environment. The results suggest that learning is possible using a DoubleDQN model, with the success frequency reaching 90% in 200,000 episodes and the rewards converging. While most of the results are obtained by fixing a particular target flow shape to simplify the learning problem, we later demonstrate how to transfer the learning of an agent from one target shape to another, i.e. from one design to another, which makes the approach useful for the generic design of a flow shape.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1811.12444

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Adjodah, Dhaval, Calacci, Dan, Dubey, Abhimanyu, Krafft, Peter, Moro, Esteban, Pentland, Alex 'Sandy'

How to Organize your Deep Reinforcement Learning Agents: The Importance of Communication Topology

arXiv.org Artificial Intelligence, Nov-29-2018

In this empirical paper, we investigate how learning agents can be arranged in more efficient communication topologies for improved learning. This is an important problem because a common technique to improve the speed and robustness of learning in deep reinforcement learning and many other machine learning algorithms is to run multiple learning agents in parallel. The standard communication architecture typically involves all agents intermittently communicating with each other (fully connected topology) or with a centralized server (star topology). Unfortunately, optimizing the topology of communication over the space of all possible graphs is a hard problem, so we borrow results from the networked optimization and collective intelligence literatures which suggest that certain families of network topologies can lead to strong improvements over fully-connected networks. We start by introducing alternative network topologies to DRL benchmark tasks under the Evolution Strategies paradigm, an approach we call Network Evolution Strategies. We explore the relative performance of the four main graph families and observe that one such family (Erdos-Renyi random graphs) empirically outperforms all other families, including the de facto fully-connected communication topologies. Additionally, the use of alternative network topologies has a multiplicative performance effect: we observe that when 1000 learning agents are arranged in a carefully designed communication topology, they can compete with 3000 agents arranged in the de facto fully-connected topology. Overall, our work suggests that distributed machine learning algorithms would learn more efficiently if the communication topology between learning agents were optimized.
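To make the topology idea concrete, here is a minimal sketch, under stated assumptions, of how an Erdos-Renyi communication graph over agents could be sampled: each pair of agents is connected independently with probability `edge_prob`. The helper names and the neighbor-table representation are illustrative and are not taken from the paper's code.

```python
import itertools
import random

def erdos_renyi_edges(num_agents, edge_prob, rng):
    """Sample an undirected G(n, p) graph as a list of (i, j) pairs with i < j."""
    return [
        (i, j)
        for i, j in itertools.combinations(range(num_agents), 2)
        if rng.random() < edge_prob
    ]

def neighbor_table(num_agents, edges):
    """Map each agent to the set of agents it exchanges information with."""
    table = {agent: set() for agent in range(num_agents)}
    for i, j in edges:
        table[i].add(j)
        table[j].add(i)
    return table

rng = random.Random(0)
sparse_edges = erdos_renyi_edges(num_agents=10, edge_prob=0.3, rng=rng)
sparse = neighbor_table(10, sparse_edges)

# edge_prob = 1.0 recovers the fully connected topology used as the baseline.
full_edges = erdos_renyi_edges(num_agents=10, edge_prob=1.0, rng=rng)
```

With such a table, an agent would share or aggregate updates only with the agents in `sparse[agent]` rather than with all other agents.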

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1811.12556

Country: Europe (0.46)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Nov-29-2018

Transition-based versus State-based Reward Functions for MDPs with Value-at-Risk

Ma, Shuai, Yu, Jia Yuan

In reinforcement learning, a reward function defined on the current state and action is widely used. When the objective concerns only the expectation of the (discounted) total reward, this works perfectly. However, if the objective involves the distribution of the total reward, the result will be wrong. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with two reward functions that share the same expectations. Firstly, we show that with a VaR objective, when the real reward function is transition-based (with respect to the action and both the current and next states), the simplified (state-based, with respect to the action and current state only) reward function will change the VaR. Secondly, for long-horizon MDPs, we estimate the VaR function with the aid of spectral theory and the central limit theorem. Thirdly, since the estimation method is for a Markov reward process with the reward function on the current state only, we present a transformation algorithm for the Markov reward process with the reward function on the current and next states, in order to estimate the VaR function with an intact total reward distribution.
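The first claim can be checked on a toy case. The sketch below assumes a single state and action with a two-step horizon, a transition-based reward of 0 or 2 (each with probability 0.5, depending on the next state), and its state-based simplification, the constant expected reward 1. VaR is taken here as the lower alpha-quantile of the total reward; this convention, the numbers, and the function names are assumptions for illustration only.

```python
from itertools import product

def total_reward_distribution(step_outcomes, horizon):
    """Distribution of the undiscounted total reward over i.i.d. steps.

    step_outcomes is a list of (reward, probability) pairs for one step.
    """
    dist = {}
    for path in product(step_outcomes, repeat=horizon):
        total = sum(reward for reward, _ in path)
        prob = 1.0
        for _, p in path:
            prob *= p
        dist[total] = dist.get(total, 0.0) + prob
    return dist

def expectation(dist):
    return sum(value * prob for value, prob in dist.items())

def value_at_risk(dist, alpha):
    """Smallest total reward x with P(total <= x) >= alpha."""
    cumulative = 0.0
    for value in sorted(dist):
        cumulative += dist[value]
        if cumulative >= alpha:
            return value
    return max(dist)

transition_based = total_reward_distribution([(0.0, 0.5), (2.0, 0.5)], horizon=2)
state_based = total_reward_distribution([(1.0, 1.0)], horizon=2)

# Same expected total reward, different 10% VaR.
var_transition = value_at_risk(transition_based, alpha=0.1)
var_state = value_at_risk(state_based, alpha=0.1)
```

Both reward functions give an expected total of 2, but the simplified one collapses the distribution to a single point, so its VaR no longer reflects the 25% chance of earning nothing.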

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1612.02088

Country: North America > Canada (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)