AITopics

1812.06401

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.94)
(3 more...)

Paul, Sujoy, van Baar, Jeroen

Trajectory-based Learning for Ball-in-Maze Games

arXiv.org Machine Learning, Dec-15-2018

Deep Reinforcement Learning has shown tremendous success in solving several games and tasks in robotics. However, unlike humans, it generally requires a lot of training instances. Trajectories imitating to solve the task at hand can help to increase sample-efficiency of deep RL methods. In this paper, we present a simple approach to use such trajectories, applied to the challenging Ball-in-Maze Games, recently introduced in the literature. We show that in spite of not using human-generated trajectories and just using the simulator as a model to generate a limited number of trajectories, we can get a speed-up of about 2-3x in the learning process. We also discuss some challenges we observed while using trajectory-based learning for very sparse reward functions.

machine learning, reinforcement learning, trajectory, (20 more...)


1811.11441

Country: North America > United States > California > Riverside County > Riverside (0.14)

Genre: Research Report (0.41)

Industry: Leisure & Entertainment > Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

#artificialintelligence, Dec-14-2018, 14:28:20 GMT

Beginner's guide to Reinforcement Learning & its implementation in Python

One of the most fundamental questions for scientists across the globe has been – "How to learn a new skill?". The reason for wanting an answer is obvious – if we can understand this, we can enable the human species to do things we might not have thought of before. Alternatively, we can train machines to do more "human" tasks and create true artificial intelligence. While we don't have a complete answer to the above question yet, a few things are clear. Irrespective of the skill, we first learn by interacting with the environment.

artificial intelligence, machine learning, reinforcement learning, (14 more...)


Industry: Leisure & Entertainment > Games (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Tamatsukuri, Akihiro, Takahashi, Tatsuji

Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function

As reinforcement learning algorithms are being applied to increasingly complicated and realistic tasks, it is becoming increasingly difficult to solve such problems within a practical time frame. Hence, we focus on a satisficing strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing ($RS$) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to the $K$-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that $RS$ is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of $RS$ is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of $RS$ with that of other representative algorithms for the $K$-armed bandit problems.
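To make the idea of an aspiration level concrete, here is a minimal sketch of a generic satisficing rule on a Bernoulli $K$-armed bandit. It is not the paper's $RS$ value function, which the abstract does not spell out. The function name `satisficing_bandit`, the uniform exploration step, and the Bernoulli rewards are all assumptions made for illustration.

```python
import random


def satisficing_bandit(means, aspiration, steps, seed=0):
    """Illustrative satisficing rule on a Bernoulli K-armed bandit.

    NOTE: a generic sketch, not the RS model from the paper.
    `means` holds the (hypothetical) true success probability of each arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running mean reward per arm

    for _ in range(steps):
        # Arms that have been tried and whose estimate meets the aspiration level
        satisfying = [a for a in range(k)
                      if counts[a] > 0 and estimates[a] >= aspiration]
        if satisfying:
            # Exploit: an arm that is "good enough" is available
            arm = max(satisfying, key=lambda a: estimates[a])
        else:
            # Nothing satisfies yet: explore uniformly at random
            arm = rng.randrange(k)

        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


if __name__ == "__main__":
    counts, estimates = satisficing_bandit([0.2, 0.8], aspiration=0.5, steps=200)
    print(counts, estimates)
```

The rule stops exploring as soon as some arm looks good enough, which is the behavioural difference from a strategy that keeps searching for the optimal arm.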

artificial intelligence, health & medicine, probability, (20 more...)

1812.05795

Country: Asia > Japan > Honshū > Kantō (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.46)
Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

IRLAS: Inverse Reinforcement Learning for Architecture Search

Guo, Minghao, Zhong, Zhao, Wu, Wei, Lin, Dahua, Yan, Junjie

In this paper, we propose an inverse reinforcement learning method for architecture search (IRLAS), which trains an agent to learn to search network structures that are topologically inspired by human-designed networks. Most existing architecture search approaches totally neglect the topological characteristics of architectures, which results in complicated architectures with high inference latency. Motivated by the fact that human-designed networks are elegant in topology with a fast inference speed, we propose a mirror stimuli function inspired by biological cognition theory to extract the abstract topological knowledge of an expert human-designed network (ResNeXt). To avoid imposing too strong a prior over the search space, we introduce inverse reinforcement learning to train the mirror stimuli function and exploit it as heuristic guidance for architecture search, easily generalized to different architecture search algorithms. On CIFAR-10, the best architecture searched by our proposed IRLAS achieves a 2.60% error rate. In the ImageNet mobile setting, our model achieves a state-of-the-art top-1 accuracy of 75.28%, while being 2-4x faster than most auto-generated architectures. A fast version of this model is 10% faster than MobileNetV2, while maintaining a higher accuracy.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1812.05285

Country: North America (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Simulation to scaled city: zero-shot policy transfer for traffic control via autonomous vehicles

Jang, Kathy, Beaver, Logan, Chalaki, Behdad, Remer, Ben, Vinitsky, Eugene, Malikopoulos, Andreas, Bayen, Alexandre

Using deep reinforcement learning, we train control policies for autonomous vehicles leading a platoon of vehicles onto a roundabout. Using Flow, a library for deep reinforcement learning in micro-simulators, we train two policies, one policy with noise injected into the state and action space and one without any injected noise. In simulation, the autonomous vehicle learns an emergent metering behavior for both policies in which it slows to allow for smoother merging. We then directly transfer this policy without any tuning to the University of Delaware Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles. We characterize the performance of both policies on the scaled city. We show that the noise-free policy winds up crashing and only occasionally metering. However, the noise-injected policy consistently performs the metering behavior and remains collision-free, suggesting that the noise helps with the zero-shot policy transfer. Additionally, the transferred, noise-injected policy leads to a 5% reduction of average travel time and a reduction of 22% in maximum travel time in the UDSSC. Videos of the controllers can be found at https://sites.google.com/view/iccps-policy-transfer.
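As a rough illustration of what injecting noise into the state and action space can look like, the sketch below perturbs an observation before the policy sees it and perturbs the action the policy returns. The abstract does not say which noise distribution or magnitudes were used, so the zero-mean Gaussian noise, the `sigma` values, and the names `inject_noise` and `noisy_step` are assumptions; this is not the Flow API.

```python
import random


def inject_noise(values, sigma, rng):
    """Add zero-mean Gaussian noise (assumed distribution) to each entry."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def noisy_step(policy, observation, obs_sigma=0.1, act_sigma=0.1, seed=None):
    """Query `policy` on a perturbed observation and perturb its action.

    `policy` is any callable mapping a list of floats to a list of floats
    (a hypothetical stand-in for a trained controller).
    """
    rng = random.Random(seed)
    noisy_obs = inject_noise(observation, obs_sigma, rng)   # state-space noise
    action = policy(noisy_obs)
    return inject_noise(action, act_sigma, rng)             # action-space noise


if __name__ == "__main__":
    # Toy policy: decelerate in proportion to the observed speed
    toy_policy = lambda obs: [-0.5 * obs[0]]
    print(noisy_step(toy_policy, [10.0], seed=42))
```

Training under perturbations of this kind is one common way to make a policy less sensitive to the mismatch between simulator and hardware.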

artificial intelligence, machine learning, reinforcement learning, (19 more...)

1812.06120

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Castro, Pablo Samuel, Moitra, Subhodeep, Gelada, Carles, Kumar, Saurabh, Bellemare, Marc G.

Dopamine: A Research Framework for Deep Reinforcement Learning

Deep reinforcement learning (deep RL) research has grown significantly in recent years. A number of software offerings now exist that provide stable, comprehensive implementations for benchmarking. At the same time, recent deep RL research has become more diverse in its goals. In this paper we introduce Dopamine, a new research framework for deep RL that aims to support some of that diversity. Dopamine is open-source, TensorFlow-based, and provides compact and reliable implementations of some state-of-the-art deep RL agents. We complement this offering with a taxonomy of the different research objectives in deep RL research. While by no means exhaustive, our analysis highlights the heterogeneity of research in the field, and the value of frameworks such as ours.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1812.06110

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence, Dec-13-2018, 20:34:32 GMT

Reinforcement Learning 10: Classic Games Case Study


artificial intelligence, machine learning, social media, (12 more...)


Industry: Leisure & Entertainment > Games > Chess (0.77)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)

Shani, Lior, Efroni, Yonathan, Mannor, Shie

Revisiting Exploration-Conscious Reinforcement Learning

arXiv.org Machine Learning, Dec-13-2018

The objective of Reinforcement Learning is to learn an optimal policy by performing actions and observing their long term consequences. Unfortunately, acquiring such a policy can be a hard task. More severely, since one cannot tell if a policy is optimal, there is a constant need for exploration. This is known as the Exploration-Exploitation trade-off. In practice, this trade-off is resolved by using some inherent exploration mechanism, such as the $\epsilon$-greedy exploration, while still trying to learn the optimal policy. In this work, we take a different approach. We define a surrogate optimality objective: an optimal policy with respect to the exploration scheme. As we show throughout the paper, although solving this criterion does not necessarily lead to an optimal policy, the problem becomes easier to solve. We continue by analyzing this notion of optimality, devise algorithms derived from this approach, which reveal connections to existing work, and test them empirically on tabular and deep Reinforcement Learning domains.
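For readers unfamiliar with the $\epsilon$-greedy mechanism mentioned above, a minimal sketch follows: with probability $\epsilon$ the agent picks a uniformly random action, otherwise it picks the action with the highest estimated value. The function name `epsilon_greedy` and the list-of-floats representation of the value estimates are illustrative choices, not taken from the paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen by epsilon-greedy exploration.

    q_values: estimated value of each action (illustrative representation).
    epsilon:  probability of taking a uniformly random action.
    """
    if rng.random() < epsilon:
        # Explore: uniformly random action
        return rng.randrange(len(q_values))
    # Exploit: first action with the highest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1))
```

The policy actually executed is therefore a mixture of the greedy policy and a uniform one, which is the kind of fixed exploration scheme the abstract refers to.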

artificial intelligence, optimal policy, upstream oil & gas, (19 more...)


1812.05551

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Dec-12-2018

Soft Actor-Critic Algorithms and Applications

Haarnoja, Tuomas, Zhou, Aurick, Hartikainen, Kristian, Tucker, George, Ha, Sehoon, Tan, Jie, Kumar, Vikash, Zhu, Henry, Gupta, Abhishek, Abbeel, Pieter, Levine, Sergey

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample complexity and brittleness to hyperparameters. Both of these challenges limit the applicability of such methods to real-world domains. In this paper, we describe Soft Actor-Critic (SAC), our recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework. In this framework, the actor aims to simultaneously maximize expected return and entropy. That is, to succeed at the task while acting as randomly as possible. We extend SAC to incorporate a number of modifications that accelerate training and improve stability with respect to the hyperparameters, including a constrained formulation that automatically tunes the temperature hyperparameter. We systematically evaluate SAC on a range of benchmark tasks, as well as real-world challenging tasks such as locomotion for a quadrupedal robot and robotic manipulation with a dexterous hand. With these improvements, SAC achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample-efficiency and asymptotic performance. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving similar performance across different random seeds. These results suggest that SAC is a promising candidate for learning in real-world robotics tasks.
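The maximum entropy objective described above is commonly written as follows. The notation is an assumption, since the abstract itself gives no formula: $\rho_\pi$ denotes the state-action distribution induced by policy $\pi$, $r$ the reward, $\mathcal{H}$ the entropy, and $\alpha$ the temperature hyperparameter that the constrained formulation tunes automatically.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

Setting $\alpha = 0$ recovers the usual expected-return objective, while larger values of $\alpha$ reward more random behaviour.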

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1812.05905

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)