AITopics

#artificialintelligence Oct-30-2018, 00:31:28 GMT

Machine Learning Explained: Understanding Supervised, Unsupervised, and Reinforcement Learning Analytikus - Simplifying Data

Supervised vs Reinforcement Learning: In supervised learning, an external supervisor with sufficient knowledge of the environment shares that knowledge with the agent so that it can form a better understanding and complete the task. In many problems, however, the agent can combine so many different kinds of subtasks to achieve the overall objective that the presence of a supervisor is unnecessary and impractical. Take a chess game as an example: the player can play tens of thousands of moves to achieve the ultimate objective, and creating a knowledge base for this purpose is a very complicated task. In such tasks it is more feasible for the machine to learn from its own experience, and then apply the knowledge gained from that experience to future moves.

artificial intelligence, machine learning explained, reinforcement learning analytikus, (7 more...)

Industry: Leisure & Entertainment > Games > Chess (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

arXiv.org Artificial Intelligence Oct-30-2018

Exploration by Random Network Distillation

Burda, Yuri, Edwards, Harrison, Storkey, Amos, Klimov, Oleg

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular we establish state of the art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and occasionally completes the first level.
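The bonus described in this abstract can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: the class name `RNDBonus`, the single-layer `tanh` target, the linear predictor, the learning rate, and the toy observations are all hypothetical choices made for brevity, not the architecture or hyperparameters used in the paper.

```python
import numpy as np


class RNDBonus:
    """Exploration bonus from the error of predicting a fixed random network."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: randomly initialized and never trained.
        self.target_w = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network (linear here, an assumption): trained to match the target.
        self.pred_w = 0.1 * rng.normal(size=(obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs):
        target = np.tanh(obs @ self.target_w)
        prediction = obs @ self.pred_w
        return prediction - target

    def bonus(self, obs):
        # Per-observation mean squared prediction error, used as intrinsic reward.
        return np.mean(self._error(obs) ** 2, axis=-1)

    def update(self, obs):
        # One gradient-descent step on the mean squared error over the batch.
        err = self._error(obs)
        grad = obs.T @ err * (2.0 / err.size)
        self.pred_w -= self.lr * grad


# Toy data (hypothetical): observations seen often vs. observations never seen.
rng = np.random.default_rng(1)
familiar = rng.normal(loc=1.0, scale=0.1, size=(64, 4))
novel = np.array([2.0, -2.0, 2.0, -2.0]) + rng.normal(scale=0.1, size=(64, 4))

rnd = RNDBonus(obs_dim=4)
bonus_before = rnd.bonus(familiar).mean()
for _ in range(200):
    rnd.update(familiar)
bonus_after = rnd.bonus(familiar).mean()
bonus_novel = rnd.bonus(novel).mean()
```

In this sketch the predictor is trained only on the familiar observations, so their bonus shrinks while observations from an unvisited region keep a large prediction error, which is the property that makes the error usable as an exploration signal.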

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1810.12894

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence Oct-30-2018

Reinforcement Learning and Deep Learning based Lateral Control for Autonomous Driving

Li, Dong, Zhao, Dongbin, Zhang, Qichao, Chen, Yaran

Abstract: This paper investigates vision-based autonomous driving with deep learning and reinforcement learning methods. Different from the end-to-end learning method, our method breaks the vision-based lateral control system down into a perception module and a control module. The perception module, which is based on a multi-task learning neural network, first takes a driver-view image as its input and predicts the track features. The control module, which is based on reinforcement learning, then makes a control decision based on these features. In order to improve the data efficiency, we propose visual TORCS (VTORCS), a deep reinforcement learning environment which is based on the open racing car simulator (TORCS). By means of the provided functions, one can train an agent with the input of an image or various physical sensor measurements, or evaluate the perception algorithm on this simulator. The trained reinforcement learning controller outperforms the linear quadratic regulator (LQR) controller and the model predictive control (MPC) controller on different tracks. The experiments demonstrate that the perception module shows promising performance and that the controller is capable of driving the vehicle well along the track center with visual input.

In recent years, artificial intelligence (AI) has flourished in many fields such as autonomous driving [1] [2], games [3] [4], and engineering applications [5] [6]. As one of the most popular topics, autonomous driving has drawn great attention from both the academic and industrial communities and is thought to be the next revolution in the intelligent transportation system. The autonomous driving system mainly consists of four modules: an environment perception module, a trajectory planning module, a control module, and an actuator mechanism module. The initial perception methods [7] [8] are based on expensive LIDARs, which usually cost tens of thousands of dollars. The high cost limits their large-scale application to ordinary vehicles. Recently, more attention has been paid to image-based methods [9], whose core sensor, i.e., the camera, is relatively cheap and already equipped on most vehicles. Some of these perception methods have been developed into products [10] [11]. In this paper, we focus on the lateral control problem based on the image captured by the onboard camera.

controller, machine learning, reinforcement learning, (18 more...)

1810.12778

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report (1.00)
Overview (0.68)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Oct-30-2018

Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

Humayoo, Mahammad, Cheng, Xueqi

Off-policy learning is less stable than on-policy learning in reinforcement learning (RL). One reason for the instability of off-policy learning is a discrepancy between the target ($\pi$) and behavior (b) policy distributions. The discrepancy between the $\pi$ and b distributions can be alleviated by employing a smooth variant of importance sampling (IS), such as relative importance sampling (RIS). RIS has a parameter $\beta\in[0, 1]$ which controls smoothness. To cope with instability, we present the first relative importance sampling-off-policy actor-critic (RIS-Off-PAC) model-free algorithms in RL. In our method, the network yields a target policy (the actor), a value function (the critic) assessing the current policy ($\pi$), and a behavior policy. We use the action value generated from the behavior policy to train our algorithm rather than from the target policy. We also use deep neural networks to train both actor and critic. We evaluated our algorithm on a number of OpenAI Gym benchmark problems and demonstrate better or comparable performance to several state-of-the-art RL baselines.
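The smoothing effect of $\beta$ can be sketched numerically. The abstract does not state the RIS formula, so the code below assumes the usual relative density-ratio form, $\pi / (\beta\pi + (1-\beta)b)$; the function names and the example probabilities are hypothetical.

```python
import numpy as np


def importance_weight(pi, b):
    """Ordinary importance sampling ratio pi(a|s) / b(a|s)."""
    return pi / b


def relative_importance_weight(pi, b, beta):
    """Relative ratio pi / (beta * pi + (1 - beta) * b); an assumed form of RIS."""
    return pi / (beta * pi + (1.0 - beta) * b)


# Probabilities that the target and behavior policies assign to the sampled actions.
pi = np.array([0.9, 0.5, 0.05])
b = np.array([0.01, 0.5, 0.6])

is_weights = importance_weight(pi, b)
ris_weights = relative_importance_weight(pi, b, beta=0.5)
ris_beta_zero = relative_importance_weight(pi, b, beta=0.0)
```

Under this assumed form, $\beta = 0$ recovers the ordinary IS ratio, while any $\beta > 0$ bounds the weight by $1/\beta$, so an action that is likely under $\pi$ but rare under b no longer produces a very large weight.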

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1810.12558

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

#artificialintelligence Oct-29-2018, 18:49:14 GMT

A (Long) Peek into Reinforcement Learning

Mastering the game of Go with deep neural networks and tree search.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Industry: Leisure & Entertainment > Games > Go (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Oct-29-2018, 15:16:25 GMT

Machine learning to optimize traffic and reduce pollution

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: North America > United States (0.25)

Industry:

Energy (1.00)
Education > Educational Setting > Higher Education (0.93)
Transportation > Ground > Road (0.92)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

arXiv.org Artificial Intelligence Oct-29-2018

Learning to Teach with Dynamic Loss Functions

Wu, Lijun, Tian, Fei, Xia, Yingce, Fan, Yang, Qin, Tao, Lai, Jianhuang, Liu, Tie-Yan

Teaching is critical to human society: it is with teaching that prospective students are educated and human civilization can be inherited and advanced. A good teacher not only provides his/her students with qualified teaching materials (e.g., textbooks), but also sets up appropriate learning objectives (e.g., course projects and exams) considering the different situations of a student. When it comes to artificial intelligence, treating machine learning models as students, the loss functions that are optimized act as perfect counterparts of the learning objective set by the teacher. In this work, we explore the possibility of imitating human teaching behaviors by dynamically and automatically outputting appropriate loss functions to train machine learning models. Different from typical learning settings in which the loss function of a machine learning model is predefined and fixed, in our framework, the loss function of a machine learning model (we call it the student) is defined by another machine learning model (we call it the teacher). The ultimate goal of the teacher model is cultivating the student to have better performance measured on a development dataset. Towards that end, similar to human teaching, the teacher, a parametric model, dynamically outputs different loss functions that will be used and optimized by its student model at different training stages. We develop an efficient learning method for the teacher model that makes gradient-based optimization possible, avoiding ineffective solutions such as policy optimization. We name our method "learning to teach with dynamic loss functions" (L2T-DLF for short). Extensive experiments on real-world tasks including image classification and neural machine translation demonstrate that our method significantly improves the quality of various student models.

machine learning, reinforcement learning, student model, (16 more...)

1810.12081

Country:

Asia > China (0.46)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Education > Educational Technology > Educational Software (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

arXiv.org Artificial Intelligence Oct-29-2018

Assessing Generalization in Deep Reinforcement Learning

Packer, Charles, Gao, Katelyn, Kos, Jernej, Krähenbühl, Philipp, Koltun, Vladlen, Song, Dawn

Deep reinforcement learning (RL) has achieved breakthrough results on many tasks, but has been shown to be sensitive to system changes at test time. As a result, building deep RL agents that generalize has become an active research area. Our aim is to catalyze and streamline community-wide progress on this problem by providing the first benchmark and a common experimental protocol for investigating generalization in RL. Our benchmark contains a diverse set of environments and our evaluation methodology covers both in-distribution and out-of-distribution generalization. To provide a set of baselines for future research, we conduct a systematic evaluation of deep RL algorithms, including those that specifically tackle the problem of generalization.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1810.12282

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Oct-29-2018

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

Liu, Qiang, Li, Lihong, Tang, Ziyang, Zhou, Dengyong

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased estimators, but is known to suffer from an excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly on the stationary state-visitation distributions to avoid the exploding variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled from only the behavior distribution. We develop a mini-max loss function for the estimation problem, and derive a closed-form solution for the case of RKHS. We support our method with both theoretical and empirical analyses.
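The variance problem that motivates this work can be made concrete with a short calculation. The sketch below is not the paper's estimator; it only illustrates why trajectory-wise IS weights explode with the horizon, under the simplifying assumption that both policies are state-independent so that per-step ratios are independent. The function name and the example policies are hypothetical.

```python
import numpy as np


def trajectory_weight_variance(pi, b, horizon):
    """Variance of prod_t pi(a_t) / b(a_t) with actions drawn i.i.d. from b.

    Each per-step ratio has mean 1 under b, so the product has mean 1 and
    variance (E_b[(pi / b)^2])^horizon - 1.
    """
    second_moment = np.sum(pi ** 2 / b)  # E_b[(pi/b)^2] = sum_a pi(a)^2 / b(a)
    return second_moment ** horizon - 1.0


pi = np.array([0.9, 0.1])  # target policy over two actions
b = np.array([0.5, 0.5])   # behavior policy over the same actions

variances = {T: trajectory_weight_variance(pi, b, T) for T in (1, 10, 100)}
```

Whenever the two policies differ, the per-step second moment exceeds 1, so the variance of the product grows exponentially in the horizon. A weight defined on the stationary state distribution, as the abstract proposes, does not involve such a product over time steps.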

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1810.12429

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.84)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)