AITopics

1811.06187

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Transportation (0.67)
Information Technology > Robotics & Automation (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceNov-14-2018

Natural Environment Benchmarks for Reinforcement Learning

Zhang, Amy, Wu, Yuxin, Pineau, Joelle

While current benchmark reinforcement learning (RL) tasks have been useful to drive progress in the field, they are in many ways poor substitutes for learning with real-world data. By testing increasingly complex RL algorithms on low-complexity simulation environments, we often end up with brittle RL policies that generalize poorly beyond the very specific domain. To combat this, we propose three new families of benchmark RL domains that contain some of the complexity of the natural world, while still supporting fast and extensive data acquisition. The proposed domains also permit a characterization of generalization through fair train/test separation, and easy comparison and replication of results. Through this work, we challenge the RL research community to develop more robust algorithms that meet high standards of evaluation.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1811.06032

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial IntelligenceNov-14-2018

Orthogonal Policy Gradient and Autonomous Driving Application

Luo, Mincong, Tong, Yin, Liu, Jiachi

Abstract--One less addressed issue of deep reinforcement learning is the lack of generalization capability based on new state and new target, for complex tasks, it is necessary to give the correct strategy and evaluate all possible actions for current state. Fortunately, deep reinforcement learning has enabled enormous progress in both subproblems: giving the correct strategy and evaluating all actions based on the state. In this paper we present an approach called orthogonal policy gradient descent(OPGD) that can make agent learn the policy gradient based on the current state and the actions set, by which the agent can learn a policy network with generalization capability. The framework of the proposed method to implement the autonomous driving. In this paper we proposed a deep reinforcement learning(DRL) method called orthogonal policy gradient descent, which is prooved that the global optimization objective function can reach maximum value and is used in the application of autonomous driving.

artificial intelligence, machine learning, reinforcement learning, (8 more...)

1811.06151

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.96)
Automobiles & Trucks (0.96)
Information Technology > Robotics & Automation (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Behzadan, Vahid, Yampolskiy, Roman V., Munir, Arslan

Emergence of Addictive Behaviors in Reinforcement Learning Agents

This paper presents a novel approach to the technical analysis of wireheading in intelligent agents. Inspired by the natural analogues of wireheading and their prevalent manifestations, we propose the modeling of such phenomenon in Reinforcement Learning (RL) agents as psychological disorders. In a preliminary step towards evaluating this proposal, we study the feasibility and dynamics of emergent addictive policies in Q-learning agents in the tractable environment of the game of Snake. We consider a slightly modified settings for this game, in which the environment provides a "drug" seed alongside the original "healthy" seed for the consumption of the snake. We adopt and extend an RL-based model of natural addiction to Q-learning agents in this settings, and derive sufficient parametric conditions for the emergence of addictive behaviors in such agents. Furthermore, we evaluate our theoretical analysis with three sets of simulation-based experiments. The results demonstrate the feasibility of addictive wireheading in RL agents, and provide promising venues of further research on the psychopathological modeling of complex AI safety problems.

addictive behavior, agent, experiment, (13 more...)

1811.0559

Country:

North America > United States > Kentucky > Jefferson County > Louisville (0.04)
North America > United States > Kansas > Riley County > Manhattan (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.96)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Tu, Zhuozhuo, Zhang, Jingwei, Tao, Dacheng

Theoretical Analysis of Adversarial Learning: A Minimax Approach

arXiv.org Machine LearningNov-13-2018

We propose a general theoretical method for analyzing the risk bound in the presence of adversaries. In particular, we try to fit the adversarial learning problem into the minimax framework. We first show that the original adversarial learning problem could be reduced to a minimax statistical learning problem by introducing a transport map between distributions. Then we prove a risk bound for this minimax problem in terms of covering numbers. In contrast to previous minimax bounds in \cite{lee,far}, our bound is informative when the radius of the ambiguity set is small. Our method could be applied to multi-class classification problems and commonly-used loss functions such as hinge loss and ramp loss. As two illustrative examples, we derive the adversarial risk bounds for kernel-SVM and deep neural networks. Our results indicate that a stronger adversary might have a negative impact on the complexity of the hypothesis class and the existence of margin could serve as a defense mechanism to counter adversarial attacks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1811.05232

Country: Europe (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Education > Focused Education > Special Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Machine LearningNov-13-2018

Diversity-Driven Extensible Hierarchical Reinforcement Learning

Song, Yuhang, Wang, Jianyi, Lukasiewicz, Thomas, Xu, Zhenghua, Xu, Mai

Hierarchical reinforcement learning (HRL) has recently shown promising advances on speeding up learning, improving the exploration, and discovering intertask transferable skills. Most recent works focus on HRL with two levels, i.e., a master policy manipulates subpolicies, which in turn manipulate primitive actions. However, HRL with multiple levels is usually needed in many real-world scenarios, whose ultimate goals are highly abstract, while their actions are very primitive. Therefore, in this paper, we propose a diversity-driven extensible HRL (DEHRL), where an extensible and scalable framework is built and learned levelwise to realize HRL with multiple levels. DEHRL follows a popular assumption: diverse subpolicies are useful, i.e., subpolicies are believed to be more useful if they are more diverse. However, existing implementations of this diversity assumption usually have their own drawbacks, which makes them inapplicable to HRL with multiple levels. Consequently, we further propose a novel diversity-driven solution to achieve this assumption in DEHRL. Experimental studies evaluate DEHRL with five baselines from four perspectives in two domains; the results show that DEHRL outperforms the state-of-the-art baselines in all four aspects.

machine learning, reinforcement learning, subpolicy, (17 more...)

arXiv.org Machine Learning

1811.04324

Genre: Research Report > New Finding (0.86)

Industry: Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Fellows, Matthew, Mahajan, Anuj, Rudner, Tim G. J., Whiteson, Shimon

VIREL: A Variational Inference Framework for Reinforcement Learning

arXiv.org Machine LearningNov-13-2018

Applying probabilistic models to reinforcement learning (RL) has become an exciting direction of research owing to powerful optimisation tools such as variational inference becoming applicable to RL. However, due to their formulation, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, for example, the absence of mode capturing behaviour in pseudo-likelihood methods and difficulties in optimisation of learning objective in maximum entropy RL based approaches. We propose VIREL, a novel, theoretically grounded probabilistic inference framework for RL that utilises the action-value function in a parametrised form to capture future dynamics of the underlying Markov decision process. Owing to its generality, our framework lends itself to current advances in variational inference. Applying the variational expectation-maximisation algorithm to our framework, we show that the actor-critic algorithm can be reduced to expectation-maximisation. We derive a family of methods from our framework, including state-of-the-art methods based on soft value functions. We evaluate two actor-critic algorithms derived from this family, which perform on par with soft actor critic, demonstrating that our framework offers a promising perspective on RL as inference.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1811.01132

Country:

North America > United States > New York (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Behzadan, Vahid, Minton, James, Munir, Arslan

TrolleyMod v1.0: An Open-Source Simulation and Data-Collection Platform for Ethical Decision Making in Autonomous Vehicles

This paper presents TrolleyMod v1.0, an open-source platform based on the CARLA simulator for the collection of ethical decision-making data for autonomous vehicles. This platform is designed to facilitate experiments aiming to observe and record human decisions and actions in high-fidelity simulations of ethical dilemmas that occur in the context of driving. Targeting experiments in the class of trolley problems, TrolleyMod provides a seamless approach to creating new experimental settings and environments with the realistic physics-engine and the high-quality graphical capabilities of CARLA and the Unreal Engine. Also, TrolleyMod provides a straightforward interface between the CARLA environment and Python to enable the implementation of custom controllers, such as deep reinforcement learning agents. The results of such experiments can be used for sociological analyses, as well as the training and tuning of value-aligned autonomous vehicles based on social values that are inferred from observations.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1811.05594

Country: North America > United States > Kansas (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Q learning for fooling neural networks

Kulkarni, Mandar

Deep learning models are vulnerable to external attacks. In this paper, we propose a Reinforcement Learning (RL) based approach to generate adversarial examples for the pre-trained (target) models. We assume a semi black-box setting where the only access an adversary has to the target model is the class probabilities obtained for the input queries. We train a Deep Q Network (DQN) agent which, with experience, learns to attack only a small portion of image pixels to generate non-targeted adversarial images. Initially, an agent explores an environment by sequentially modifying random sets of image pixels and observes its effect on the class probabilities. At the end of an episode, it receives a positive (negative) reward if it succeeds (fails) to alter the label of the image. Experimental results with MNIST, CIFAR-10 and Imagenet datasets demonstrate that our RL framework is able to learn an effective attack policy.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1811.05521

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Furuta, Ryosuke, Inoue, Naoto, Yamasaki, Toshihiko

Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing

This paper tackles a new problem setting: reinforcement learning with pixel-wise rewards (pixelRL) for image processing. After the introduction of the deep Q-network, deep RL has been achieving great success. However, the applications of deep RL for image processing are still limited. Therefore, we extend deep RL to pixelRL for various image processing applications. In pixelRL, each pixel has an agent, and the agent changes the pixel value by taking an action. We also propose an effective learning method for pixelRL that significantly improves the performance by considering not only the future states of the own pixel but also those of the neighbor pixels. The proposed method can be applied to some image processing tasks that require pixel-wise manipulations, where deep RL has never been applied. We apply the proposed method to three image processing tasks: image denoising, image restoration, and local color enhancement. Our experimental results demonstrate that the proposed method achieves comparable or better performance, compared with the state-of-the-art methods based on supervised learning.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1811.04323

Country: Asia > Japan (0.28)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)