AITopics

1810.02927

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

von Pilchau, Wenzel Baron Pilar, Stein, Anthony, Hähner, Jörg

Bootstrapping a DQN Replay Memory with Synthetic Experiences

arXiv.org Machine Learning, Feb-4-2020

An important component of many Deep Reinforcement Learning algorithms is the Experience Replay, which serves as a storage mechanism, or memory, for the experiences the agent has made. These experiences are used for training and help the agent to stably find the perfect trajectory through the problem space. The classic Experience Replay, however, uses only the experiences it actually made, although the stored samples hold great potential in the form of knowledge about the problem that can be extracted. We present an algorithm that creates synthetic experiences in a nondeterministic discrete environment to assist the learner. The Interpolated Experience Replay is evaluated on the FrozenLake environment, and we show that it can help the agent learn faster and even better than the classic version.
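A minimal sketch of how synthetic experiences might be derived from stored transitions. The abstract does not spell out the interpolation rule, so the per-(state, action) reward averaging used here is an assumption, and the class and method names are hypothetical.

```python
from collections import defaultdict


class InterpolatedReplaySketch:
    """Toy buffer that augments real transitions with synthetic ones.

    Assumption (not stated in the abstract): a synthetic experience reuses
    an observed (state, action, next_state) triple but replaces the reward
    with the average of all rewards seen so far for that (state, action).
    """

    def __init__(self):
        self.real = []                       # (s, a, r, s_next, done) tuples
        self.rewards = defaultdict(list)     # (s, a) -> observed rewards
        self.successors = defaultdict(set)   # (s, a) -> observed (s_next, done)

    def add(self, s, a, r, s_next, done):
        """Store one real transition and update the per-(s, a) statistics."""
        self.real.append((s, a, r, s_next, done))
        self.rewards[(s, a)].append(r)
        self.successors[(s, a)].add((s_next, done))

    def synthetic(self, s, a):
        """Return one synthetic transition per observed successor of (s, a)."""
        observed = self.rewards.get((s, a), [])
        if not observed:
            return []
        r_avg = sum(observed) / len(observed)
        return [(s, a, r_avg, s_next, done)
                for s_next, done in sorted(self.successors[(s, a)])]


# Usage: the same slippery action led to two different next states.
buffer = InterpolatedReplaySketch()
buffer.add(0, 1, 0.0, 4, False)
buffer.add(0, 1, 1.0, 1, False)
synthetic_batch = buffer.synthetic(0, 1)
```

A learner could then sample minibatches from both `buffer.real` and the synthetic transitions; how the two are mixed is left open here.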

algorithm, interpolation, transition, (15 more...)

2002.0137

Country:

North America > United States (0.04)
Europe > Germany (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

arXiv.org Machine Learning, Feb-4-2020

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Kaledin, Maxim, Moulines, Eric, Naumov, Alexey, Tadic, Vladislav, Wai, Hoi-To

Linear two-timescale stochastic approximation (SA) is an important class of algorithms that has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing the finite-time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is $O(1/k)$, where $c \geq 1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $\Omega(1/k)$. A simple numerical experiment is presented to support our theory.

Keywords: stochastic approximation, reinforcement learning, GTD learning, Markovian noise

1. Introduction

Since its introduction close to 70 years ago, the stochastic approximation (SA) scheme (Robbins and Monro, 1951) has been a powerful tool for root finding when only noisy samples are available. During the past two decades, considerable progress has been made in the practical and theoretical research of SA; see (Benaïm, 1999; Kushner and Yin, 2003; Borkar, 2008) for an overview. Among others, linear SA schemes are popular in reinforcement learning (RL) as they lead to policy evaluation methods with linear function approximation; of particular importance is temporal difference (TD) learning (Sutton, 1988), for which finite-time analysis has been reported in (Srikant and Ying, 2019; Lakshminarayanan and Szepesvari, 2018; Bhandari et al., 2018; Dalal et al., 2018a).
The TD learning scheme based on classical (linear) SA is known to be inadequate for the off-policy learning paradigms in RL, where data samples are drawn from a behavior policy different from the policy being evaluated (Baird, 1995; Tsitsiklis and Van Roy, 1997). To circumvent this [...] These methods fall within the scope of the linear two-timescale SA scheme introduced by Borkar (1997):

$$\theta_{k+1} = \theta_k + \beta_k \{ \tilde{b}_1(X_{k+1}) - \tilde{A}_{11}(X_{k+1}) \theta_k - \tilde{A}_{12}(X_{k+1}) w_k \}, \qquad (1)$$
$$w_{k+1} = w_k + \gamma_k \{ \tilde{b}_2(X_{k+1}) - \tilde{A}_{21}(X_{k+1}) \theta_k - \tilde{A}_{22}(X_{k+1}) w_k \}.$$

(Footnote: Authors listed in alphabetical order.)
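The two-timescale recursion quoted above can be illustrated on a scalar toy problem. This is only a sketch under stated assumptions: the constants are hypothetical, the recursion is read with minus signs in front of the A terms, the noise is i.i.d. Gaussian on the b terms rather than Markovian, and the step-size exponents are one common choice, not the paper's schedule.

```python
import random


def two_timescale_sa(num_steps=20000, noise_std=0.1, seed=0):
    """Run a scalar instance of the linear two-timescale SA recursion.

    theta is updated with the slow step size beta_k and w with the fast
    step size gamma_k, so that beta_k / gamma_k -> 0.
    """
    rng = random.Random(seed)
    # Hypothetical mean-field constants (illustration only).
    b1, a11, a12 = 1.0, 2.0, 1.0
    b2, a21, a22 = 1.0, 1.0, 2.0
    theta, w = 0.0, 0.0
    for k in range(num_steps):
        beta = 1.0 / (k + 10)                # slow timescale
        gamma = 1.0 / (k + 10) ** (2 / 3)    # fast timescale
        # Noisy observations standing in for the tilde quantities.
        b1_obs = b1 + rng.gauss(0.0, noise_std)
        b2_obs = b2 + rng.gauss(0.0, noise_std)
        # Both updates use the iterates from step k.
        theta_next = theta + beta * (b1_obs - a11 * theta - a12 * w)
        w_next = w + gamma * (b2_obs - a21 * theta - a22 * w)
        theta, w = theta_next, w_next
    return theta, w


theta_hat, w_hat = two_timescale_sa()
```

For these constants the noiseless fixed point solves 2θ + w = 1 and θ + 2w = 1, so both iterates should settle near 1/3.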

inequality, timescale sa, stochastic, (14 more...)

2002.01268

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
Asia > China > Hong Kong (0.04)
(3 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

#artificialintelligence, Feb-3-2020, 23:36:15 GMT

r/MachineLearning - [R] Rotation, Translation, and Cropping for Zero-Shot Generalization

Abstract: Deep Reinforcement Learning (DRL) has shown impressive performance on domains with visual inputs, in particular various games. However, the agent is usually trained on a fixed environment, e.g. a fixed number of levels. A growing mass of evidence suggests that these trained models fail to generalize to even slight variations of the environments they were trained on. This paper advances the hypothesis that the lack of generalization is partly due to the input representation, and explores how rotation, cropping and translation could increase generality. We show that a cropped, translated and rotated observation can achieve better generalization on unseen levels of a two-dimensional arcade game. The generality of the agent is evaluated on a set of human-designed levels.
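A minimal sketch of what a cropped, translated and rotated observation could look like on a tile grid. The function names, the padding value and the quarter-turn convention are assumptions made for illustration; the abstract does not describe the exact transformation.

```python
def crop_around(grid, row, col, radius, pad=0):
    """Translate and crop: return a (2*radius+1)-square window centred on
    (row, col), filling cells outside the level with `pad`."""
    height, width = len(grid), len(grid[0])
    offsets = range(-radius, radius + 1)
    return [
        [
            grid[row + dr][col + dc]
            if 0 <= row + dr < height and 0 <= col + dc < width
            else pad
            for dc in offsets
        ]
        for dr in offsets
    ]


def rotate_cw(view, quarter_turns):
    """Rotate a square view clockwise by a number of 90-degree turns."""
    for _ in range(quarter_turns % 4):
        view = [list(column) for column in zip(*view[::-1])]
    return view


def egocentric_view(grid, row, col, radius, quarter_turns=0):
    """Agent-centred observation: crop around the agent, then rotate."""
    return rotate_cw(crop_around(grid, row, col, radius), quarter_turns)


level = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
corner_view = egocentric_view(level, 0, 0, radius=1)
```

Because the window always has the agent at its centre, the same local layout produces the same observation wherever it appears in a level.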

machinelearning, translation, zero-shot generalization, (4 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

#artificialintelligence, Feb-3-2020, 23:29:58 GMT

r/MachineLearning - [R] Procedural Content Generation via Reinforcement Learning

Abstract: We investigate how reinforcement learning can be used to train level-designing agents. This represents a new approach to procedural content generation in games, where level design is framed as a game, and the content generator itself is learned. By seeing the design problem as a sequential task, we can use reinforcement learning to learn how to take the next action so that the expected final level quality is maximized. This approach can be used when few or no examples exist to train from, and the trained generator is very fast. We investigate three different ways of transforming two-dimensional level design problems into Markov decision processes and apply these to three game environments.
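A minimal sketch of level design framed as a sequential decision problem. Everything specific here is hypothetical: the cursor-based action scheme, the tile encoding and the quality function are illustration choices, not the three formulations the abstract refers to.

```python
class CursorLevelEnvSketch:
    """Toy level-design environment: the agent sets one tile per step.

    State:  the tile grid plus a cursor index.
    Action: the tile value (0 = floor, 1 = wall) written at the cursor.
    Reward: change in a hypothetical quality score, so the episode return
            equals final quality minus initial quality.
    """

    def __init__(self, width, height, target_walls):
        self.width = width
        self.height = height
        self.target_walls = target_walls
        self.reset()

    def reset(self):
        self.level = [[0] * self.width for _ in range(self.height)]
        self.cursor = 0
        return self._observation()

    def _quality(self):
        # Hypothetical quality: closeness of the wall count to a target.
        walls = sum(sum(row) for row in self.level)
        return -abs(walls - self.target_walls)

    def _observation(self):
        return [row[:] for row in self.level], self.cursor

    def step(self, tile):
        before = self._quality()
        row, col = divmod(self.cursor, self.width)
        self.level[row][col] = tile
        self.cursor += 1
        done = self.cursor >= self.width * self.height
        reward = self._quality() - before
        return self._observation(), reward, done


env = CursorLevelEnvSketch(width=2, height=2, target_walls=1)
rewards = []
for tile in [1, 1, 0, 0]:
    _, reward, done = env.step(tile)
    rewards.append(reward)
```

Rewarding the change in quality at each step is one way to make the return track the final level quality that the abstract says is being maximized.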

machinelearning, procedural content generation, reinforcement learning, (2 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Karimpanal, Thommen George

Neuro-evolutionary Frameworks for Generalized Learning Agents

arXiv.org Artificial Intelligence, Feb-3-2020

The ultimate aim of artificial intelligence research is to develop agents with truly intelligent behaviors, akin to those found in humans and animals. To this end, a number of tools and techniques have been developed. In recent years, two approaches in particular, deep learning (DL) and reinforcement learning (RL), seem to have made considerable progress towards this goal. Both fields have been widely studied, with numerous successful examples [22, 29, 42, 25, 40] reported, particularly in recent years. However, even with the unprecedented success of recent approaches such as deep RL [28, 27, 36], poor sample efficiency and limited generalization remain major concerns to be addressed, keeping in view the ultimate goal of developing general-purpose agents. The poor generalization capability of DL is exposed by its liability to deception when presented with adversarial examples [30, 39]. Recent work [38] showed that it was possible to hurt the performance of DL-based image recognition systems by carefully altering just a single pixel.

evolutionary algorithm, machine learning, reinforcement learning, (18 more...)

2002.01088

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Sreedharan, Sarath, Soni, Utkash, Verma, Mudit, Srivastava, Siddharth, Kambhampati, Subbarao

Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Black Box Simulators

arXiv.org Artificial Intelligence, Feb-3-2020

As more and more complex AI systems are introduced into our day-to-day lives, it becomes important that everyday users can work and interact with such systems with relative ease. Orchestrating such interactions requires the system to be capable of providing explanations and rationale for its decisions and to be able to field queries about alternative decisions. A significant hurdle to allowing for such explanatory dialogue could be the mismatch between the complex representations that the systems use to reason about the task and the terms in which the user may be viewing the task. This paper introduces methods that can be leveraged to provide contrastive explanations in terms of user-specified concepts for deterministic sequential decision-making settings where the system dynamics may be best represented in terms of black box simulators. We do this by assuming that system dynamics can at least be partly captured in terms of symbolic planning models, and we provide explanations in terms of these models. We implement this method using a simulator for a popular Atari game (Montezuma's Revenge) and perform user studies to verify whether people would find explanations generated in this form useful.

classifier, explanation, precondition, (14 more...)

2002.0108

Country:

Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)

Genre:

Questionnaire & Opinion Survey (0.55)
Research Report (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Transportation > Air (0.60)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Chen, Zaiwei, Maguluri, Siva Theja, Shakkottai, Sanjay, Shanmugam, Karthikeyan

Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes

arXiv.org Machine Learning, Feb-3-2020

Stochastic Approximation (SA) is a popular approach for solving fixed point equations where the information is corrupted by noise. In this paper, we consider an SA involving a contraction mapping with respect to an arbitrary norm, and establish a finite-sample bound when using either constant or diminishing step sizes. The idea is to construct a smooth Lyapunov function using the generalized Moreau envelope, and show that the iterates of SA are contracting in expectation with respect to that Lyapunov function. The result is applicable to various Reinforcement Learning (RL) algorithms. In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-Learning [15], and improve the existing bound for the tabular $Q$-Learning algorithm. Further, for these two applications, our construction of the Lyapunov functions results in only a logarithmic dependence of the convergence bound on the state-space dimension.
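A minimal sketch of SA for a noisy fixed-point equation, run with both a diminishing and a constant step size. The operator below is a hypothetical two-state policy-evaluation map (a sup-norm contraction with factor 0.5), the noise is i.i.d. Gaussian, and the step-size schedules are illustrative choices; none of these come from the paper.

```python
import random


def bellman_operator(x, discount=0.5):
    """F(x) = r + discount * P x for a toy chain with uniform transitions
    and rewards r = (1, 0). F contracts in the sup norm with factor 0.5."""
    mean_value = 0.5 * (x[0] + x[1])
    return [1.0 + discount * mean_value, 0.0 + discount * mean_value]


def stochastic_approximation(step_size, num_steps=20000, noise_std=0.1, seed=0):
    """Iterate x_{k+1} = x_k + eps_k * (F(x_k) - x_k + noise_k)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for k in range(num_steps):
        fx = bellman_operator(x)
        eps = step_size(k)
        x = [
            xi + eps * (fi - xi + rng.gauss(0.0, noise_std))
            for xi, fi in zip(x, fx)
        ]
    return x


# Diminishing steps: the error keeps shrinking as k grows.
x_diminishing = stochastic_approximation(lambda k: 2.0 / (k + 2))
# Constant steps: the iterate hovers in a noise ball around the fixed point.
x_constant = stochastic_approximation(lambda k: 0.05)
```

For this toy operator the fixed point is (1.5, 0.5), which gives a reference value for comparing the two step-size regimes.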

algorithm, assumption 2, contraction, (16 more...)

2002.00874

Country:

North America > United States > Texas (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kiran, B Ravi, Sobh, Ibrahim, Talpaert, Victor, Mannion, Patrick, Sallab, Ahmad A. Al, Yogamani, Senthil, Pérez, Patrick

Deep Reinforcement Learning for Autonomous Driving: A Survey

arXiv.org Artificial Intelligence, Feb-2-2020

With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high-dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms, provides a taxonomy of automated driving tasks where (D)RL methods have been employed, and highlights the key challenges, both algorithmic and in terms of deploying real-world autonomous driving agents. It also covers the role of simulators in training agents and, finally, methods to evaluate, test and robustify existing solutions in RL and imitation learning.

agent, learning, reinforcement, (13 more...)

2002.00444

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(9 more...)

Genre: Overview (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Yurtsever, Ekim, Capito, Linda, Redmill, Keith, Ozguner, Umit

Integrating Deep Reinforcement Learning with Model-based Path Planners for Automated Driving

arXiv.org Artificial Intelligence, Feb-2-2020

Automated driving in urban settings is challenging chiefly due to the nondeterministic nature of the human participants in traffic. These behaviors are difficult to model, and conventional, rule-based Automated Driving Systems (ADSs) tend to fail when they face unmodeled dynamics. On the other hand, the more recent, end-to-end Deep Reinforcement Learning (DRL) based ADSs have shown promising results. However, pure learning-based approaches lack the hard-coded safety measures of model-based methods. Here we propose a hybrid approach that integrates a model-based path planner into a vision-based DRL framework to alleviate the shortcomings of both worlds. In summary, the DRL agent learns to overrule the model-based planner's decisions if it predicts that better future rewards can be obtained by doing so, e.g., by avoiding an accident. Otherwise, the DRL agent tends to follow the model-based planner as closely as possible. This logic is learned, i.e., no switching model is designed here. The agent learns this by considering two penalties: the penalty for straying from the model-based path planner and the penalty for having a collision. The latter has precedence over the former, i.e., the penalty is greater. Therefore, after training, the agent follows the model-based planner when it is safe to do so, because it is penalized otherwise. However, it also learns to sacrifice the positive rewards for following the model-based planner in order to avoid a potentially large penalty for a future collision. Experimental results show that the proposed method can plan its path and navigate while avoiding obstacles between randomly chosen origin-destination points in CARLA, a dynamic urban simulation environment. Our code is open-source and available online.
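The two-penalty idea described in this abstract can be sketched as a simple reward function. The functional form, argument names and weights below are hypothetical; the abstract only states that a deviation penalty and a larger collision penalty exist.

```python
def shaped_reward(progress, deviation_m, collided,
                  deviation_weight=0.1, collision_penalty=100.0):
    """Toy per-step reward with two penalties.

    progress:     hypothetical positive reward for advancing along the route
    deviation_m:  distance (in metres) from the planner's suggested path
    collided:     whether a collision occurred on this step
    The collision penalty is chosen much larger than any plausible
    deviation penalty, so avoiding a crash takes precedence.
    """
    reward = progress - deviation_weight * deviation_m
    if collided:
        reward -= collision_penalty
    return reward


# Following the planner into a collision versus swerving 5 m off the path.
follow_and_crash = shaped_reward(progress=1.0, deviation_m=0.0, collided=True)
swerve_and_survive = shaped_reward(progress=1.0, deviation_m=5.0, collided=False)
```

With weights like these, leaving the planned path is mildly penalized, but it is still preferable to a collision, which matches the precedence described above.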

agent, path planner, waypoint, (14 more...)

2002.00434

Country: North America > United States > Ohio (0.05)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)