AITopics

1908.03263

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

arXiv.org Machine LearningAug-8-2019

Incremental Reinforcement Learning --- a New Continuous Reinforcement Learning Frame Based on Stochastic Differential Equation methods

Chen, Tianhao, Cheng, Limei, Liu, Yang, Jia, Wenchuan, Ma, Shugen

Continuous reinforcement learning such as DDPG and A3C are widely used in robot control and autonomous driving. However, both methods have theoretical weaknesses. While DDPG cannot control noises in the control process, A3C does not satisfy the continuity conditions under the Gaussian policy. To address these concerns, we propose a new continues reinforcement learning method based on stochastic differential equations and we call it Incremental Reinforcement Learning (IRL). This method not only guarantees the continuity of actions within any time interval, but controls the variance of actions in the training process. In addition, our method does not assume Markov control in agents' action control and allows agents to predict scene changes for action selection. With our method, agents no longer passively adapt to the environment. Instead, they positively interact with the environment for maximum rewards.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1908.02974

Country: Asia (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Kulhánek, Jonáš, Derner, Erik, de Bruin, Tim, Babuška, Robert

Vision-based Navigation Using Deep Reinforcement Learning

arXiv.org Artificial IntelligenceAug-8-2019

Jon a ˇ s Kulh anek 1, Erik Derner 2, Tim de Bruin 1, and Robert Babu ˇ ska 3 Abstract -- Deep reinforcement learning (RL) has been successfully applied to a variety of game-like environments. However, the application of deep RL to visual navigation with realistic environments is a challenging task. We propose a novel learning architecture capable of navigating an agent, e.g. a mobile robot, to a target given by an image. T o achieve this, we have extended the batched A2C algorithm with auxiliary tasks designed to improve visual navigation performance. We propose three additional auxiliary tasks: predicting the segmentation of the observation image and of the target image and predicting the depth-map. These tasks enable the use of supervised learning to pre-train a large part of the network and to reduce the number of training steps substantially. The training performance has been further improved by increasing the environment complexity gradually over time. An efficient neural network structure is proposed, which is capable of learning for multiple targets in multiple environments. Our method navigates in continuous state spaces and on the AI2-THOR environment simulator outperforms state-of-the-art goal-oriented visual navigation methods from the literature. I NTRODUCTION Visual navigation is the problem of navigating an agent, e.g. a mobile robot, in an environment using camera input only. The agent is given a target image (an image it will see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions, based on the camera observations only. We focus on the case when the environment is initially unknown, i.e., no explicit map is available.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1908.03627

Country:

Europe > Czechia (0.15)
Europe > Netherlands (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Laval, Jorge A., Zhou, Hao

Large-scale traffic signal control using machine learning: some traffic flow considerations

This paper uses supervised learning, random search and deep reinforcement learning (DRL) methods to control large signalized intersection networks. The traffic model is Cellular Automaton rule 184, which has been shown to be a parameter-free representation of traffic flow, and is the most efficient implementation of the Kinematic Wave model with triangular fundamental diagram. We are interested in the steady-state performance of the system, both spatially and temporally: we consider a homogeneous grid network inscribed on a torus, which makes the network boundary-free, and drivers choose random routes. As a benchmark we use the longest-queue-first (LQF) greedy algorithm. We find that: (i) a policy trained with supervised learning with only two examples outperforms LQF, (ii) random search is able to generate near-optimal policies, (iii) the prevailing average network occupancy during training is the major determinant of the effectiveness of DRL policies. When trained under free-flow conditions one obtains DRL policies that are optimal for all traffic conditions, but this performance deteriorates as the occupancy during training increases. For occupancies > 75% during training, DRL policies perform very poorly for all traffic conditions, which means that DRL methods cannot learn under highly congested conditions. We conjecture that DRL's inability to learn under congestion might be explained by a property of urban networks found here, whereby even a very bad policy produces an intersection throughput higher than downstream capacity. This means that the actual throughput tends to be independent of the policy. Our findings imply that it is advisable for current DRL methods in the literature to discard any congested data when training, and that doing this will improve their performance under all traffic conditions.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1908.02673

Genre: Research Report > New Finding (0.88)

Industry:

Consumer Products & Services > Travel (1.00)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)

Gil, Yolanda, Selman, Bart

A 20-Year Community Roadmap for Artificial Intelligence Research in the US

Decades of research in artificial intelligence (AI) have produced formidable technologies that are providing immense benefit to industry, government, and society. AI systems can now translate across multiple languages, identify objects in images and video, streamline manufacturing processes, and control cars. The deployment of AI systems has not only created a trillion-dollar industry that is projected to quadruple in three years, but has also exposed the need to make AI systems fair, explainable, trustworthy, and secure. Future AI systems will rightfully be expected to reason effectively about the world in which they (and people) operate, handling complex tasks and responsibilities effectively and ethically, engaging in meaningful communication, and improving their awareness through experience. Achieving the full potential of AI technologies poses research challenges that require a radical transformation of the AI research enterprise, facilitated by significant and sustained investment. These are the major recommendations of a recent community effort coordinated by the Computing Community Consortium and the Association for the Advancement of Artificial Intelligence to formulate a Roadmap for AI research and development over the next two decades.

knowledge management, machine learning, simulation of human behavior, (31 more...)

1908.02624

Country:

Europe (0.92)
North America > United States > California (0.46)
North America > United States > Maryland (0.28)
North America > United States > Colorado (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
(2 more...)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Social Sector (1.00)
(27 more...)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Data Science > Data Mining (1.00)
(21 more...)

Nikulin, Dmitry, Ianina, Anastasia, Aliev, Vladimir, Nikolenko, Sergey

Free-Lunch Saliency via Attention in Atari Agents

We propose a new approach to visualize saliency maps for deep neural network models and apply it to deep reinforcement learning agents trained on Atari environments. Our method adds an attention module that we call FLS (Free Lunch Saliency) to the feature extractor from an established baseline (Mnih et al., 2015). This addition results in a trainable model that can produce saliency maps, i.e., visualizations of the importance of different parts of the input for the agent's current decision making. We show experimentally that a network with an FLS module exhibits performance similar to the baseline (i.e., it is "free", with no performance cost) and can be used as a drop-in replacement for reinforcement learning agents. We also design another feature extractor that scores slightly lower but provides higher-fidelity visualizations. In addition to attained scores, we report saliency metrics evaluated on the Atari-HEAD dataset of human gameplay.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1908.02511

Country:

North America > United States (0.46)
Europe > Russia (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

DoorGym: A Scalable Door Opening Environment And Baseline Agent

Urakami, Yusuke, Hodgkinson, Alec, Carlin, Casey, Leu, Randall, Rigazio, Luca, Abbeel, Pieter

Reinforcement Learning (RL) has brought forth ideas of autonomous robots that can navigate real-world environments with ease, aiding humans in a variety of tasks. RL agents have just begun to make their way out of simulation into the real world. Once in the real world, benchmark tasks often fail to transfer into useful skills. We introduce DoorGym, a simulation environment intended to be the first step to move RL from toy environments towards useful atomic skills that can be composed and extended towards a broader goal. DoorGym is an open-source door simulation framework designed to be highly configurable. We also provide a baseline PPO (Proximal Policy Optimization) and SAC (Soft Actor-Critic)implementation, which achieves a success rate of up to 70% for common tasks in this environment. Environment kit available here:https://github.com/PSVL/DoorGym/

domain randomization, machine learning, reinforcement learning, (19 more...)

1908.01887

Country: North America > United States > California (0.28)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Taïga, Adrien Ali, Fedus, William, Machado, Marlos C., Courville, Aaron, Bellemare, Marc G.

Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

arXiv.org Machine LearningAug-6-2019

This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentives exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses in the agent's performance. We use Rainbow, the state-of-the-art algorithm for value-based agents, and focus on some of the bonuses proposed in the last few years. We consider the impact these algorithms have on performance within the popular game Montezuma's Revenge which has gathered a lot of interest from the exploration community, across the the set of seven games identified by Bellemare et al. (2016) as challenging for exploration, and easier games where exploration is not an issue. We find that, in our setting, recently developed bonuses do not provide significantly improved performance on Montezuma's Revenge or hard exploration games. We also find that existing bonus-based methods may negatively impact performance on games in which exploration is not an issue and may even perform worse than $\epsilon$-greedy exploration.

benchmarking bonus-based exploration method, exploration, ontezuma, (10 more...)

1908.02388

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.83)

Industry: Education (0.72)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)

arXiv.org Artificial IntelligenceAug-6-2019

Deep Reinforcement Learning in System Optimization

Haj-Ali, Ameer, Ahmed, Nesreen K., Willke, Ted, Gonzalez, Joseph, Asanovic, Krste, Stoica, Ion

The recent advancements in deep reinforcement learning have opened new horizons and opportunities to tackle various problems in system optimization. Such problems are generally tailored to delayed, aggregated, and sequential rewards, which is an inherent behavior in the reinforcement learning setting, where an agent collects rewards while exploring and exploiting the environment to maximize the long term reward. However, in some cases, it is not clear why deep reinforcement learning is a good fit for the problem. Sometimes, it does not perform better than the state-of-the-art solutions. And in other cases, random search or greedy algorithms could outperform deep reinforcement learning. In this paper, we review, discuss, and evaluate the recent trends of using deep reinforcement learning in system optimization. We propose a set of essential metrics to guide future works in evaluating the efficacy of using deep reinforcement learning in system optimization. Our evaluation includes challenges, the types of problems, their formulation in the deep reinforcement learning setting, embedding, the model used, efficiency, and robustness. We conclude with a discussion on open challenges and potential directions for pushing further the integration of reinforcement learning in system optimization.

deep reinforcement learning, optimization, reinforcement, (9 more...)

1908.01275

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.70)

Industry:

Information Technology (0.46)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Barde, Paul, Roy, Julien, Harvey, Félix G., Nowrouzezahrai, Derek, Pal, Christopher

Promoting Coordination through Policy Regularization in Multi-Agent Reinforcement Learning

arXiv.org Machine LearningAug-6-2019

A central challenge in multi-agent reinforcement learning is the induction of coordination between agents of a team. In this work, we investigate how to promote inter-agent coordination and discuss two possible avenues based respectively on inter-agent modelling and guided synchronized sub-policies. We test each approach in four challenging continuous control tasks with sparse rewards and compare them against three variants of MADDPG, a state-of-the-art multi-agent reinforcement learning algorithm. To ensure a fair comparison, we rely on a thorough hyper-parameter selection and training methodology that allows a fixed hyper-parameter search budget for each algorithm and environment. We consequently assess both the hyper-parameter sensitivity, sample-efficiency and asymptotic performance of each learning method. Our experiments show that our proposed algorithms are more robust to the hyper-parameter choice and reliably lead to strong results.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1908.02269

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)