AITopics

1807.02373

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

arXiv.org Machine LearningJul-6-2018

Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

Laterre, Alexandre, Fu, Yunguan, Jabri, Mohamed Khalil, Cohen, Alain-Sam, Kas, David, Hajjar, Karl, Dahl, Torbjorn S., Kerkeni, Amine, Beguir, Karim

Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula-rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimization problems, such as the traveling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of a two-dimensional bin packing problem show that it outperforms generic Monte Carlo tree search, heuristic algorithms and reinforcement learning algorithms not using ranked rewards.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1807.01672

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Backgammon (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Mao, Hongzi, Venkatakrishnan, Shaileshh Bojja, Schwarzkopf, Malte, Alizadeh, Mohammad

Variance Reduction for Reinforcement Learning in Input-Driven Environments

arXiv.org Machine LearningJul-6-2018

We consider reinforcement learning in input-driven environments, where an exogenous, stochastic input process affects the dynamics of the system. Input processes arise in many applications, including queuing systems, robotics control with disturbances, and object tracking. Since the state dynamics and rewards depend on the input process, the state alone provides limited information for the expected future returns. Therefore, policy gradient methods with standard state-dependent baselines suffer high variance during training. We derive a bias-free, input-dependent baseline to reduce this variance, and analytically show its benefits over state-dependent baselines. We then propose a meta-learning approach to overcome the complexity of learning a baseline that depends on a long sequence of inputs. Our experimental results show that across environments from queuing systems, computer networks, and MuJoCo robotic locomotion, input-dependent baselines consistently improve training stability and result in better eventual policies.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1807.02264

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Brenon, Alexis, Portet, François, Vacher, Michel

Arcades: A deep model for adaptive decision making in voice controlled smart-home

Smart-home is an application domain which brings together home automation and ambient intelligence to ease life of dwellers and to provide support to people in loss of autonomy. The development of smarthomes is not only a cultural and technological evolution but is also recognized as one way to address the challenges created by an aging population in developed countries [42]. If home automation is concerned with sensing (sensors, actuators, middle-ware) and low-level automation (heating control, lighting control), Ambient Intelligence should provide perception and reasoning capabilities into the smart-home ecosystem. However, although the development of smart-homes is supported by a large amount of research and industrial projects, it has not reached a large public since many challenges are still to be addressed. One of the main challenges is due to the complexity of setting up the smart-home system in case of new situations (devices, house, dwellers, after an accident, etc.).

artificial intelligence, machine learning, reinforcement learning, (20 more...)

doi: 10.1016/j.pmcj.2018.06.011

1807.0197

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(9 more...)

Genre: Research Report > New Finding (0.92)

Industry: Information Technology > Smart Houses & Appliances (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(4 more...)

Colas, Cédric, Sigaud, Olivier, Oudeyer, Pierre-Yves

How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments

Consistently checking the statistical significance of experimental results is one of the mandatory methodological steps to address the so-called "reproducibility crisis" in deep reinforcement learning. In this tutorial paper, we explain how the number of random seeds relates to the probabilities of statistical errors. For both the t-test and the bootstrap confidence interval test, we recall theoretical guidelines to determine the number of random seeds one should use to provide a statistically significant comparison of the performance of two algorithms. Finally, we discuss the influence of deviations from the assumptions usually made by statistical tests. We show that they can lead to inaccurate evaluations of statistical errors and provide guidelines to counter these negative effects. We make our code available to perform the tests.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1806.08295

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)

Pardo, Fabio, Levdik, Vitaly, Kormushev, Petar

Goal-oriented Trajectories for Efficient Exploration

Exploration is a difficult challenge in reinforcement learning and even recent state-of-the art curiosity-based methods rely on the simple epsilon-greedy strategy to generate novelty. We argue that pure random walks do not succeed to properly expand the exploration area in most environments and propose to replace single random action choices by random goals selection followed by several steps in their direction. This approach is compatible with any curiosity-based exploration and off-policy reinforcement learning agents and generates longer and safer trajectories than individual random actions. To illustrate this, we present a task-independent agent that learns to reach coordinates in screen frames and demonstrate its ability to explore with the game Super Mario Bros. improving significantly the score of a baseline DQN agent.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1807.02078

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Papoudakis, Georgios, Chatzidimitriou, Kyriakos C., Mitkas, Pericles A.

Deep Reinforcement Learning for Doom using Unsupervised Auxiliary Tasks

Recent developments in deep reinforcement learning have enabled the creation of agents for solving a large variety of games given a visual input. These methods have been proven successful for 2D games, like the Atari games, or for simple tasks, like navigating in mazes. It is still an open question, how to address more complex environments, in which the reward is sparse and the state space is huge. In this paper we propose a divide and conquer deep reinforcement learning solution and we test our agent in the first person shooter (FPS) game of Doom. Our work is based on previous works in deep reinforcement learning and in Doom agents. We also present how our agent is able to perform better in unknown environments compared to a state of the art reinforcement learning algorithm.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1807.0196

Country: Europe > Greece > Central Macedonia > Thessaloniki (0.06)

Genre: Research Report > Experimental Study (0.47)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

#artificialintelligenceJul-4-2018, 02:26:44 GMT

r/MachineLearning - [R] Learning to Drive in a Day with Deep Reinforcement Learning

Abstract: We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision.

artificial intelligence, machine learning, reinforcement learning, (4 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

De Asis, Kristopher, Sutton, Richard S.

Per-decision Multi-step Temporal Difference Learning with Control Variates

arXiv.org Machine LearningJul-4-2018

Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. They address a bias-variance trade off between reliance on current estimates, which could be poor, and incorporating longer sampled reward sequences into the updates. Especially in the off-policy setting, where the agent aims to learn about a policy different from the one generating its behaviour, the variance in the updates can cause learning to diverge as the number of sampled rewards used in the estimates increases. In this paper, we introduce per-decision control variates for multi-step TD algorithms, and compare them to existing methods. Our results show that including the control variates can greatly improve performance on both on and off-policy multi-step temporal difference learning tasks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1807.0183

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine LearningJul-4-2018

Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation

Wang, Lu, Zhang, Wei, He, Xiaofeng, Zha, Hongyuan

Dynamic treatment recommendation systems based on large-scale electronic health records (EHRs) become a key to successfully improve practical clinical outcomes. Prior relevant studies recommend treatments either use supervised learning (e.g. matching the indicator signal which denotes doctor prescriptions), or reinforcement learning (e.g. maximizing evaluation signal which indicates cumulative reward from survival rates). However, none of these studies have considered to combine the benefits of supervised learning and reinforcement learning. In this paper, we propose Supervised Reinforcement Learning with Recurrent Neural Network (SRL-RNN), which fuses them into a synergistic learning framework. Specifically, SRL-RNN applies an off-policy actor-critic framework to handle complex relations among multiple medications, diseases and individual characteristics. The "actor" in the framework is adjusted by both the indicator signal and evaluation signal to ensure effective prescription and low mortality. RNN is further utilized to solve the Partially-Observed Markov Decision Process (POMDP) problem due to the lack of fully observed states in real world applications. Experiments on the publicly real-world dataset, i.e., MIMIC-3, illustrate that our model can reduce the estimated mortality, while providing promising accuracy in matching doctors' prescriptions.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1807.01473

Country:

North America > United States > District of Columbia > Washington (0.05)
Asia > China (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)