AITopics

1809.09318

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Julian, Kyle D., Kochenderfer, Mykel J.

Image-based Guidance of Autonomous Aircraft for Wildfire Surveillance and Prediction

arXiv.org Artificial Intelligence Oct-4-2018

Small unmanned aircraft can help firefighters combat wildfires by providing real-time surveillance of the growing fires. However, guiding the aircraft autonomously given only wildfire images is a challenging problem. This work models noisy images obtained from on-board cameras and proposes two approaches to filtering the wildfire images. The first approach uses a simple Kalman filter to reduce noise and update a belief map in observed areas. The second approach uses a particle filter to predict wildfire growth and uses observations to estimate uncertainties relating to wildfire expansion. The belief maps are used to train a deep reinforcement learning controller, which learns a policy to navigate the aircraft to survey the wildfire while avoiding flight directly over the fire. Simulation results show that the proposed controllers precisely guide the aircraft and accurately estimate wildfire growth, and a study of observation noise demonstrates the robustness of the particle filter approach.
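The first filtering approach mentioned above (a Kalman filter that updates a belief map only in observed areas) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an independent scalar Kalman measurement update per grid cell, and the names `kalman_update`, `update_belief_map`, and the noise variance `obs_var` are hypothetical.

```python
def kalman_update(mean, var, obs, obs_var):
    """Scalar Kalman measurement update for a single map cell."""
    gain = var / (var + obs_var)           # Kalman gain: trust in the new observation
    new_mean = mean + gain * (obs - mean)  # move the belief toward the observation
    new_var = (1.0 - gain) * var           # uncertainty shrinks after observing
    return new_mean, new_var


def update_belief_map(means, variances, observations, obs_var=0.25):
    """Update only the observed cells; `observations` maps (row, col) -> noisy value.

    Assumption: cells are treated independently, which is a simplification.
    """
    for (r, c), z in observations.items():
        means[r][c], variances[r][c] = kalman_update(
            means[r][c], variances[r][c], z, obs_var
        )
    return means, variances


# A 2x2 belief map of fire probability; only the top row is in the camera view.
means = [[0.5, 0.5], [0.5, 0.5]]
variances = [[1.0, 1.0], [1.0, 1.0]]
update_belief_map(means, variances, {(0, 0): 1.0, (0, 1): 0.0})
```

Cells outside the observed set keep their previous mean and variance, which matches the idea of updating the belief map only where images are available.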

aircraft, machine learning, reinforcement learning, (18 more...)

1810.02455

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.47)

Savinov, Nikolay, Raichuk, Anton, Marinier, Raphaël, Vincent, Damien, Pollefeys, Marc, Lillicrap, Timothy, Gelly, Sylvain

Episodic Curiosity through Reachability

arXiv.org Artificial Intelligence Oct-4-2018

Rewards are sparse in the real world and most of today's reinforcement learning algorithms struggle with such sparsity. One solution to this problem is to allow the agent to create rewards for itself -- thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed up with the real task reward -- making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus. To determine the bonus, the current observation is compared with the observations in memory. Crucially, the comparison is done based on how many environment steps it takes to reach the current observation from those in memory -- which incorporates rich information about environment dynamics. This allows us to overcome the known "couch-potato" issues of prior work -- when the agent finds a way to instantly gratify itself by exploiting actions which lead to unpredictable consequences. We test our approach in visually rich 3D environments in VizDoom and DMLab. In VizDoom, our agent learns to successfully navigate to a distant goal at least 2 times faster than the state-of-the-art curiosity method ICM. In DMLab, our agent generalizes well to new procedurally generated levels of the game -- reaching the goal at least 2 times more frequently than ICM on test mazes with very sparse reward.

Many real-world tasks have sparse rewards. For example, animals searching for food may need to go many miles without any reward from the environment. Multiple approaches have been proposed to achieve better explorative policies. One way is to give a reward bonus which facilitates exploration by rewarding novel observations. The reward bonus is summed up with the original task reward and optimized by standard RL algorithms. Such an approach is motivated by neuroscience studies of animals: an animal has the ability to reward itself for something novel, a mechanism biologically built into its dopamine release system.
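The idea of an episodic novelty bonus summed with the task reward can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's method: the paper learns a reachability comparison from data, whereas here `steps_between` is a hypothetical, hand-written step count on a 1D corridor, and the threshold `k` and bonus size are made up.

```python
def novelty_bonus(obs, memory, steps_between, k=3, bonus=1.0):
    """Return a bonus if `obs` is more than `k` steps from everything in memory.

    Novel observations are also added to the episodic memory.
    """
    if not memory:
        memory.append(obs)
        return bonus
    nearest = min(steps_between(m, obs) for m in memory)
    if nearest > k:
        memory.append(obs)
        return bonus
    return 0.0


def corridor_steps(a, b):
    """Hypothetical stand-in for a learned reachability estimate (1D corridor)."""
    return abs(a - b)


memory = []                                   # episodic memory, reset every episode
trajectory = [0, 1, 2, 5, 6, 10]              # positions visited by the agent
task_rewards = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # sparse task reward at the goal

# Combined reward = task reward + novelty bonus, as described in the text.
combined = [
    r + novelty_bonus(o, memory, corridor_steps)
    for o, r in zip(trajectory, task_rewards)
]
```

Because the comparison is by steps rather than by raw observation difference, observations that look different but are reachable within a few steps earn no bonus.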

machine learning, r-network, reinforcement learning, (19 more...)

1810.02274

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Consumer Health (0.56)
Health & Medicine > Therapeutic Area > Neurology (0.54)
Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Oct-4-2018

EMI: Exploration with Mutual Information Maximizing State and Action Embeddings

Kim, Hyoungseok, Kim, Jaekyeom, Jeong, Yeonwoo, Levine, Sergey, Song, Hyun Oh

Policy optimization struggles when the reward feedback signal is very sparse and essentially becomes a random search algorithm until the agent accidentally stumbles upon a rewarding state or the goal state. Recent works utilize intrinsic motivation to guide the exploration via generative models, predictive forward models, or more ad-hoc measures of surprise. We propose EMI, an exploration method that constructs an embedding representation of states and actions that does not rely on generative decoding of the full observation but extracts predictive signals that can be used to guide exploration based on forward prediction in the representation space. Our experiments show state-of-the-art performance on challenging locomotion tasks with continuous control and on image-based exploration tasks with discrete actions on Atari.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

1810.01176

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

#artificialintelligence Oct-3-2018, 22:38:23 GMT

Artificial Intelligence: What Is Reinforcement Learning - A Simple Explanation & Practical Examples

Reinforcement learning is one of the most discussed, followed and contemplated topics in artificial intelligence (AI) as it has the potential to transform most businesses. In this article, I want to provide a simple guide that explains reinforcement learning and gives you some practical examples of how it is used today. At the core of reinforcement learning is the concept that the optimal behavior or action is reinforced by a positive reward. Much as toddlers learning to walk adjust their actions based on the outcomes they experience, such as taking a smaller step if the previous broad step made them fall, machines and software agents use reinforcement learning algorithms to determine the ideal behavior based upon feedback from the environment. Depending on the complexity of the problem, reinforcement learning algorithms can keep adapting to the environment over time if necessary in order to maximize the reward in the long term.

machine learning, reinforcement, reinforcement learning, (7 more...)

#artificialintelligence

Industry:

Health & Medicine (0.55)
Information Technology (0.52)
Leisure & Entertainment > Games (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Novati, Guido, Koumoutsakos, Petros

Remember and Forget for Experience Replay

arXiv.org Machine Learning Oct-3-2018

Experience replay (ER) is crucial for attaining high data-efficiency in off-policy reinforcement learning (RL). ER entails the recall of experiences obtained in past iterations to compute gradient estimates for the current policy. However, the accuracy of such updates may deteriorate when the policy diverges from past behaviors, possibly undermining the effectiveness of ER. Previous off-policy RL algorithms mitigated this issue by tuning hyper-parameters in order to abate policy changes. We propose a method for ER that relies on systematically Remembering and Forgetting past behaviors (ReF-ER). ReF-ER forgets experiences that would be too unlikely with the current policy and constrains policy changes within a trust region of the behaviors in the replay memory. We couple ReF-ER with Q-learning, deterministic policy gradient and off-policy gradient methods and we show that ReF-ER reliably improves the performance of continuous-action off-policy RL. We complement ReF-ER with a novel off-policy actor-critic algorithm (RACER) for continuous-action control. RACER employs a computationally efficient closed-form approximation of the action values and is shown to be highly competitive with state-of-the-art algorithms on benchmark problems, while being robust to large hyper-parameter variations.
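The "forgetting" half of the description above can be sketched with an importance-weight filter. This is one plausible reading of the abstract, not the authors' code: it assumes an experience counts as too unlikely when the ratio between the current policy's probability and the behavior policy's probability falls outside a band `[1/c_max, c_max]`. The dictionary fields, the function names, and the value of `c_max` are hypothetical, and the trust-region constraint on policy updates is not shown.

```python
def importance_weight(pi_prob, mu_prob):
    """Ratio of current-policy to behavior-policy probability for the stored action."""
    return pi_prob / mu_prob


def is_near_policy(rho, c_max=4.0):
    """Assumed rule: keep experiences whose weight lies inside (1/c_max, c_max)."""
    return 1.0 / c_max < rho < c_max


def filter_batch(batch, c_max=4.0):
    """Drop replayed experiences that are too unlikely under the current policy."""
    kept = []
    for exp in batch:
        rho = importance_weight(exp["pi_prob"], exp["mu_prob"])
        if is_near_policy(rho, c_max):
            kept.append(exp)
    return kept


batch = [
    {"id": 0, "pi_prob": 0.30, "mu_prob": 0.25},  # weight 1.2: still on-distribution
    {"id": 1, "pi_prob": 0.01, "mu_prob": 0.50},  # weight 0.02: policy has moved away
    {"id": 2, "pi_prob": 0.90, "mu_prob": 0.10},  # weight 9.0: outside the band
]
kept_ids = [e["id"] for e in filter_batch(batch)]
```

Only the filtered experiences would then contribute to the gradient estimate for the current policy.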

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1807.05827

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Nagendra, Savinay, Podila, Nikhil, Ugarakhod, Rashmi, George, Koshy

Comparison of Reinforcement Learning algorithms applied to the Cart Pole problem

arXiv.org Machine Learning Oct-3-2018

Designing optimal controllers continues to be challenging as systems become more complex and are inherently nonlinear. The principal advantage of reinforcement learning (RL) is its ability to learn from interaction with the environment and provide an optimal control strategy. In this paper, RL is explored in the context of control of the benchmark cartpole dynamical system with no prior knowledge of the dynamics. RL algorithms such as temporal-difference, policy gradient actor-critic, and value function approximation are compared in this context with the standard LQR solution. Further, we propose a novel approach to integrate RL and swing-up controllers.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

doi: 10.1109/ICACCI.2017.8125811

1810.0194

Country: Asia > India (0.15)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.36)

Andersen, Per-Arne, Goodwin, Morten, Granmo, Ole-Christoffer

The Dreaming Variational Autoencoder for Reinforcement Learning Environments

arXiv.org Artificial Intelligence Oct-2-2018

Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. There are several challenges in the current state-of-the-art reinforcement learning algorithms that prevent them from converging towards the global optima. It is likely that the solution to these problems lies in short- and long-term planning, exploration and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms as they provide a flexible, reproducible, and easy to control environment. Regardless, few games feature a state-space where results in exploration, memory, and planning are easily perceived. This paper presents The Dreaming Variational Autoencoder (DVAE), a neural network based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE in partial and fully-observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work in reinforcement learning driven by generative exploration.

algorithm, computer game, upstream oil & gas, (18 more...)

1810.01112

Country:

North America > United States > California (0.28)
North America > United States > New York (0.14)
Oceania > Australia (0.14)
Europe > France (0.14)

Genre: Research Report (0.65)

Industry:

Leisure & Entertainment > Games > Computer Games (0.47)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Chaudhury, Subhajit, Kimura, Daiki, Pham, Tu-Hoa, Munawar, Asim, Tachibana, Ryuki

Video Imitation GAN: Learning control policies by imitating raw videos using generative adversarial reward estimation

arXiv.org Machine Learning Oct-2-2018

Natural imitation in humans usually consists of mimicking visual demonstrations of another person by continuously refining our skills until our performance is visually akin to the expert demonstrations. In this paper, we are interested in imitation learning of artificial agents in the natural setting: acquiring motor skills by watching raw video demonstrations. Traditional methods for learning from videos rely on extracting meaningful low-dimensional features from the videos followed by a separate hand-crafted reward estimation step based on feature separation between the agent and expert. We propose an imitation learning framework from raw video demonstrations that reduces the dependence on hand-engineered reward functions by jointly learning the feature extraction and separation estimation steps using generative adversarial networks. Additionally, we establish the equivalence between adversarial imitation from image manifolds and low-level state distribution matching, under certain conditions. Experimental results show that our proposed imitation learning method from raw videos performs similarly to state-of-the-art imitation learning techniques that have low-level state and action information available, while outperforming existing video imitation methods. Furthermore, we show that our method can learn action policies by imitating video demonstrations available on YouTube with performance comparable to agents learned from the true reward signal. Please see the video at https://youtu.be/bvNpV2Q4rOA.

demonstration, machine learning, reinforcement learning, (14 more...)

1810.01108

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Pourchot, Aloïs, Sigaud, Olivier

CEM-RL: Combining evolutionary and gradient-based methods for policy search

arXiv.org Machine Learning Oct-2-2018

Deep neuroevolution and deep reinforcement learning (deep RL) algorithms are two popular approaches to policy search. The former is widely applicable and rather stable, but suffers from low sample efficiency. By contrast, the latter is more sample efficient, but the most sample efficient variants are also rather unstable and highly sensitive to hyper-parameter setting. So far, these families of methods have mostly been compared as competing tools. However, an emerging approach consists in combining them so as to get the best of both worlds. Two previously existing combinations use either a standard evolutionary algorithm or a goal exploration process together with the DDPG algorithm, a sample efficient off-policy deep RL algorithm. In this paper, we propose a different combination scheme using the simple cross-entropy method (CEM) and TD3, another off-policy deep RL algorithm which improves over DDPG. We evaluate the resulting algorithm, CEM-RL, on a set of benchmarks classically used in deep RL. We show that CEM-RL benefits from several advantages over its competitors and offers a satisfactory trade-off between performance and sample efficiency.
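For readers unfamiliar with the cross-entropy method named above, here is a minimal sketch of plain CEM on a toy objective. It is not CEM-RL: the gradient-based TD3 component is omitted, and the diagonal Gaussian, population sizes, the `noise` floor, and the function names `cem` and `toy_score` are illustrative assumptions.

```python
import random


def cem(score, dim, pop_size=50, n_elite=10, iterations=40,
        init_std=3.0, noise=1e-2, seed=0):
    """Plain cross-entropy method with a diagonal Gaussian search distribution."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(iterations):
        # Sample a population of candidate parameter vectors.
        population = [
            [rng.gauss(mean[i], std[i]) for i in range(dim)]
            for _ in range(pop_size)
        ]
        # Keep the best-scoring candidates (the elites).
        elites = sorted(population, key=score, reverse=True)[:n_elite]
        # Refit the mean and standard deviation to the elites.
        for i in range(dim):
            values = [e[i] for e in elites]
            mean[i] = sum(values) / n_elite
            variance = sum((v - mean[i]) ** 2 for v in values) / n_elite
            std[i] = variance ** 0.5 + noise  # small floor to avoid collapse
    return mean


def toy_score(x):
    """Toy objective standing in for a policy's return; maximum at (2, -1)."""
    return -((x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2)


best = cem(toy_score, dim=2)
```

In a policy-search setting, each sampled vector would hold policy parameters and `score` would be an episode return, which is where the sample-efficiency concerns discussed in the abstract arise.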

evolutionary algorithm, machine learning, reinforcement learning, (12 more...)

1810.01222

Country: Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)