 Reinforcement Learning


Dynamic Input for Deep Reinforcement Learning in Autonomous Driving

arXiv.org Machine Learning

In many real-world decision-making problems, reaching an optimal decision requires taking into account a variable number of objects around the agent. Autonomous driving is a domain in which this is especially relevant, since the number of cars surrounding the agent varies considerably over time and affects the optimal action to be taken. Classical methods that process object lists can deal with this requirement. However, to take advantage of recent high-performing methods based on deep reinforcement learning in modular pipelines, special architectures are necessary. For these, a number of options exist, but a thorough comparison of the different possibilities is missing. In this paper, we elaborate on the limitations of fully-connected neural networks and other established approaches, such as convolutional and recurrent neural networks, in the context of reinforcement learning problems that have to deal with variable-sized inputs. We employ the structure of Deep Sets in off-policy reinforcement learning for high-level decision making, highlight their capabilities to alleviate these limitations, and show that Deep Sets not only yield the best overall performance but also offer better generalization to unseen situations than the other approaches.
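To make the variable-sized input idea concrete, here is a minimal sketch of a Deep Sets style network: each object is encoded independently, the encodings are summed, and the pooled vector is mapped to one value per action. The feature count, layer sizes, random weights, and the name `q_values` are illustrative assumptions, not the architecture evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 features per surrounding car, 3 high-level actions.
FEAT, HIDDEN, ACTIONS = 4, 8, 3
W_PHI = rng.normal(size=(FEAT, HIDDEN))     # weights of the per-object encoder (phi)
W_RHO = rng.normal(size=(HIDDEN, ACTIONS))  # weights of the output map (rho)


def q_values(objects: np.ndarray) -> np.ndarray:
    """Map a (num_objects, FEAT) array to ACTIONS value estimates."""
    encoded = np.maximum(objects @ W_PHI, 0.0)  # phi: one ReLU layer, applied per object
    pooled = encoded.sum(axis=0)                # sum pooling removes order and count
    return pooled @ W_RHO                       # rho: linear map to action values


two_cars = rng.normal(size=(2, FEAT))
five_cars = rng.normal(size=(5, FEAT))
print(q_values(two_cars).shape, q_values(five_cars).shape)
```

Because the pooling step is a sum, the output has the same shape for any number of objects and does not change when the object list is reordered.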


Interactive Lungs Auscultation with Reinforcement Learning Agent

arXiv.org Artificial Intelligence

Lung sound auscultation is the first and most common examination carried out by every general practitioner or family doctor. It is a fast, easy and well-known procedure, popularized by Laënnec (Hyacinthe, 1819), who invented the stethoscope. Nowadays, different variants of this tool can be found on the market, both analog and electronic, but regardless of the type of stethoscope, the process is still highly subjective. Indeed, an auscultation normally involves the use of a stethoscope by a physician, thus relying on the examiner's own hearing, experience and ability to interpret psychoacoustical features. Another strong limitation of standard auscultation lies in the stethoscope itself: its frequency response tends to attenuate components of the lung sound signal above roughly 120 Hz, leaving only the lower frequency bands, to which the human ear is not very sensitive, to be analyzed (Sovijärvi et al., 2000; Sarkar et al., 2015).


The Tools Challenge: Rapid Trial-and-Error Learning in Physical Problem Solving

arXiv.org Artificial Intelligence

Many animals, and an increasing number of artificial agents, display sophisticated capabilities to perceive and manipulate objects. But human beings remain distinctive in their capacity for flexible, creative tool use -- using objects in new ways to act on the world, achieve a goal, or solve a problem. Here we introduce the "Tools" game, a simple but challenging domain for studying this behavior in human and artificial agents. Players place objects in a dynamic scene to accomplish a goal that can only be achieved if those objects interact with other scene elements in appropriate ways: for instance, launching, blocking, supporting or tipping them. Only a few attempts are permitted, requiring rapid trial-and-error learning if a solution is not found at first. We propose a "Sample, Simulate, Update" (SSUP) framework for modeling how people solve these challenges, based on exploiting rich world knowledge to sample actions that would lead to successful outcomes, simulate candidate actions before trying them out, and update beliefs about which tools and actions are best in a rapid learning loop. SSUP captures human performance well across 20 levels of the Tools game, and fits significantly better than alternate accounts based on deep reinforcement learning or learning the simulator parameters online. We discuss how the Tools challenge might guide the development of better physical reasoning agents in AI, as well as better accounts of human physical reasoning and tool use.


Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

arXiv.org Machine Learning

Deep reinforcement learning has achieved great successes in recent years, but there are still open challenges, such as convergence to locally optimal policies and sample inefficiency. In this paper, we contribute a novel self-supervised auxiliary task, Terminal Prediction (TP), which estimates temporal closeness to terminal states for episodic tasks. The intuition is to help representation learning by letting the agent predict how close it is to a terminal state while learning its control policy. Although TP could be integrated with multiple algorithms, this paper focuses on Asynchronous Advantage Actor-Critic (A3C) and demonstrates the advantages of A3C-TP. Our extensive evaluation includes a set of Atari games, the BipedalWalker domain, and a mini version of the recently proposed multi-agent Pommerman game. Our results on Atari games and the BipedalWalker domain suggest that A3C-TP outperforms standard A3C in most of the tested domains and performs similarly in the others. In Pommerman, our proposed method provides significant improvement both in learning efficiency and in converging to better policies against different opponents.
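As an illustration of what a self-supervised "closeness to terminal state" signal could look like, the sketch below labels each step of a finished episode with the fraction of the episode already elapsed and scores predictions with a mean squared error. Both the target definition and the loss are assumptions made for this example; the abstract does not specify them, and the function names are hypothetical.

```python
from typing import List


def terminal_prediction_targets(episode_length: int) -> List[float]:
    """Return one closeness-to-terminal target per step of a finished episode.

    Assumed target: step index divided by episode length, so the value
    rises towards 1.0 at the terminal step.
    """
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    return [step / episode_length for step in range(1, episode_length + 1)]


def auxiliary_loss(predictions: List[float], targets: List[float]) -> float:
    """Mean squared error between predicted and target closeness values."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


targets = terminal_prediction_targets(4)
print(targets)
print(auxiliary_loss([0.0, 0.0, 0.0, 0.0], targets))
```

The targets need no extra labels: they can be computed from the episode itself once it has ended, which is what makes the task self-supervised. In a full agent this loss would be added to the policy and value losses with some weighting, which is left out here.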


Learning Goal-Oriented Visual Dialog Agents: Imitating and Surpassing Analytic Experts

arXiv.org Artificial Intelligence

This paper tackles the problem of learning a questioner in the goal-oriented visual dialog task. Several previous works adopt model-free reinforcement learning. Most pretrain the model from a finite set of human-generated data. We argue that using limited demonstrations to kick-start the questioner is insufficient due to the large policy search space. Inspired by a recently proposed information-theoretic approach, we develop two analytic experts to serve as a source of high-quality demonstrations for imitation learning. We then take advantage of reinforcement learning to refine the model towards the goal-oriented objective. Experimental results on the GuessWhat?! dataset show that our method has the combined merits of imitation and reinforcement learning, achieving state-of-the-art performance.


Fairness in Reinforcement Learning

arXiv.org Artificial Intelligence

Decision support systems (e.g., for ecological conservation) and autonomous systems (e.g., adaptive controllers in smart cities) are starting to be deployed in real applications. Although their operations often impact many users or stakeholders, fairness is generally not taken into account in their design, which could lead to completely unfair outcomes for some users or stakeholders. To tackle this issue, we advocate for the use of social welfare functions that encode fairness and present this general, novel problem in the context of (deep) reinforcement learning, although it could possibly be extended to other machine learning tasks.


Efficient Exploration with Self-Imitation Learning via Trajectory-Conditioned Policy

arXiv.org Artificial Intelligence

This paper proposes a method for learning a trajectory-conditioned policy to imitate diverse demonstrations from the agent's own past experiences. We demonstrate that such self-imitation drives exploration in diverse directions and increases the chance of finding a globally optimal solution in reinforcement learning problems, especially when the reward is sparse and deceptive. Our method significantly outperforms existing self-imitation learning and count-based exploration methods on various sparse-reward reinforcement learning tasks with local optima. In particular, we report a state-of-the-art score of more than 25,000 points on Montezuma's Revenge without using expert demonstrations or resetting to arbitrary states.


Variance Reduction in Actor Critic Methods (ACM)

arXiv.org Machine Learning

After presenting Actor Critic Methods (ACM), we show that ACM are control variate estimators. Using the projection theorem, we prove that the Q and Advantage Actor Critic (A2C) methods are optimal in the sense of the $L^2$ norm for the control variate estimators spanned by functions conditioned by the current state and action. This straightforward application of the Pythagorean theorem provides a theoretical justification of the strong performance of QAC and AAC, most often referred to as A2C methods, in deep policy gradient methods. This enables us to derive a new formulation for Advantage Actor Critic methods that has lower variance and improves on the traditional A2C method.
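For readers unfamiliar with the control variate view, the standard textbook form of the argument is sketched below. It is not the new formulation the paper derives; the symbols $\pi_\theta$, $Q^{\pi}$, $V^{\pi}$ and the baseline $b(s)$ are notation assumed here, and a discrete action space is assumed so that the expectation over actions is a sum.

```latex
% Policy gradient with a state-dependent baseline b(s):
\nabla_\theta J(\theta)
  = \mathbb{E}_{s,a}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\,
      \bigl(Q^{\pi}(s,a) - b(s)\bigr)\right]

% The subtracted term has zero mean, so it changes the variance but not the expectation:
\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s)\right]
  = b(s)\, \nabla_\theta \sum_{a} \pi_\theta(a \mid s)
  = b(s)\, \nabla_\theta 1
  = 0

% Choosing b(s) = V^{\pi}(s) gives the advantage form used by A2C:
A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)
```

A zero-mean term of this kind is what is meant by a control variate: it leaves the estimator unbiased while the choice of $b(s)$ determines how much variance is removed.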


A Quick Guide to Reinforcement Learning

#artificialintelligence

General Electric is the 31st largest company in the world by revenue and one of the largest and most diverse manufacturers on the planet, making everything from large industrial equipment to home appliances. It has over 500 factories around the world and has only begun transforming them into smart facilities. The goal of GE's 'Brilliant Manufacturing Suite' is to link design, engineering, manufacturing, supply chain, distribution and services into one globally scalable, intelligent system. It is powered by Predix, their industrial internet of things platform. In the manufacturing space, Predix can use sensors to automatically capture every step of the process and monitor each piece of complex equipment.


How quickly can AI solve a Rubik's Cube? In less time than it took you to read this headline.

#artificialintelligence

Few things reveal the limits of someone's problem-solving skills faster than a Rubik's Cube, the multicolored, three-dimensional puzzle that has befuddled so many since the 1970s. Though the cube has furrowed countless human brows over the years, it's not much of a challenge for an emerging group of hyper-intelligent machines, as it turns out. This week, the University of California at Irvine announced that an artificial intelligence system solved the puzzle in just over a second, besting the current human world record by more than two seconds. The system, known as DeepCubeA -- a reinforcement-learning algorithm programmed by UCI computer scientists and mathematicians -- solved the puzzle without prior knowledge of the game or coaching from its human handlers, according to the university. The feat is even more impressive considering that there are billions of potential moves available to a Rubik's Cube player, with the puzzle's six sides and nine sections, but only one goal: each of the cube's six sides displaying a solid color.