Reinforcement Learning


A Goal-Based Movement Model for Continuous Multi-Agent Tasks

arXiv.org Machine Learning

Despite increasing attention paid to the need for fast, scalable methods to analyze next-generation neuroscience data, comparatively little attention has been paid to the development of similar methods for behavioral analysis. Just as the volume and complexity of brain data have grown, behavioral paradigms in systems neuroscience have likewise become more naturalistic and less constrained, necessitating an increase in the flexibility and scalability of the models used to study them. In particular, key assumptions made in the analysis of typical decision paradigms (optimality; analytic tractability; discrete, low-dimensional action spaces) may be untenable in richer tasks. Here, using the case of a two-player, real-time, continuous strategic game as an example, we show how the use of modern machine learning methods allows us to relax each of these assumptions. Following an inverse reinforcement learning approach, we are able to succinctly characterize the joint distribution over players' actions via a generative model that allows us to simulate realistic game play. We compare simulated play from a number of generative time series models and show that ours successfully resists mode collapse while generating trajectories with the rich variability of real behavior. Together, these methods offer a rich class of models for the analysis of continuous action tasks at the single-trial level.


Projective simulation with generalization

arXiv.org Artificial Intelligence

The ability to act upon a new stimulus, based on previous experience with similar, but distinct, stimuli, sometimes denoted as generalization, is used extensively in our daily life. As a simple example, consider a driver's response to traffic lights: The driver need not recognize the details of a particular traffic light in order to respond to it correctly, even though traffic lights may appear different from one another. The only property that matters is the color, whereas neither shape nor size should play any role in the driver's reaction. Learning how to react to traffic lights thus involves an aspect of generalization. A learning agent capable of meaningful and useful generalization is expected to have the following characteristics: (a) an ability for categorization (recognizing that all red signals have a common property, which we can refer to as redness); (b) an ability to classify (a new red object is to be related to the group of objects with the redness property); (c) ideally, only generalizations that are relevant for the success of the agent should be learned (red signals should be treated the same, whereas square-shaped signals should not, as they share no property that is of relevance in this context); (d) correct actions should be associated with relevant generalized properties (the driver should stop whenever a red signal is shown); and (e) the generalization mechanism should be flexible. To illustrate what we mean by "flexible generalization", let us go back to our driver. After learning how to handle traffic lights correctly, the driver tries to follow arrow signs to, say, a nearby airport. Clearly, it is now the shape category of the signal that should guide the driver, rather than the color category.


Why Artificial Intelligence Should Be More Canadian

@machinelearnbot

Canada has produced several big breakthroughs in artificial intelligence in recent years, and its government is keen to establish the country as a global epicenter of AI. The country's prime minister, Justin Trudeau, also hopes that the technology will learn Canadian values as it grows up. Speaking at a major AI event in Toronto today, Trudeau demonstrated an impressive enthusiasm for AI and machine learning, at one point even taking a stab at describing the concept of deep reinforcement learning, an approach that lets computers learn to do complex things that can't be programmed manually (see "10 Breakthrough Technologies 2017: Reinforcement Learning"). Both deep reinforcement learning and deep neural networks, which the method exploits, were pioneered by researchers working at Canadian universities. The country's government is now investing in big efforts to spur more AI research.


Transfer Learning to Learn with Multitask Neural Model Search

arXiv.org Machine Learning

Deep learning models require extensive architecture design exploration and hyperparameter optimization to perform well on a given task. The exploration of the model design space is often made by a human expert, and optimized using a combination of grid search and search heuristics over a large space of possible choices. Neural Architecture Search (NAS) is a Reinforcement Learning approach that has been proposed to automate architecture design. NAS has been successfully applied to generate Neural Networks that rival the best human-designed architectures. However, NAS requires sampling, constructing, and training hundreds to thousands of models to achieve well-performing architectures. This procedure needs to be executed from scratch for each new task. The application of NAS to a wide set of tasks currently lacks a way to transfer generalizable knowledge across tasks. In this paper, we present the Multitask Neural Model Search (MNMS) controller. Our goal is to learn a generalizable framework that can condition model construction on successful model searches for previously seen tasks, thus significantly speeding up the search for new tasks. We demonstrate that MNMS can conduct an automated architecture search for multiple tasks simultaneously while still learning well-performing, specialized models for each task. We then show that pre-trained MNMS controllers can transfer learning to new tasks. By leveraging knowledge from previous searches, we find that pre-trained MNMS models start from a better location in the search space and reduce search time on unseen tasks, while still discovering models that outperform published human-designed models.


Robust and Efficient Transfer Learning with Hidden-Parameter Markov Decision Processes

arXiv.org Machine Learning

We introduce a new formulation of the Hidden Parameter Markov Decision Process (HiP-MDP), a framework for modeling families of related tasks using low-dimensional latent embeddings. Our new framework correctly models the joint uncertainty in the latent parameters and the state space. We also replace the original Gaussian Process-based model with a Bayesian Neural Network, enabling more scalable inference. Thus, we expand the scope of the HiP-MDP to applications with higher dimensions and more complex dynamics.


Reinforcement Learning for the Enterprise

@machinelearnbot

This article is featured in the new DZone Guide to Artificial Intelligence. Humanity has a unique ability to adapt to dynamic environments and to learn from its surroundings and failures. Machines lack this ability, and artificial intelligence seeks to correct that deficiency. However, traditional supervised machine learning techniques require a large amount of suitable historical data to learn patterns and then act on them.


C. Karen Liu: Modeling Human Movements for Robotics CMU RI Seminar

Robohub

Abstract: "Creating realistic virtual humans has traditionally been considered a research problem in Computer Animation primarily for entertainment applications. With the recent breakthrough in collaborative robots and deep reinforcement learning, accurately modeling human movements and behaviors has become a common challenge faced by researchers in robotics, artificial intelligence, as well as Computer Animation. In this talk, I will focus on two different yet highly relevant problems: how to teach robots to move like humans and how to teach robots to interact with humans. While Computer Animation research has shown that it is possible to teach a virtual human to mimic human athletes' movements, transferring such complex controllers to robot hardware in the real world is perhaps even more challenging than learning the controllers themselves. In this talk, I will focus on two strategies to transfer highly dynamic skills from character animation to robots: teaching robots basic self-preservation motor skills and developing data-driven algorithms on transfer learning between simulation and the real world. The second part of the talk will focus on robotic assistance with dressing, which is one of the activities of daily living (ADLs) most commonly requested by older adults. To safely train a robot to physically interact with humans, one can design a generative model of human motion based on prior knowledge or recorded motion data. Although this approach has been successful in Computer Animation, such as generating locomotion, designing procedures for a loosely defined task, such as "being dressed", is likely to be biased to the specific data or assumptions. I will describe a new approach to modeling human motion without being biased toward specific situations presented in the dataset."


Reinforcement Learning Coach by Intel - Intel Nervana

#artificialintelligence

Gal is a research engineer at the Intel Nervana algorithms team. He has a great passion for AI, and specifically for training and implementing Reinforcement Learning agents. He has optimized and trained low precision neural networks, enabling deep learning inference, on various Intel devices. He has been with Intel for 10 years, and before joining Intel Nervana, was mainly focused on power management algorithms optimization for Intel CPUs. In his spare time, Gal enjoys baking sourdough breads, hiking and watching movies with his wife.


Distributional Reinforcement Learning with Quantile Regression

arXiv.org Machine Learning

In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean. That is, we examine methods of learning the value distribution instead of the value function. We give results that close a number of gaps between the theoretical and algorithmic results given by Bellemare, Dabney, and Munos (2017). First, we extend existing results to the approximate distribution setting. Second, we present a novel distributional reinforcement learning algorithm consistent with our theoretical formulation. Finally, we evaluate this new algorithm on the Atari 2600 games, observing that it significantly outperforms many of the recent improvements on DQN, including the related distributional algorithm C51.
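The contrast between estimating the mean return and modeling the return distribution can be illustrated with a small quantile-regression sketch. This is not the algorithm from the paper: it is a toy, standard-library-only example that fits two quantile estimates to a fixed sample of returns by stochastic subgradient descent on the pinball (quantile regression) loss. The function names, the bimodal sample, the quantile midpoints, and the step size are all assumptions made for illustration.

```python
import random


def pinball_loss(theta, samples, tau):
    """Mean quantile-regression (pinball) loss of the estimate theta at level tau."""
    total = 0.0
    for z in samples:
        u = z - theta
        # Samples above the estimate cost tau * |u|; samples below cost (1 - tau) * |u|.
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(samples)


def fit_quantiles(samples, n_quantiles, lr=0.05, epochs=50, seed=0):
    """Fit quantile estimates by stochastic subgradient descent on the pinball loss."""
    rng = random.Random(seed)
    # Quantile midpoints, e.g. 0.25 and 0.75 for two quantiles (an illustrative choice).
    taus = [(2 * i + 1) / (2 * n_quantiles) for i in range(n_quantiles)]
    thetas = [0.0] * n_quantiles
    for _ in range(epochs):
        for z in rng.sample(samples, len(samples)):  # one shuffled pass over the data
            for i, tau in enumerate(taus):
                # Step up by lr * tau when the sample is not below the estimate,
                # step down by lr * (1 - tau) when it is below.
                indicator = 1.0 if z < thetas[i] else 0.0
                thetas[i] += lr * (tau - indicator)
    return taus, thetas


# Hypothetical bimodal returns: half the episodes return 0, half return 10.
returns = [0.0] * 100 + [10.0] * 100
mean_return = sum(returns) / len(returns)
taus, thetas = fit_quantiles(returns, n_quantiles=2)

print("mean return:", mean_return)
for tau, theta in zip(taus, thetas):
    print(f"tau={tau:.2f} -> estimated quantile {theta:.2f}")
```

In this toy sample the mean return is 5, a value that never actually occurs, while the two quantile estimates should settle close to the two observed outcomes, 0 and 10. That is the kind of information a mean-only value estimate averages away.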


Will scikit-learn utilize GPU?

#artificialintelligence

By default, neither will use the GPU, especially when running inside Docker, unless you use nvidia-docker and an image that supports it. Scikit-learn is not intended to be used as a deep-learning framework, and it does not appear to support GPU computation. Why is there no support for deep or reinforcement learning / Will there be support for deep or reinforcement learning in scikit-learn? Deep learning and reinforcement learning both require a rich vocabulary to define an architecture, with deep learning additionally requiring GPUs for efficient computing. However, neither of these fits within the design constraints of scikit-learn; as a result, deep learning and reinforcement learning are currently out of scope for what scikit-learn seeks to achieve.