Reinforcement Learning
Gaussian Processes for Data-Efficient Learning in Robotics and Control
Deisenroth, Marc Peter, Fox, Dieter, Rasmussen, Carl Edward
Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning reduces the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this article, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
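To make the idea of a probabilistic transition model concrete, here is a minimal sketch of Gaussian process regression on toy transition data. It is not the authors' method: the squared-exponential kernel, its hyperparameters, the toy dynamics, and the names `rbf_kernel` and `gp_posterior` are all assumptions made for illustration. The sketch only shows the predictive variance that a GP provides; it does not propagate that uncertainty through long-term planning as the abstract describes.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between inputs a (n, d) and b (m, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)
    chol = np.linalg.cholesky(k_tt)                      # k_tt = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy (hypothetical) system: the state change depends on the state and the action.
rng = np.random.default_rng(0)
states = rng.uniform(-2.0, 2.0, size=(15, 1))
actions = rng.uniform(-1.0, 1.0, size=(15, 1))
inputs = np.hstack([states, actions])                    # GP input: (state, action)
deltas = (np.sin(states) + 0.5 * actions).ravel() + 0.05 * rng.standard_normal(15)

near = np.array([[0.0, 0.0]])    # inside the region covered by the data
far = np.array([[10.0, 0.0]])    # far from any observed transition
mean_near, var_near = gp_posterior(inputs, deltas, near)
mean_far, var_far = gp_posterior(inputs, deltas, far)
print("variance near data:", var_near[0], "far from data:", var_far[0])
```

Far from the observed transitions the predictive variance returns to the prior signal variance, which is how the model signals that its prediction there should not be trusted.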
[N] NIPS 2017 Workshop Call for Papers -- Hierarchical Reinforcement Learning • r/MachineLearning
We invite all researchers to submit their manuscripts for review. Please address questions to: hrlnips2017@gmail.com Reinforcement Learning (RL) has become a powerful tool for tackling complex sequential decision-making problems as demonstrated in high-dimensional robotics or game-playing domains. Nevertheless, modern RL methods have considerable difficulties when facing sparse rewards, long planning horizons, and more generally a scarcity of useful supervision signals. Hierarchical Reinforcement Learning (HRL) is emerging as a key component for finding spatio-temporal abstractions and behavioral patterns that can guide the discovery of useful large-scale control architectures, both for deep-network representations and for analytic and optimal-control methods.
Reinforcement Learning w/ Keras OpenAI: Actor-Critic Models
Last time in our Keras/OpenAI tutorial, we discussed a very fundamental algorithm in reinforcement learning: the DQN. The Deep Q-Network is actually a fairly new advent that arrived on the scene only a couple of years back, so it is quite incredible if you were able to understand and implement this algorithm having just gotten a start in the field. As with the original post, let's take a quick moment to appreciate how incredible the results we achieved are: in a continuous output space scenario and starting with absolutely no knowledge of what "winning" entails, we were able to explore our environment and "complete" the trials. Put yourself in the situation of this simulation. This would essentially be like asking you to play a game, without a rulebook or specific end goal, and demanding that you continue to play until you win (almost seems a bit cruel).
Dynamic Pricing for the crypto.ticket System – crypto.tickets
Crypto.tickets will support dynamic and differentiated pricing. It is a service based on artificial intelligence algorithms, segmentation, and historical data that can increase profits from ticket sales. The pricing model uses deep reinforcement learning algorithms when offering ticket prices, to maximize event profitability while taking into account the current situation and sales dynamics. More specifically, the system uses deep reinforcement learning (Q-learning, where Q is a function that predicts the profit from ticket sales given the selected pricing strategy) adapted for continuous states and actions. The specific structure of the neural network and the details of training are a trade secret. The block diagram of the common usage of the pricing model is shown below.
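For readers unfamiliar with Q-learning, the following sketch shows the basic update rule on a toy ticket-pricing problem. It is not the crypto.tickets system, whose network and training details are undisclosed: it uses a small table instead of a neural network, discrete states and prices instead of continuous ones, and the price levels, sale probabilities, and function names are all hypothetical.

```python
import numpy as np

PRICES = np.array([10.0, 20.0, 30.0])   # hypothetical price levels (actions)
SALE_PROB = np.array([0.9, 0.6, 0.2])   # hypothetical chance of a sale at each price
N_TICKETS = 5                           # state = tickets left; 0 is terminal

def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward if done else reward + gamma * np.max(q[next_state])
    q[state, action] += alpha * (target - q[state, action])

def train(episodes=500, max_steps=50, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration on the toy problem."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_TICKETS + 1, len(PRICES)))
    for _ in range(episodes):
        state = N_TICKETS
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = int(rng.integers(len(PRICES)))   # explore
            else:
                action = int(np.argmax(q[state]))         # exploit
            sold = rng.random() < SALE_PROB[action]
            reward = PRICES[action] if sold else 0.0
            next_state = state - 1 if sold else state
            done = next_state == 0                        # sold out
            q_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy price for each number of remaining tickets (1..N_TICKETS).
best_prices = PRICES[np.argmax(q_table[1:], axis=1)]
print(best_prices)
```

Here Q(s, a) estimates the discounted revenue from offering price a with s tickets left, which mirrors the "Q as profit prediction" description in the text; a continuous version would replace the table with a function approximator.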
Elon Musk's Research Venture Has Trained AI To Teach Itself
As part of its effort to find better ways to develop and train "safe artificial general intelligence," OpenAI has been releasing its own versions of reinforcement learning algorithms. They call these OpenAI Baselines, and the most recent additions to these algorithms are two baselines that are meant to enhance machine learning performance by making it more efficient. The first is a baseline implementation called Actor Critic using Kronecker-factored Trust Region (ACKTR). Developed by researchers from the University of Toronto (UofT) and New York University (NYU), ACKTR improves on the way AI policies perform deep reinforcement learning -- learning that is accomplished only by trial and error, and obtained only through raw observation. In a paper published online, the UofT and NYU researchers used simulated robots and Atari games to test how ACKTR learns control policies.
Reports of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence
Anderson, Monica (University of Alabama) | Barták, Roman (Charles University) | Brownstein, John S. (Boston Children's Hospital, Harvard University) | Buckeridge, David L. (McGill University) | Eldardiry, Hoda (Palo Alto Research Center) | Geib, Christopher (Drexel University) | Gini, Maria (University of Minnesota) | Isaksen, Aaron (New York University) | Keren, Sarah (Technion University) | Laddaga, Robert (Vanderbilt University) | Lisy, Viliam (Czech Technical University) | Martin, Rodney (NASA Ames Research Center) | Martinez, David R. (MIT Lincoln Laboratory) | Michalowski, Martin (University of Ottawa) | Michael, Loizos (Open University of Cyprus) | Mirsky, Reuth (Ben-Gurion University) | Nguyen, Thanh (University of Michigan) | Paul, Michael J. (University of Colorado Boulder) | Pontelli, Enrico (New Mexico State University) | Sanner, Scott (University of Toronto) | Shaban-Nejad, Arash (University of Tennessee) | Sinha, Arunesh (University of Michigan) | Sohrabi, Shirin (IBM T. J. Watson Research Center) | Sricharan, Kumar (Palo Alto Research Center) | Srivastava, Biplav (IBM T. J. Watson Research Center) | Stefik, Mark (Palo Alto Research Center) | Streilein, William W. (MIT Lincoln Laboratory) | Sturtevant, Nathan (University of Denver) | Talamadupula, Kartik (IBM T. J. Watson Research Center) | Thielscher, Michael (University of New South Wales) | Togelius, Julian (New York University) | Tran, So Cao (New Mexico State University) | Tran-Thanh, Long (University of Southampton) | Wagner, Neal (MIT Lincoln Laboratory) | Wallace, Byron C. (Northeastern University) | Wilk, Szymon (Poznan University of Technology) | Zhu, Jichen (Drexel University)
Deep learning and machine learning were hot topics, and the workshop included papers from across the globe on deep reinforcement learning agents ... tailored toward a specific application. It is now recognized that formal languages, and their symbolic underpinnings, can enable descriptive ... Next to convex optimization, contributed papers addressed the problems of symbolic stochastic planning and shortest path problems.
A New AI Evaluation Cosmos: Ready to Play the Game?
Hernández-Orallo, José (Universitat Politècnica de València) | Baroni, Marco (Facebook) | Bieger, Jordi (Reykjavik University) | Chmait, Nader (Monash University) | Dowe, David L. (Monash University) | Hofmann, Katja (Microsoft Research) | Martínez-Plumed, Fernando (Universitat Politècnica de València) | Strannegård, Claes (Chalmers University of Technology) | Thórisson, Kristinn R. (Reykjavik University)
We report on a series of new platforms and events dealing with AI evaluation that may change the way in which AI systems are compared and their progress is measured. The introduction of a more diverse and challenging set of tasks in these platforms can feed AI research in the years to come, shaping the notion of success and the directions of the field. However, the playground of tasks and challenges presented there may misdirect the field without some meaningful structure and systematic guidelines for its organization and use. Anticipating this issue, we also report on several initiatives and workshops that are putting the focus on analyzing the similarity and dependencies between tasks, their difficulty, what capabilities they really measure and – ultimately – on elaborating new concepts and tools that can arrange tasks and benchmarks into a meaningful taxonomy.
An Analysis of Model-Based Heuristic Search Techniques for StarCraft Combat Scenarios
Churchill, David (Memorial University) | Lin, Zeming (Facebook AI Research) | Synnaeve, Gabriel (Facebook AI Research)
Real-Time Strategy (RTS) games have become a popular test-bed for modern AI systems due to their real-time computational constraints, complex multi-unit control problems, and imperfect information. One of the most important aspects of any RTS AI system is the efficient control of units in complex combat scenarios, also known as micromanagement. Recently, a model-based heuristic search technique called Portfolio Greedy Search (PGS) has shown promising performance for providing real-time decision making in RTS combat scenarios, but has so far only been tested in SparCraft: an RTS combat simulator. In this paper we present the first integration of PGS into the StarCraft game engine, and compare its performance to the current state-of-the-art deep reinforcement learning method in several benchmark combat scenarios. We then perform the same experiments within the SparCraft simulator in order to investigate any differences between PGS performance in the simulator and in the actual game. Lastly, we investigate how varying parameters of the SparCraft simulator affect the performance of PGS in the StarCraft game engine. We demonstrate that the performance of PGS relies heavily on the accuracy of the underlying model, outperforming other techniques only for scenarios where the SparCraft simulation model more accurately matches the StarCraft game engine.
STARDATA: A StarCraft AI Research Dataset
Lin, Zeming (Facebook) | Gehring, Jonas (Facebook) | Khalidov, Vasil (Facebook) | Synnaeve, Gabriel (Facebook)
We release a dataset of 65646 StarCraft replays that contains 1535 million frames and 496 million player actions. We provide full game state data along with the original replays that can be viewed in StarCraft. The game state data was recorded every 3 frames, which ensures suitability for a wide variety of machine learning tasks such as strategy classification, inverse reinforcement learning, imitation learning, forward modeling, partial information extraction, and others. We use TorchCraft to extract and store the data, which standardizes the data format for both reading from replays and reading directly from the game. Furthermore, the data can be used on different operating systems and platforms. The dataset contains valid, non-corrupted replays only, and its quality and diversity were ensured by a number of heuristics. We illustrate the diversity of the data with various statistics and provide examples of tasks that benefit from the dataset.
SerpentAI/SerpentAI
Serpent.AI is a simple yet powerful, novel framework to assist developers in the creation of game agents. The framework's raison d'être is first and foremost to provide a valuable tool for Machine Learning & AI research. It also turns out to be ridiculously fun to use as a hobbyist (and dangerously addictive; a fair warning)! The framework features a large assortment of supporting modules that provide solutions to commonly encountered scenarios when using video games as environments, as well as CLI tools to accelerate development. It provides some useful conventions but is absolutely NOT opinionated about what you put in your agents: Want to use the latest, cutting-edge deep reinforcement learning algorithm?