What Is Constrained Reinforcement Learning And How Can One Build Systems Around It


One of the most important innovations in the development of highly advanced AI systems has been Reinforcement Learning (RL), which has the potential to solve complex decision-making problems. It generally follows a "trial and error" method to learn an optimal policy for a given problem, and it has been used to achieve superhuman performance in competitive strategy games such as Go, StarCraft, and Dota. Despite the promise reinforcement learning algorithms have shown in many decision-making problems, several challenges still need to be addressed.
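The "trial and error" learning described above can be sketched with tabular Q-learning, one of the simplest RL algorithms. The two-state MDP below is a hypothetical toy example, not from any of the systems discussed here: the agent repeatedly tries actions, observes rewards, and updates its value estimates toward an optimal policy.

```python
import random

def q_learning(transitions, rewards, n_states, n_actions,
               episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on a tiny deterministic MDP.

    transitions[s][a] -> next state; rewards[s][a] -> reward.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # bounded episode length
            # Epsilon-greedy: mostly exploit, occasionally explore.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: Q[s][a])
            s2, r = transitions[s][a], rewards[s][a]
            # Temporal-difference update toward reward + discounted value.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

random.seed(0)
# Toy two-state chain: action 1 moves toward state 1, which pays reward 1.
transitions = [[0, 1], [0, 1]]
rewards = [[0.0, 0.0], [0.0, 1.0]]
Q = q_learning(transitions, rewards, n_states=2, n_actions=2)
```

After training, the learned Q-values favor action 1 in both states, i.e. the agent has discovered the rewarding policy purely from sampled experience.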

OpenAI's Procgen Benchmark prevents AI model overfitting


Where the training of machine learning models is concerned, there's always a risk of overfitting -- corresponding too closely to a particular set of data. Indeed, popular machine learning benchmarks like the Arcade Learning Environment may even encourage overfitting, in that they place little emphasis on generalization. That's why OpenAI -- the San Francisco-based research firm cofounded by CTO Greg Brockman, chief scientist Ilya Sutskever, and others -- today released the Procgen Benchmark, a set of 16 procedurally generated environments (CoinRun, StarPilot, CaveFlyer, Dodgeball, FruitBot, Chaser, Miner, Jumper, Leaper, Maze, BigFish, Heist, Climber, Plunder, Ninja, and BossFight) that measure how quickly a model learns generalizable skills. It builds atop the startup's CoinRun toolset, which used procedural generation to construct sets of training and test levels. "We want the best of both worlds: a benchmark comprised of many diverse environments, each of which fundamentally requires generalization," wrote OpenAI in a blog post.
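The core idea behind procedurally generated train/test levels can be sketched in a few lines. The level generator below is a hypothetical stand-in (not Procgen's actual generation code): because each level is derived deterministically from a seed, disjoint seed ranges yield disjoint level sets, so test performance measures generalization rather than memorization.

```python
import random

def generate_level(seed, size=8):
    """Procedurally generate a toy grid level from a seed
    (hypothetical stand-in for a real level generator)."""
    rng = random.Random(seed)
    # ' ' is open floor, '#' is a wall cell.
    return [[rng.choice(" #") for _ in range(size)] for _ in range(size)]

# Disjoint seed ranges give non-overlapping train and test level sets.
train_levels = [generate_level(s) for s in range(0, 200)]
test_levels = [generate_level(s) for s in range(200, 250)]
```

An agent trained only on `train_levels` can then be evaluated on `test_levels` it has never seen, which is the measurement Procgen-style benchmarks are built around.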

DeepMind gets good at games (and choosing them) – plus more bits and bytes from the world of machine learning


Roundup If you can't get enough of machine learning news, here's a roundup of extra tidbits to keep your addiction ticking away. Read on to learn how DeepMind is helping Google's Play Store, and about a new virtual environment from OpenAI for training agents safely. An AI recommendation system for the Google Play Store: DeepMind is helping Android users find new apps in the Google Play Store with the help of machine learning. "We started collaborating with the Play store to help develop and improve systems that determine the relevance of an app with respect to the user," the London-based lab said this week. Engineers built a model known as a candidate generator.

Why Playing Hide-and-Seek Could Lead AI to Humanlike Intelligence


Humans can adapt to environmental challenges, and over eons this adaptability has enabled us to evolve biologically -- an essential characteristic found in animals but absent in AI. Although machine learning has made remarkable progress in complex games such as Go and Dota 2, the skills mastered in these arenas do not necessarily generalize to practical applications in real-world scenarios. The goal for a growing number of researchers is to build a machine intelligence that behaves, learns and evolves more like humans. A new paper from San Francisco-based OpenAI proposes that training models in the children's game of hide-and-seek and pitting them against each other in tens of millions of contests results in the models automatically developing humanlike behaviors that increase their intelligence and improve subsequent performance. Hide-and-seek was selected as a fun starting point mostly due to its simple rules, says the paper's first author, OpenAI researcher Bowen Baker.

Challenges of Real-World Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research progress in RL is hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems. For each of these challenges, we specify the exact meaning of the challenge, present some approaches from the literature, and specify some metrics for evaluating it. An approach that addresses all nine challenges would be applicable to a large number of real-world problems. We also describe an example domain that has been modified to exhibit these challenges, as a testbed for practical RL research.
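One of the real-world challenges the paper discusses is satisfying safety constraints, which connects directly to constrained reinforcement learning: maximize expected reward subject to a bound on expected cost. A common formulation relaxes the constraint with a Lagrange multiplier. The sketch below is a minimal illustration of that dual-ascent idea with made-up rollout statistics, not the paper's method.

```python
def lagrangian_update(reward, cost, cost_limit, lam, lr_lambda=0.01):
    """One dual-ascent step for a constrained RL objective:
    maximize E[reward] subject to E[cost] <= cost_limit.

    The policy would be trained on the penalized return
    reward - lam * cost; the multiplier lam rises while the
    constraint is violated and is clipped at zero otherwise.
    """
    penalized = reward - lam * cost
    lam = max(0.0, lam + lr_lambda * (cost - cost_limit))
    return penalized, lam

lam = 0.0
for _ in range(1000):
    # Hypothetical per-iteration rollout averages for illustration.
    penalized, lam = lagrangian_update(
        reward=1.0, cost=0.5, cost_limit=0.2, lam=lam)
```

Because the toy cost (0.5) stays above the limit (0.2), the multiplier grows steadily, making constraint violations increasingly expensive for the policy being optimized.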