AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Deep reinforcement learning from human preferences

Christiano, Paul, Leike, Jan, Brown, Tom B., Martic, Miljan, Legg, Shane, Amodei, Dario

arXiv.org Machine Learning, Jul-13-2017

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
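As a rough illustration of learning from preferences between pairs of trajectory segments, the sketch below scores a pair with a logistic (Bradley-Terry style) model over summed predicted rewards and computes a cross-entropy loss against the human label. The function names and the convention of encoding "no preference" as 0.5 are assumptions made for this example, and the per-step reward predictions are taken as given rather than produced by a trained network.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_prob(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Each argument is a list of predicted per-step rewards for one segment.
    """
    return _sigmoid(sum(rewards_a) - sum(rewards_b))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the model and a human label.

    label is 1.0 if the human preferred A, 0.0 if B, and 0.5 for no
    preference (an assumed encoding for this sketch).
    """
    p = preference_prob(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Minimizing this loss over many labelled pairs pushes the reward predictor to assign higher total reward to the segments people prefer; the learned reward can then stand in for the missing reward function during RL training.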

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1706.03741

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability

Omidshafiei, Shayegan, Pazis, Jason, Amato, Christopher, How, Jonathan P., Vian, John

arXiv.org Artificial Intelligence, Jul-13-2017

Many real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings due to local viewpoints of agents, which perceive the world as non-stationary due to concurrently-exploring teammates. Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to learn and store distinct policies for each task, but in practice identities of tasks are often non-observable, making these approaches inapplicable. This paper formalizes and addresses the problem of multi-task multi-agent reinforcement learning under partial observability. We introduce a decentralized single-task learning approach that is robust to concurrent interactions of teammates, and present an approach for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1703.06182

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

DeepMind's AI is teaching itself parkour, and the results are adorable

#artificialintelligence, Jul-12-2017, 13:00:19 GMT

Keeping up with the latest AI research can be an odd experience. On the one hand, you're aware that you're looking at cutting-edge experimentation, with new papers outlining the ideas and methods that will probably (eventually) snowball into the biggest technological revolution of all time. On the other hand, sometimes what you're looking at is just unavoidably weird and funny. Case in point is a new paper from Google's AI subsidiary DeepMind titled "Emergence of Locomotion Behaviours in Rich Environments." The research explores how reinforcement learning (or RL) can be used to teach a computer to navigate unfamiliar and complex environments.

large language model, machine learning, reinforcement learning, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Google's DeepMind uses reinforcement learning to master parkour

#artificialintelligence, Jul-10-2017, 23:35:18 GMT

Google has taught its DeepMind AI to navigate a parkour course by using reinforcement learning. Reinforcement learning is the practice of rewarding desirable behaviour. The faster the AI could navigate the virtual parkour course, the greater the reward. Further incentives and penalties were added for various other metrics. "We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress," explains Nicolas Heess, a researcher on the project.

large language model, machine learning, reinforcement learning, (9 more...)

Genre:

Instructional Material > Online (0.32)
Instructional Material > Course Syllabus & Notes (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)

Learning Visual Servoing with Deep Features and Fitted Q-Iteration

Lee, Alex X., Levine, Sergey, Abbeel, Pieter

arXiv.org Artificial Intelligence, Jul-10-2017

Visual servoing involves choosing actions that move a robot in response to observations from a camera, in order to reach a goal configuration in the world. Standard visual servoing approaches typically rely on manually designed features and analytical dynamics models, which limits their generalization capability and often requires extensive application-specific feature and model engineering. In this work, we study how learned visual features, learned predictive dynamics models, and reinforcement learning can be combined to learn visual servoing mechanisms. We focus on target following, with the goal of designing algorithms that can learn a visual servo using low amounts of data of the target in question, to enable quick adaptation to new targets. Our approach is based on servoing the camera in the space of learned visual features, rather than image pixels or manually-designed keypoints. We demonstrate that standard deep features, in our case taken from a model trained for object classification, can be used together with a bilinear predictive model to learn an effective visual servo that is robust to visual variation, changes in viewing angle and appearance, and occlusions. A key component of our approach is to use a sample-efficient fitted Q-iteration algorithm to learn which features are best suited for the task at hand. We show that we can learn an effective visual servo on a complex synthetic car following benchmark using just 20 training trajectory samples for reinforcement learning. We demonstrate substantial improvement over a conventional approach based on image pixels or hand-designed keypoints, and we show an improvement in sample-efficiency of more than two orders of magnitude over standard model-free deep reinforcement learning algorithms.
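For readers unfamiliar with fitted Q-iteration, the sketch below shows the generic batch form of the idea: repeatedly regress Q-values onto one-step bootstrapped targets computed from a fixed set of transitions. It is a tabular toy, with averaging in place of a function approximator, and is not the feature-weighting variant the abstract describes; the function name, the batch layout, and the default parameters are assumptions for illustration.

```python
from collections import defaultdict


def fitted_q_iteration(batch, actions, gamma=0.9, iterations=50):
    """Generic tabular fitted Q-iteration on a fixed batch.

    batch: list of (state, action, reward, next_state, done) tuples.
    actions: all actions available in every state.
    Returns a dict-like mapping (state, action) -> estimated Q-value.
    """
    q = defaultdict(float)
    for _ in range(iterations):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s2, done in batch:
            # One-step target using the previous iterate of Q.
            target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # "Regression" step: here just the mean target per (state, action).
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q
```

Because the batch is reused on every iteration, no new environment interaction is needed after data collection, which is the usual source of this family's sample efficiency.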

machine learning, reinforcement learning, trajectory, (17 more...)

1703.11

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Unifying task specification in reinforcement learning

White, Martha

arXiv.org Artificial Intelligence, Jul-7-2017

Reinforcement learning tasks are typically specified as Markov decision processes. This formalism has been highly successful, though specifications often couple the dynamics of the environment and the learning objective. This lack of modularity can complicate generalization of the task specification, as well as obfuscate connections between different task settings, such as episodic and continuing. In this work, we introduce the RL task formalism, which provides a unification through simple constructs including a generalization to transition-based discounting. Through a series of examples, we demonstrate the generality and utility of this formalism. Finally, we extend standard learning constructs, including Bellman operators, and extend some seminal theoretical results, including approximation error bounds. Overall, we provide a well-understood and sound formalism on which to build theoretical results and simplify algorithm use and development.
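A small sketch may help make transition-based discounting concrete. Instead of one constant discount factor, each transition carries its own discount, and the return follows the recursion G_t = r_t + gamma_t * G_{t+1}. The function name and list-based interface below are assumptions for this example; in the sketch, a discount of 0 on a transition plays the role of episode termination.

```python
def discounted_return(rewards, discounts):
    """Return from the first step under per-transition discounts.

    rewards[t] and discounts[t] belong to the same transition, and the
    return satisfies G_t = rewards[t] + discounts[t] * G_{t+1}.
    """
    if len(rewards) != len(discounts):
        raise ValueError("rewards and discounts must have the same length")
    g = 0.0
    # Accumulate backwards so each step applies its own discount.
    for r, gamma in zip(reversed(rewards), reversed(discounts)):
        g = r + gamma * g
    return g


# A constant discount recovers the usual continuing-task return.
continuing = discounted_return([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])

# A zero discount on the third transition cuts off everything after it,
# so the later reward of 5.0 does not leak into the return.
episodic = discounted_return([1.0, 1.0, 1.0, 5.0], [0.9, 0.9, 0.0, 0.9])
```

Treating termination as a property of the discount rather than of the environment is what lets episodic and continuing settings share one specification.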

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1609.01995

Country: North America (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Industrial AI Podcast – Bonsai – Medium

#artificialintelligence, Jul-6-2017, 00:35:17 GMT

Check out Episode 1 below and download our latest paper exploring the unique challenges and requirements of Industrial AI. In Part 3 of TWIML's Industrial AI series, Sam Charrington digs into robotics and reinforcement learning with Berkeley PhD student Chelsea Finn. This talk gets into some of the technical weeds of cutting-edge robotics technologies, including inverse reinforcement learning, meta-learning, and the benefits and challenges of training robots in simulations. Chelsea also talks about what it's like pursuing a PhD in machine learning and how to keep up with such a rapidly advancing field. Check out the full conversation with Chelsea below.

industrial ai, machine learning, reinforcement learning, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hashing Over Predicted Future Frames for Informed Exploration of Deep Reinforcement Learning

Yin, Haiyan, Pan, Sinno Jialin

arXiv.org Machine Learning, Jul-3-2017

In reinforcement learning (RL) tasks, an efficient exploration mechanism should be able to encourage an agent to take actions that lead to less frequently visited states, which may yield higher cumulative future return. However, both knowing about the future and evaluating the frequentness of states are non-trivial tasks, especially for deep RL domains, where a state is represented by high-dimensional image frames. In this paper, we propose a novel informed exploration framework for deep RL tasks, where we build the capability for an RL agent to predict over the future transitions and evaluate the frequentness of the predicted future frames in a meaningful manner. To this end, we train a deep prediction model to generate future frames given a state-action pair, and a convolutional autoencoder model to generate deep features for conducting hashing over the seen frames. In addition, to utilize the counts derived from the seen frames to evaluate the frequentness of the predicted frames, we tackle the challenge of making the hash codes for the predicted future frames match those of their corresponding seen frames. In this way, we can derive a reliable metric for evaluating the novelty of the future direction pointed to by each action, and hence inform the agent to explore the least frequent one. We use Atari 2600 games as the testing environment and demonstrate that the proposed framework achieves significant performance gain over a state-of-the-art informed exploration approach in most of the domains.
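The control flow described in this abstract can be sketched in a few lines: hash the features of seen frames into counts, predict the next frame for each candidate action, and pick the action whose predicted frame has been seen least. Everything below is a stand-in written for illustration: the thresholding hash replaces the learned autoencoder features, and `predict_next` is a hypothetical caller-supplied function in place of the deep prediction model.

```python
from collections import Counter


def hash_code(features, threshold=0.5):
    """Binarize a feature vector into a hashable code (assumed scheme)."""
    return tuple(1 if x > threshold else 0 for x in features)


class HashedCounts:
    """Visit counts keyed by the hash code of a frame's features."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, features):
        self.counts[hash_code(features)] += 1

    def count(self, features):
        # Counter returns 0 for codes that were never observed.
        return self.counts[hash_code(features)]


def least_visited_action(state, actions, predict_next, counts):
    """Pick the action whose predicted next frame has the lowest count.

    predict_next(state, action) is a hypothetical model returning the
    feature vector of the predicted next frame. Ties go to the earliest
    action in the list.
    """
    return min(actions, key=lambda a: counts.count(predict_next(state, a)))
```

The sketch only works if predicted and seen frames that look alike map to the same code, which is the matching problem the abstract singles out as a challenge.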

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1707.00524

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

DALI 2017 – Workshop – Data Efficient Reinforcement Learning

#artificialintelligence, Jul-2-2017, 19:00:13 GMT

With data collection on the rise, machine learning is a hot topic. Computers' ability to mimic human thinking is rapidly surpassing human capabilities in everything from chess to picking the winner of a song contest.

artificial intelligence, data efficient reinforcement learning, machine learning, (1 more...)

Industry: Leisure & Entertainment > Games > Chess (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Variance Regularizing Adversarial Learning

Grewal, Karan, Hjelm, R Devon, Bengio, Yoshua

arXiv.org Machine Learning, Jul-2-2017

We introduce a novel approach for training adversarial models by replacing the discriminator score with a bi-modal Gaussian distribution over the real/fake indicator variables. In order to do this, we train the Gaussian classifier to match the target bi-modal distribution implicitly through meta-adversarial training. We hypothesize that this approach ensures a non-zero gradient to the generator, even in the limit of a perfect classifier. We test our method on standard benchmark image datasets and show that the classifier output distribution is smooth and has overlap between the real and fake modes.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1707.00309

Country:

North America > Canada > Quebec > Montreal (0.15)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.43)
