AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
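
The idea of discovering rewarding actions "by trying them" can be illustrated with a small multi-armed bandit. The sketch below is not from Sutton and Barto's text as quoted here; the function name `run_bandit`, the epsilon-greedy rule, the Gaussian reward noise, and all parameter values are assumptions chosen for illustration.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a Gaussian multi-armed bandit (illustrative)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running mean reward observed per action
    counts = [0] * n_actions       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: take the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The learner is never told the best action; it only sees a reward.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Hypothetical three-action problem; the learner does not see these means.
estimates, counts = run_bandit([0.1, 0.5, 0.9])
```

The learner only ever observes a numerical reward, so its estimates improve solely through the actions it chooses to try.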

Where to Look Next: Unsupervised Active Visual Exploration on 360° Input

Seifi, Soroush, Tuytelaars, Tinne

arXiv.org Artificial Intelligence, Sep-23-2019

We address the problem of active visual exploration of large 360° inputs. In our setting an active agent with a limited camera bandwidth explores its 360° environment by changing its viewing direction at limited discrete time steps. As such, it observes the world as a sequence of narrow field-of-view 'glimpses', deciding for itself where to look next. Our proposed method exceeds previous works' performance by a significant margin without the need for deep reinforcement learning or training separate networks as sidekicks. Key components of our system are the spatial memory maps that make the system aware of the glimpses' orientations (locations in the 360° image). Further, we stress the advantages of retina-like glimpses when the agent's sensor bandwidth and time-steps are limited. Finally, we use our trained model to do classification of the whole scene using only the information observed in the glimpses.

architecture, module, reconstruction, (12 more...)

1909.10304

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

Hands-On Meta Learning with Python: Meta learning using one-shot learning, MAML, Reptile, and Meta-SGD with TensorFlow: Sudharsan Ravichandiran: 9781789534207: Amazon.com: Books

#artificialintelligence, Sep-22-2019, 22:55:48 GMT

Sudharsan Ravichandiran is a data scientist, researcher, artificial intelligence enthusiast, and YouTuber (search for Sudharsan reinforcement learning). He completed his bachelor's degree in information technology at Anna University. His research focuses on practical implementations of deep learning and reinforcement learning, including natural language processing and computer vision. He is an open source contributor and loves answering questions on Stack Overflow. He also authored a bestseller, Hands-On Reinforcement Learning with Python, published by Packt Publishing.

hand-on meta learning, one-shot learning, sudharsan ravichandiran, (6 more...)

Industry: Retail > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

deepmind/bsuite

#artificialintelligence, Sep-22-2019, 21:08:58 GMT

This library automates evaluation and analysis of any agent on these benchmarks. It serves to facilitate reproducible and accessible research on the core issues in RL, and ultimately the design of superior learning algorithms. Going forward, we hope to incorporate more excellent experiments from the research community, and we commit to a periodic review of the experiments by a committee of prominent researchers. For a more comprehensive overview, see the accompanying paper. This means any experiment will automatically output data in the correct format for analysis using the notebook, without any constraints on the structure of agents or algorithms.

agent, bsuite, experiment, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

An introduction to reinforcement learning with AWS RoboMaker Amazon Web Services

#artificialintelligence, Sep-22-2019, 10:48:18 GMT

Robotics often involves training complex sequences of behaviors. For example, consider a robot designed to follow or track another object. Although the goal is easy to describe (the closer the robot is to the object, the better), creating the logic that accomplishes the task is much more difficult. Reinforcement learning (RL), an emerging machine learning technique, can help develop solutions for exactly these kinds of problems. This post introduces RL and explains how we used AWS RoboMaker to develop an application that trains a TurtleBot Waffle Pi to track and move toward a TurtleBot Burger.

application, stationary turtlebot burger, turtlebot waffle pi, (10 more...)

Country: North America > United States (0.05)

Industry:

Information Technology > Services (0.41)
Retail > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Faster saddle-point optimization for solving large-scale Markov decision processes

Bas-Serrano, Joan, Neu, Gergely

arXiv.org Machine Learning, Sep-22-2019

We consider the problem of computing optimal policies in average-reward Markov decision processes. This classical problem can be formulated as a linear program directly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the number of states. To address this issue, recent work has considered a linearly relaxed version of the resulting saddle-point problem. Our work aims at achieving a better understanding of this relaxed optimization problem by characterizing the conditions necessary for convergence to the optimal policy, and designing an optimization algorithm enjoying fast convergence rates that are independent of the size of the state space.
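
For orientation, the linear program mentioned in the abstract can be written in its standard textbook form. The notation below is an assumption, not taken from the paper: $x$ and $a$ denote states and actions, $P(x' \mid x, a)$ the transition kernel, $r(x,a)$ the reward, $\mu$ a stationary state-action distribution, and $V$ the multipliers of the flow constraints.

```latex
\begin{align*}
\max_{\mu \ge 0} \quad & \sum_{x,a} \mu(x,a)\, r(x,a) \\
\text{s.t.} \quad & \sum_{a} \mu(x',a) = \sum_{x,a} P(x' \mid x,a)\, \mu(x,a)
  \quad \text{for all } x', \\
& \sum_{x,a} \mu(x,a) = 1 .
\end{align*}

% Attaching a multiplier V(x') to each flow constraint gives the saddle-point form:
\[
\min_{V} \; \max_{\mu \in \Delta} \;
\sum_{x,a} \mu(x,a) \Big[ r(x,a) + \sum_{x'} P(x' \mid x,a)\, V(x') - V(x) \Big],
\]
% where \Delta is the set of probability distributions over state-action pairs.
```

In this form $V$ has one entry per state, which is the state-dependent variable count that the abstract identifies as the issue motivating the relaxed problem.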

algorithm, assumption, saddle-point problem, (15 more...)

1909.10904

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

Multi-task Learning and Catastrophic Forgetting in Continual Reinforcement Learning

Ribeiro, João, Melo, Francisco S., Dias, João

arXiv.org Artificial Intelligence, Sep-22-2019

In this paper we investigate two hypotheses regarding the use of deep reinforcement learning in multiple tasks. The first hypothesis is driven by the question of whether a deep reinforcement learning algorithm, trained on two similar tasks, is able to outperform two single-task, individually trained algorithms, by more efficiently learning a new, similar task that none of the three algorithms has encountered before. The second hypothesis is driven by the question of whether the same multi-task deep RL algorithm, trained on two similar tasks and augmented with elastic weight consolidation (EWC), is able to retain similar performance on the new task as a similar algorithm without EWC, whilst being able to overcome catastrophic forgetting in the two previous tasks. We show that a multi-task Asynchronous Advantage Actor-Critic (GA3C) algorithm, trained on Space Invaders and Demon Attack, is in fact able to outperform two single-task GA3C versions, each trained individually on one task, when evaluated on a new, third task, namely Phoenix. We also show that, when training two trained multi-task GA3C algorithms on the third task, if one is augmented with EWC, it is not only able to achieve similar performance on the new task, but also capable of overcoming a substantial amount of catastrophic forgetting on the two previous tasks.
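
The abstract does not spell out the EWC term, so the following is a minimal sketch of the commonly used quadratic form, in which each parameter is pulled back toward its value after the earlier tasks in proportion to an importance weight (typically a diagonal Fisher information estimate). The function names, the `strength` coefficient, and the toy numbers are hypothetical and are not taken from the paper.

```python
def ewc_penalty(params, anchor_params, fisher, strength):
    """Quadratic EWC term: (strength / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def total_loss(new_task_loss, params, anchor_params, fisher, strength=1.0):
    """Loss on the new task plus the penalty that protects earlier tasks."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher, strength)


# Toy numbers (assumed): parameters after the old tasks and their importance.
anchor = [1.0, -2.0, 0.5]
fisher = [10.0, 0.0, 1.0]   # the second parameter is unimportant to old tasks
moved = [1.1, 3.0, 0.5]     # parameters after some training on the new task

penalty = ewc_penalty(moved, anchor, fisher, strength=2.0)
```

In this toy case the large change to the second parameter costs nothing because its importance weight is zero, while the small change to the first parameter is penalized.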

agent, algorithm, learning, (10 more...)

1909.10008

Country: Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

AI Learns to Cheat at Hide and Seek #OpenAI #HideandSeek #MachineLearning #ArtificialIntelligence #ReinforcementLearning @OpenAI

#artificialintelligence, Sep-21-2019, 02:19:12 GMT

OpenAI recently posted on Twitter about teaching computer agents 'hide and seek'. We've observed AIs discovering complex tool use while competing in a simple game of hide-and-seek. They develop a series of six distinct strategies and counter strategies, ultimately using tools in the environment to break our simulated physics. In the simulations, seekers are incentivized to maintain line of sight of hiders and hiders are incentivized to avoid line of sight from seekers. The agents' environments contain various shelters including cubicles, movable partitions, blocks and ramps. That said, there is no built-in incentive for agents to interact with objects around them.

agent, artificialintelligence, openai, (10 more...)

Industry: Leisure & Entertainment > Games (0.38)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Scaled Autonomy: Enabling Human Operators to Control Robot Fleets

Swamy, Gokul, Reddy, Siddharth, Levine, Sergey, Dragan, Anca D.

arXiv.org Machine Learning, Sep-21-2019

Autonomous robots often encounter challenging situations where their control policies fail and an expert human operator must briefly intervene, e.g., through teleoperation. In settings where multiple robots act in separate environments, a single human operator can manage a fleet of robots by identifying and teleoperating one robot at any given time. The key challenge is that users have limited attention: as the number of robots increases, users lose the ability to decide which robot requires teleoperation the most. Our goal is to automate this decision, thereby enabling users to supervise more robots than their attention would normally allow for. Our insight is that we can model the user's choice of which robot to control as an approximately optimal decision that maximizes the user's utility function. We learn a model of the user's preferences from observations of the user's choices in easy settings with a few robots, and use it in challenging settings with more robots to automatically identify which robot the user would most likely choose to control, if they were able to evaluate the states of all robots at all times. We run simulation experiments and a user study with twelve participants that show our method can be used to assist users in performing a navigation task and manipulator reaching task.
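
One common way to model a choice as "approximately optimal" with respect to a utility function is a softmax (Boltzmann) choice rule. The abstract does not say which model the authors use, so the sketch below is an assumption; the `rationality` parameter and the utility values are hypothetical.

```python
import math


def choice_probabilities(utilities, rationality=1.0):
    """Softmax choice model: P(robot i) is proportional to exp(rationality * u_i)."""
    highest = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(rationality * (u - highest)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def pick_robot(utilities):
    """Automated decision: the robot the modelled user would most likely choose."""
    return max(range(len(utilities)), key=lambda i: utilities[i])


# Hypothetical learned utilities of teleoperating each of three robots right now.
utilities = [0.2, 1.5, -0.3]
probs = choice_probabilities(utilities)
chosen = pick_robot(utilities)
```

Under this assumed model, a larger `rationality` value makes the modelled user closer to strictly optimal, while a value near zero makes the choices close to uniform.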

demonstration, operator, robot, (17 more...)

1910.0291

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Greece (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture

Ladosz, Pawel, Ben-Iwhiwhu, Eseoghene, Hu, Yang, Ketz, Nicholas, Kolouri, Soheil, Krichmar, Jeffrey L., Pilly, Praveen, Soltoggio, Andrea

arXiv.org Machine Learning, Sep-21-2019

This paper introduces the modulated Hebbian plus Q network architecture (MOHQA) for solving challenging deep reinforcement learning problems posed as partially observable Markov decision processes (POMDPs) with sparse rewards and confounding observations. The proposed architecture combines a deep Q-network (DQN) and a modulated Hebbian network with neural eligibility traces (MOHN). Bio-inspired neural traces are used to bridge temporal delays between actions and rewards. The purpose is to discover distal cause-effect relationships where confounding observations and sparse rewards cause standard RL algorithms to fail. Each of the two modules of the network (DQN and MOHN) is responsible for different aspects of learning. DQN learns low-level features and control, while MOHN contributes to the high-level decisions by bridging rewards with past actions. The strength of the approach is to support a standard DQN framework when temporal difference errors are difficult to compute due to non-observable states. The system is tested on a set of generalized decision making problems encoded as decision tree graphs that deliver delayed rewards after key decision points and confounding observations. The simulations show that the proposed approach helps solve problems that are currently challenging for state-of-the-art deep reinforcement learning algorithms.
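
To make the role of eligibility traces concrete, here is a generic reward-modulated Hebbian update with a decaying trace. It is a sketch of the general technique under assumed update rules, not the MOHN module described in the paper; the function names, `decay`, and `lr` are hypothetical.

```python
def step_trace(trace, pre, post, decay=0.9):
    """Decay the eligibility trace and add the current Hebbian term pre * post."""
    return [
        [decay * trace[i][j] + pre[j] * post[i] for j in range(len(pre))]
        for i in range(len(post))
    ]


def apply_modulation(weights, trace, reward, lr=0.1):
    """Change weights only when a modulatory (reward) signal arrives."""
    return [
        [w + lr * reward * e for w, e in zip(w_row, e_row)]
        for w_row, e_row in zip(weights, trace)
    ]


# Two inputs, one output unit; co-activation at step 1, reward two steps later.
weights = [[0.0, 0.0]]
trace = [[0.0, 0.0]]
trace = step_trace(trace, pre=[1.0, 0.0], post=[1.0])  # input 0 and output fire
trace = step_trace(trace, pre=[0.0, 0.0], post=[0.0])  # silence, trace decays
trace = step_trace(trace, pre=[0.0, 0.0], post=[0.0])  # silence, trace decays
weights = apply_modulation(weights, trace, reward=1.0)
```

The trace keeps a fading record of the earlier co-activation, so the delayed reward still strengthens the connection that was active two steps before it arrived.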

architecture, ct-graph, decision point, (16 more...)

1909.09902

Country:

North America > United States > California > Orange County > Irvine (0.14)
Europe > United Kingdom > England > Leicestershire > Loughborough (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Leveraging Human Guidance for Deep Reinforcement Learning Tasks

Zhang, Ruohan, Torabi, Faraz, Guan, Lin, Ballard, Dana H., Stone, Peter

arXiv.org Artificial Intelligence, Sep-21-2019

Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment. Human knowledge of how to solve these tasks can be incorporated using imitation learning, where the agent learns to imitate human-demonstrated decisions. However, human guidance is not limited to demonstrations. Other types of guidance could be more suitable for certain tasks and require less human effort. This survey provides a high-level overview of five recent learning frameworks that primarily rely on human guidance other than conventional, step-by-step action demonstrations. We review the motivation, assumptions, and implementation of each framework. We then discuss possible future research directions.

agent, international conference, reinforcement, (17 more...)

1909.09906

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Macao (0.04)
Asia > China (0.04)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment > Games (0.68)
Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
