AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Towards Physically Safe Reinforcement Learning under Supervision

Zhang, Yinan, Balkcom, Devin, Li, Haoxiang

arXiv.org Machine Learning | Jan-19-2019

This paper addresses the question of how a previously available control policy $\pi_s$ can be used as a supervisor to more quickly and safely train a new learned control policy $\pi_L$ for a robot. A weighted average of the supervisor and learned policies is used during trials, with a heavier weight initially on the supervisor, in order to allow safe and useful physical trials while the learned policy is still ineffective. During the process, the weight is adjusted to favor the learned policy. As weights are adjusted, the learned network must compensate so as to give safe and reasonable outputs under the different weights. A pioneer network is introduced that pre-learns a policy that performs similarly to the current learned policy under the planned next step for new weights; this pioneer network then replaces the currently learned network in the next set of trials. Experiments in OpenAI Gym demonstrate the effectiveness of the proposed method.
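The blending step described in this abstract can be pictured with a minimal sketch. It assumes scalar states and actions and a linear schedule that moves weight from the supervisor to the learner; the names `supervisor_weight`, `blend_action`, and the two toy policies are hypothetical and do not come from the paper. The pioneer network is not modeled here.

```python
from typing import Callable

# Assumption: a policy maps a scalar state to a scalar action.
Policy = Callable[[float], float]


def supervisor_weight(trial: int, total_trials: int) -> float:
    """Hypothetical linear schedule: 1.0 on the first trial, 0.0 on the last."""
    if total_trials <= 1:
        return 0.0
    return 1.0 - trial / (total_trials - 1)


def blend_action(state: float, pi_s: Policy, pi_l: Policy, w: float) -> float:
    """Weighted average of the supervisor action and the learned action."""
    return w * pi_s(state) + (1.0 - w) * pi_l(state)


# Toy policies, for illustration only.
def pi_s(state: float) -> float:
    return -0.5 * state  # stand-in for an existing, safe controller


def pi_l(state: float) -> float:
    return 0.0  # stand-in for a learner that is not yet useful


if __name__ == "__main__":
    for trial in range(5):
        w = supervisor_weight(trial, 5)
        print(trial, w, blend_action(2.0, pi_s, pi_l, w))
```

Early trials are dominated by `pi_s`, so the executed action stays close to the supervisor's; by the final trial the action comes from `pi_l` alone.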

learning, pioneer network, supervisor, (15 more...)

arXiv.org Machine Learning

1901.06576

Country:

North America > United States > New Hampshire > Grafton County > Hanover (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Reinforcement Learning: The Power of Big Data in Call Centers

#artificialintelligence | Jan-18-2019, 20:02:47 GMT

If you're deeply involved in the study of artificial intelligence or automated predictive modeling, you may have come across the term "reinforcement learning," or mapping situations to actions to maximize some type of numerical reward signal. For humans, this process occurs naturally as we grow and experiment with our surroundings and see how our actions influence our rewards. Reinforcement learning deviates greatly from the normal means by which artificial intelligences are typically programmed. As noted in the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, "the most important feature distinguishing reinforcement learning from other types of learning is that it uses training information that evaluates the action taken rather than instructs by giving correct actions." In short, reinforcement learning "teaches" machines how to learn from past experience and exploit that information to maximize a reward.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ten Machine Learning Algorithms You Should Know to Become a Data Scientist - ParallelDots

#artificialintelligence | Jan-18-2019, 13:07:26 GMT

Let's say I am given an Excel sheet with data about various fruits and I have to tell which ones look like apples. I will ask the question "Which fruits are red and round?" and split the fruits into those that answer yes and those that answer no. Now, not all red and round fruits are apples, and not all apples are red and round. So I will ask "Which fruits have red or yellow color hints on them?" of the red and round fruits, and "Which fruits are green and round?" of the fruits that are not red and round. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition.
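The cascade of questions in this excerpt can be written out as a small hand-built tree. This is only a sketch: the `Fruit` fields and the function name are hypothetical, and the decision at each leaf (which answers count as "apple") is an assumption, since the excerpt does not spell it out.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    # Hypothetical columns of the spreadsheet described in the excerpt.
    color: str
    shape: str
    has_red_or_yellow_hints: bool


def looks_like_apple(fruit: Fruit) -> bool:
    """Hand-written cascade of questions; leaf labels are assumptions."""
    # Root question: is the fruit red and round?
    if fruit.color == "red" and fruit.shape == "round":
        # Follow-up asked only of the red-and-round branch.
        return fruit.has_red_or_yellow_hints
    # Follow-up asked only of the other branch.
    return fruit.color == "green" and fruit.shape == "round"


if __name__ == "__main__":
    print(looks_like_apple(Fruit("red", "round", True)))
    print(looks_like_apple(Fruit("yellow", "long", True)))
```

A learned decision tree has the same shape, but the questions and their order are chosen from the data rather than by intuition.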

machine learning, natural language, reinforcement learning, (17 more...)

#artificialintelligence

Country: North America > United States (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems

Liu, Boyi, Wang, Lujia, Liu, Ming, Xu, Chengzhong

arXiv.org Artificial Intelligence | Jan-18-2019

This paper was motivated by the problem of how to make robots fuse and transfer their experience so that they can effectively use prior knowledge and quickly adapt to new environments. To address the problem, we present a learning architecture for navigation in cloud robotic systems: Lifelong Federated Reinforcement Learning (LFRLA). In this work, we propose a knowledge fusion algorithm for upgrading a shared model deployed on the cloud. Then, effective transfer learning methods in LFRLA are introduced. LFRLA is consistent with human cognitive science and fits well in cloud robotic systems. Experiments show that LFRLA greatly improves the efficiency of reinforcement learning for robot navigation. The cloud robotic system deployment also shows that LFRLA is capable of fusing prior knowledge. In addition, we release a cloud robotic navigation-learning website based on LFRLA.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

arXiv.org Artificial Intelligence

1901.06455

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology (0.68)
Leisure & Entertainment > Games (0.32)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

On-Policy Trust Region Policy Optimisation with Replay Buffers

Kangin, Dmitry, Pugeault, Nicolas

arXiv.org Machine Learning | Jan-18-2019

Building upon the recent success of deep reinforcement learning methods, we investigate the possibility of improving on-policy reinforcement learning by reusing the data from several consecutive policies. On-policy methods bring many benefits, such as the ability to evaluate each resulting policy. However, they usually discard all the information about the policies which existed before. In this work, we propose an adaptation of the replay buffer concept, borrowed from the off-policy learning setting, to create a method that combines the advantages of on- and off-policy learning. To achieve this, the proposed algorithm generalises the $Q$-, value and advantage functions for data from multiple policies. The method uses trust region optimisation, while avoiding some of the common problems of algorithms such as TRPO or ACKTR: it uses hyperparameters to replace the trust region selection heuristics, as well as a trainable covariance matrix instead of a fixed one. In many cases, the method improves the results not only compared to state-of-the-art trust region on-policy learning algorithms such as PPO, ACKTR and TRPO, but also with respect to their off-policy counterpart DDPG.
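The idea of keeping data from several consecutive policies can be sketched as a buffer organised per policy rather than per transition. This is a minimal illustration, not the paper's algorithm: the class name `PolicyReplayBuffer`, the scalar `(state, action, reward)` transitions, and the fixed window of retained policies are all assumptions, and the trust region update itself is omitted.

```python
from collections import deque
from typing import Deque, List, Tuple

# Assumption: scalar (state, action, reward) tuples, for brevity.
Transition = Tuple[float, float, float]


class PolicyReplayBuffer:
    """Keeps the transitions collected by the most recent `max_policies` policies."""

    def __init__(self, max_policies: int) -> None:
        self._batches: Deque[List[Transition]] = deque(maxlen=max_policies)

    def add_policy_data(self, transitions: List[Transition]) -> None:
        # Appending to a full deque drops the oldest policy's data.
        self._batches.append(list(transitions))

    def all_transitions(self) -> List[Transition]:
        # Flatten oldest to newest so an update can use every retained policy.
        return [t for batch in self._batches for t in batch]


if __name__ == "__main__":
    buf = PolicyReplayBuffer(max_policies=2)
    buf.add_policy_data([(0.0, 0.0, 1.0)])
    buf.add_policy_data([(1.0, 0.0, 2.0)])
    buf.add_policy_data([(2.0, 0.0, 3.0)])
    print(buf.all_transitions())
```

A purely on-policy learner corresponds to `max_policies=1`: only the current policy's rollouts are available for each update.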

covariance matrix, reinforcement, replay buffer, (13 more...)

arXiv.org Machine Learning

1901.06212

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Devon > Exeter (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A robot dog has learned to run faster with machine learning

#artificialintelligence | Jan-17-2019, 01:32:27 GMT

Reinforcement learning has helped a four-legged bot move a bit like a real animal, without having to be taught how to make each step. The news: Roboticists want their creations to mimic animals because animals invariably move in the most energy-efficient way. But the eerily lifelike movement of robots like Boston Dynamics' SpotMini is usually coded by hand. Now researchers have combined simulation with a technique called reinforcement learning to teach a dog-like robot called "ANYmal" to run faster and recover from falls. Crucially, it did so without any manual intervention.

machine learning, reinforcement learning, robot dog, (2 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Multi-agent Reinforcement Learning Embedded Game for the Optimization of Building Energy Control and Power System Planning

Hao, Jun

arXiv.org Machine Learning | Jan-17-2019

Most current game-theoretic demand-side management methods focus primarily on the scheduling of home appliances, and the related numerical experiments are analyzed under various scenarios to achieve the corresponding Nash equilibrium (NE) and optimal results. However, not much work has been conducted for academic or commercial buildings. The methods for optimizing academic buildings are distinct from the optimal methods for home appliances. In this study, we propose a novel methodology to control the operation of heating, ventilation, and air conditioning (HVAC) systems. With the development of artificial intelligence and computer technologies, reinforcement learning (RL) can be implemented in multiple realistic scenarios and help people solve thousands of real-world problems. RL, which is considered the art of future AI, builds a bridge between agents and environments through a Markov Decision Chain or a neural network, and has seldom been used in power systems. The art of RL is that once the simulator for a specific environment is built, the algorithm can keep learning from the environment. Therefore, RL is capable of dealing with constantly changing simulator inputs such as power demand, the condition of the power system, and outdoor temperature. Compared with existing distribution power system planning mechanisms and the related game-theoretic methodologies, our proposed algorithm can plan and optimize hourly energy usage, and can work with an even shorter time window if needed.

artificial system, renewable energy, upstream oil & gas, (23 more...)

arXiv.org Machine Learning

1901.07333

Country:

North America > United States (1.00)
Europe > United Kingdom (0.14)
Asia (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.67)

Industry:

Machinery > Industrial Machinery (1.00)
Construction & Engineering > HVAC (1.00)
Banking & Finance > Trading (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Representation Learning on Graphs: A Reinforcement Learning Application

Madjiheurem, Sephora, Toni, Laura

arXiv.org Machine Learning | Jan-17-2019

In this work, we study value function approximation in reinforcement learning (RL) problems with high-dimensional state or action spaces via a generalized version of representation policy iteration (RPI). We consider the limitations of proto-value functions (PVFs) at accurately approximating the value function in low dimensions and we highlight the importance of feature learning for an improved low-dimensional value function approximation. Then, we adopt different representation learning algorithms on graphs to learn the basis functions that best represent the value function. We empirically show that node2vec, an algorithm for scalable feature learning in networks, and the Variational Graph Auto-Encoder consistently outperform the commonly used smooth proto-value functions in low-dimensional feature space.

basis function, graph, value function, (15 more...)

arXiv.org Machine Learning

1901.05351

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Recurrent Control Nets for Deep Reinforcement Learning

Liu, Vincent, Adeniji, Ademi, Lee, Nathaniel, Zhao, Jason, Srouji, Mario

arXiv.org Machine Learning | Jan-17-2019

Central Pattern Generators (CPGs) are biological neural circuits capable of producing coordinated rhythmic outputs in the absence of rhythmic input. As a result, they are responsible for most rhythmic motion in living organisms. This rhythmic control is broadly applicable to fields such as locomotive robotics and medical devices. In this paper, we explore the possibility of creating a self-sustaining CPG network for reinforcement learning that learns rhythmic motion more efficiently and across more general environments than the current multilayer perceptron (MLP) baseline models. Recent work introduces the Structured Control Net (SCN), which maintains linear and nonlinear modules for local and global control, respectively. Here, we show that time-sequence architectures such as Recurrent Neural Networks (RNNs) model CPGs effectively. Combining previous work with RNNs and SCNs, we introduce the Recurrent Control Net (RCN), which adds a linear component to the [...]. RCNs match and exceed the performance of baseline MLPs and SCNs across all environment tasks. Our findings confirm existing intuitions for RNNs on reinforcement learning tasks, and demonstrate the promise of SCN-like structures in reinforcement learning.

architecture, recurrent control, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1901.01994

Country: North America > United States > California > Santa Clara County > Palo Alto (0.05)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Theory of Minds: Understanding Behavior in Groups Through Inverse Planning

Shum, Michael, Kleiman-Weiner, Max, Littman, Michael L., Tenenbaum, Joshua B.

arXiv.org Artificial Intelligence | Jan-17-2019

Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and flexibly, often making rapid generalizations about the latent relationships that underlie behavior from just sparse and noisy observations. Rapid and accurate inferences are important for determining who to cooperate with, who to compete with, and how to cooperate in order to compete. Towards the goal of building machine-learning algorithms with human-like social intelligence, we develop a generative model of multi-agent action understanding based on a novel representation for these latent relationships called Composable Team Hierarchies (CTH). This representation is grounded in the formalism of stochastic games and multi-agent reinforcement learning. We use CTH as a target for Bayesian inference, yielding a new algorithm for understanding behavior in groups that can both infer hidden relationships and predict future actions for multiple agents interacting together. Our algorithm rapidly recovers an underlying causal model of how agents relate in spatial stochastic games from just a few observations. The patterns of inference made by this algorithm closely correspond with human judgments, and the algorithm makes the same rapid generalizations that people do.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1901.06085

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
