AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

Pan, Ling, Cai, Qingpeng, Meng, Qi, Chen, Wei, Huang, Longbo, Liu, Tie-Yan

arXiv.org Artificial Intelligence | Mar-14-2019

Value function estimation is an important task in reinforcement learning, i.e., prediction. The commonly used operator for prediction in Q-learning is the hard max operator, which always commits to the maximum action-value according to current estimation. Such a 'hard' updating scheme results in pure exploitation and may lead to misbehavior due to noise in stochastic environments. Thus, it is critical to balance exploration and exploitation in value function estimation. The Boltzmann softmax operator has a greater capability in exploring potential action-values. However, it does not satisfy the non-expansion property, and its direct use may fail to converge even in value iteration. In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator in value function estimation, which has good convergence properties in the setting of planning and learning. Moreover, we prove that dynamic Boltzmann softmax updates can eliminate the overestimation phenomenon introduced by the hard max operator. Experimental results on GridWorld show that the DBS operator enables convergence and a better trade-off between exploration and exploitation in value function estimation. Finally, we propose the DBS-DQN algorithm by generalizing the dynamic Boltzmann softmax update in deep Q-network, which outperforms DQN substantially in 40 out of 49 Atari games.
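
For intuition, the Boltzmann softmax operator named in the abstract can be read as a weighted average of action-values, with weights proportional to exp(beta * q). The sketch below is only an illustration and not the paper's implementation: the function names and the schedule beta_t = t^2 are assumptions made for this example, since the abstract does not state how beta changes over time.

```python
import math

def boltzmann_softmax(q_values, beta):
    """Boltzmann-weighted average of q_values with inverse temperature beta."""
    q_max = max(q_values)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total

def dynamic_beta(t):
    """Hypothetical schedule (an assumption): beta grows with iteration t."""
    return float(t ** 2)

q = [1.0, 2.0, 3.0]
for t in (0, 1, 3, 10):
    # beta = 0 gives the plain mean; large beta approaches the hard max.
    print(t, boltzmann_softmax(q, dynamic_beta(t)))
```

With beta = 0 the operator returns the mean of the action-values, and as beta grows it approaches the hard max, which is the sense in which an increasing schedule moves from exploration toward exploitation.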

artificial intelligence, machine learning, reinforcement learning, (10 more...)

arXiv.org Artificial Intelligence

1903.05926

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement learning for the birds

#artificialintelligence | Mar-13-2019, 07:01:59 GMT

I just read a fascinating article about an experiment in bird psychology. We've known for a long time that bird songs aren't innate; they're learned. If you listen carefully to your backyard birds in the spring, you can hear the young birds learning their songs; you'll probably hear a few that can't get it right, and that gradually get better as summer progresses.

artificial intelligence, machine learning, reinforcement learning, (5 more...)

#artificialintelligence

Country: North America > United States > New York (0.26)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Reinforcement Learning Tutorial Part 3: Basic Deep Q-Learning

#artificialintelligence | Mar-13-2019, 03:32:21 GMT

In part 1 we introduced Q-learning as a concept with a pen and paper example. In part 2 we implemented the example in code and demonstrated how to execute it in the cloud. In this third part, we will move our Q-learning approach from a Q-table to a deep neural net. With a Q-table, your memory requirement is an array of states x actions. For a state space of 5 and an action space of 2, the total memory consumption is 2 x 5 = 10.
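
The states x actions sizing can be made concrete with a small sketch. This is not the tutorial's code: the names, the learning rate alpha and the discount gamma are assumptions picked for illustration, and the update shown is the standard tabular Q-learning rule.

```python
N_STATES, N_ACTIONS = 5, 2

# One row per state, one column per action: 5 x 2 = 10 stored values.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def q_update(table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for a single transition."""
    best_next = max(table[next_state])
    td_target = reward + gamma * best_next
    table[state][action] += alpha * (td_target - table[state][action])

n_entries = sum(len(row) for row in q_table)

# Example transition (made up): state 0, action 1, reward 1.0, next state 1.
q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
print(n_entries, q_table[0])
```

The table grows as the product of the two spaces, which is why larger state spaces motivate replacing it with a function approximator such as a neural net.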

artificial intelligence, machine learning, reinforcement learning, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

VRKitchen: an Interactive 3D Virtual Environment for Task-oriented Learning

Gao, Xiaofeng, Gong, Ran, Shu, Tianmin, Xie, Xu, Wang, Shu, Zhu, Song-Chun

arXiv.org Artificial Intelligence | Mar-13-2019

One of the main challenges of advancing task-oriented learning such as visual task planning and reinforcement learning is the lack of realistic and standardized environments for training and testing AI agents. Previously, researchers often relied on ad-hoc lab environments. There have been recent advances in virtual systems built with 3D physics engines and photo-realistic rendering for indoor and outdoor environments, but the embodied agents in those systems can only conduct simple interactions with the world (e.g., walking around, moving objects, etc.). Most of the existing systems also do not allow human participation in their simulated environments. In this work, we design and implement a virtual reality (VR) system, VRKitchen, with integrated functions which i) enable embodied agents powered by modern AI methods (e.g., planning, reinforcement learning, etc.) to perform complex tasks involving a wide range of fine-grained object manipulations in a realistic environment, and ii) allow human teachers to perform demonstrations to train agents (i.e., learning from demonstration). We also provide standardized evaluation benchmarks and data collection tools to facilitate a broad use in research on task-oriented learning and beyond.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1903.05757

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Sweden > Skåne County > Malmö (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Republic of Türkiye (0.04)

Genre: Research Report (0.40)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.66)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Machine Learning in IoT Security: Current Solutions and Future Challenges

Hussain, Fatima, Hussain, Rasheed, Hassan, Syed Ali, Hossain, Ekram

arXiv.org Machine Learning | Mar-13-2019

The future Internet of Things (IoT) will have a deep economic, commercial and social impact on our lives. The participating nodes in IoT networks are usually resource-constrained, which makes them attractive targets for cyber attacks. In this regard, extensive efforts have been made to address the security and privacy issues in IoT networks, primarily through traditional cryptographic approaches. However, the unique characteristics of IoT nodes render the existing solutions insufficient to encompass the entire security spectrum of the IoT networks. This is, at least in part, because of the resource constraints, heterogeneity, massive real-time data generated by the IoT devices, and the extensively dynamic behavior of the networks. Therefore, Machine Learning (ML) and Deep Learning (DL) techniques, which are able to provide embedded intelligence in the IoT devices and networks, are leveraged to cope with different security problems. In this paper, we systematically review the security requirements, attack vectors, and the current security solutions for the IoT networks. We then shed light on the gaps in these security solutions that call for ML and DL approaches. We also discuss in detail the existing ML and DL solutions for addressing different security problems in IoT networks. Finally, based on the detailed investigation of the existing solutions in the literature, we discuss the future research directions for ML- and DL-based IoT security.

data mining, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1904.05735

Country: North America > Canada (0.92)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.47)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Internet of Things (1.00)
Information Technology > Data Science > Data Mining (1.00)
(5 more...)

Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning

Ota, Kei, Jha, Devesh K., Oiki, Tomoaki, Miura, Mamoru, Nammoto, Takashi, Nikovski, Daniel, Mariyama, Toshisada

arXiv.org Machine Learning | Mar-13-2019

In this paper, we propose a reinforcement learning-based algorithm for trajectory optimization for constrained dynamical systems. This problem is motivated by the fact that for most robotic systems, the dynamics may not always be known. Generating smooth, dynamically feasible trajectories could be difficult for such systems. Using sampling-based algorithms for motion planning may result in trajectories that are prone to undesirable control jumps. However, they can usually provide a good reference trajectory which a model-free reinforcement learning algorithm can then exploit by limiting the search domain and quickly finding a dynamically smooth trajectory. We use this idea to train a reinforcement learning agent to learn a dynamically smooth trajectory in a curriculum learning setting. Furthermore, for generalization, we parameterize the policies with goal locations, so that the agent can be trained for multiple goals simultaneously. We show results in both simulated environments and real experiments, for a 6-DoF manipulator arm operated in position-controlled mode, to validate the proposed idea. We compare the proposed ideas against a PID controller which is used to track a designed trajectory in configuration space. Our experiments show that our RL agent trained with a reference path outperformed a model-free PID controller of the type commonly used on many robotic platforms for trajectory tracking.

machine learning, reinforcement learning, trajectory, (19 more...)

arXiv.org Machine Learning

1903.05751

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

How to apply Reinforcement Learning to real life planning problems

#artificialintelligence | Mar-12-2019, 23:43:33 GMT

To discourage the paper being thrown in the bin, we give this outcome a large negative reward, say -1; because the teacher is pleased when the paper is placed in the bin, that outcome nets a large positive reward, 1. To avoid the outcome where the paper continually gets passed around the room, we set the reward for all other actions to a small negative value, say -0.04. If we set this to a positive or zero value, the model may let the paper go round and round, as it would be better to gain small positives than to risk getting close to the negative outcome. The value is also kept very small because the agent collects only a single terminal reward but may take many steps to end the episode, and we need to ensure that, if the paper is placed in the bin, the positive outcome is not cancelled out. Please note: the rewards are always relative to one another and I have chosen arbitrary figures, but these can be changed if the results are not as desired.
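
The reasoning about the step reward can be checked with a little arithmetic. The sketch below is an illustration rather than the article's code: it assumes an undiscounted return (the excerpt does not mention a discount factor), and the constant and function names are hypothetical.

```python
REWARD_THROWN_IN_BIN = -1.0   # outcome to discourage
REWARD_PLACED_IN_BIN = 1.0    # desired terminal outcome
STEP_REWARD = -0.04           # reward for every other action

def episode_return(n_steps, terminal_reward, step_reward=STEP_REWARD):
    """Undiscounted return: n_steps ordinary actions, then one terminal reward."""
    return n_steps * step_reward + terminal_reward

# With a small negative step reward, shorter episodes score higher,
# and even a 20-step episode still ends with a positive return.
short_ep = episode_return(3, REWARD_PLACED_IN_BIN)
long_ep = episode_return(20, REWARD_PLACED_IN_BIN)

# With a positive step reward, passing the paper around longer pays more.
short_pos = episode_return(3, REWARD_PLACED_IN_BIN, step_reward=0.04)
long_pos = episode_return(20, REWARD_PLACED_IN_BIN, step_reward=0.04)

print(short_ep, long_ep, short_pos, long_pos)
```

Under these assumptions a negative step reward makes the shorter route the better one without cancelling out the terminal reward, while a positive step reward would make the longer, looping episode the more attractive one.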

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.40)

hzwer/SARA_DDPG

#artificialintelligence | Mar-12-2019, 10:47:12 GMT

Excellent painters can use only a few strokes to create a fantastic painting, which is a symbol of human intelligence and art. Reversing the simulator to interpret images has also been a challenging computer vision task in recent years. In this paper, we propose a stroke-based rendering (SBR) method that combines the neural stroke renderer (NSR) and deep reinforcement learning (DRL), allowing the machine to learn to deconstruct images using strokes and create amazing visual effects. Our agent is an end-to-end program that converts natural images into paintings. The training process does not require human painting experience or stroke tracking data.

ddpg, machine learning, reinforcement learning, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

Learning Gaussian Policies from Corrective Human Feedback

Wout, Daan, Scholten, Jan, Celemin, Carlos, Kober, Jens

arXiv.org Machine Learning | Mar-12-2019

Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. In particular, learning from corrective advice has advantages over evaluative feedback, as it is a more intuitive and scalable format. The current state-of-the-art in this field, COACH, has proven to be an effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature space engineering for higher order systems. We introduce Gaussian Process Coach (GPC), where feature space engineering is avoided by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) inquire feedback samples of maximal utility and 2) adapt the learning rate to the teacher's learning phase. We demonstrate that the novel algorithm outperforms the current state-of-the-art in final performance, convergence rate and robustness to erroneous feedback in OpenAI Gym continuous control benchmarks, both for simulated and real human teachers.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1903.05216

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > India (0.04)

Genre: Research Report (0.64)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

On the Pitfalls of Measuring Emergent Communication

Lowe, Ryan, Foerster, Jakob, Boureau, Y-Lan, Pineau, Joelle, Dauphin, Yann

arXiv.org Artificial Intelligence | Mar-12-2019

How do we know if communication is emerging in a multi-agent system? The vast majority of recent papers on emergent communication show that adding a communication channel leads to an increase in reward or task success. This is a useful indicator, but provides only a coarse measure of the agent's learned communication abilities. As we move towards more complex environments, it becomes imperative to have a set of finer tools that allow qualitative and quantitative insights into the emergence of communication. This may be especially useful to allow humans to monitor agents' behaviour, whether for fault detection, assessing performance, or even building trust. In this paper, we examine a few intuitive existing metrics for measuring communication, and show that they can be misleading. Specifically, by training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, we find a scenario where agents appear to communicate (their messages provide information about their subsequent action), and yet the messages do not impact the environment or other agent in any way. We explain this phenomenon using ablation studies and by visualizing the representations of the learned policies. We also survey some commonly used metrics for measuring emergent communication, and provide recommendations as to when these metrics should be used.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1903.05168

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Africa > Ghana > Greater Accra > Accra (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
