AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Improving Reinforcement Learning Based Image Captioning with Natural Language Prior

Guo, Tszhang, Chang, Shiyu, Yu, Mo, Bai, Kun

arXiv.org Machine LearningSep-13-2018

Recently, Reinforcement Learning (RL) approaches have demonstrated advanced performance in image captioning by directly optimizing the metric used for testing. However, this shaped reward introduces learning biases, which reduces the readability of generated text. In addition, the large sample space makes training unstable and slow. To alleviate these issues, we propose a simple coherent solution that constrains the action space using an n-gram language prior. Quantitative and qualitative evaluations on benchmarks show that RL with the simple add-on module performs favorably against its counterpart in terms of both readability and speed of convergence. Human evaluation results show that our model is more human readable and graceful. The implementation will become publicly available upon the acceptance of the paper.

caption, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1809.06227

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unity tweaks AI training tools, makes bid for academic respect

#artificialintelligenceSep-12-2018, 01:21:41 GMT

Unity Technologies on Monday released version 0.5 of its ML-Agents toolkit to make its Unity 3D game development platform better suited for developing and training autonomous agent code via machine learning. Initially rolled out a year ago in beta, version 0.5 comes with a few improvements. There's a wrapper for Gym (a toolkit for developing and testing reinforcement learning algorithms), support for letting agents make multiple action selections at once and for preventing agents from taking certain actions, and a refurbished set of environments called Marathon Environments. In these virtual spaces, AI researchers can teach software agents to perform certain tasks by rewarding them for correct actions. This sort of reinforcement learning can be limited to digital environments like video games or mapped to software-driven machines in the real world. Through its latest code update, Unity is making the case for Unity 3D as a key tool for AI research, a goal that company code boffins describe in a preprint paper titled, "Unity: A General Platform for Intelligent Agents."

artificial intelligence, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Country: Europe > Sweden > Skåne County > Malmö (0.06)

Genre: Research Report (0.52)

Industry:

Leisure & Entertainment > Games > Computer Games (0.58)
Information Technology (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Add feedback

Combined Reinforcement Learning via Abstract Representations

François-Lavet, Vincent, Bengio, Yoshua, Precup, Doina, Pineau, Joelle

arXiv.org Machine LearningSep-12-2018

In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages. In this paper we propose a new way of explicitly bridging both approaches via a shared low-dimensional learned encoding of the environment, meant to capture summarizing abstractions. We show that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space. In addition, this approach recovers a sufficient low-dimensional representation of the environment, which opens up new strategies for interpretable AI, exploration and transfer learning.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1809.04506

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Sim-to-Real Transfer Learning using Robustified Controllers in Robotic Tasks involving Complex Dynamics

van Baar, Jeroen, Sullivan, Alan, Cordorel, Radu, Jha, Devesh, Romeres, Diego, Nikovski, Daniel

arXiv.org Machine LearningSep-12-2018

Learning robot tasks or controllers using deep reinforcement learning has been proven effective in simulations. Learning in simulation has several advantages. For example, one can fully control the simulated environment, including halting motions while performing computations. Another advantage when robots are involved, is that the amount of time a robot is occupied learning a task---rather than being productive---can be reduced by transferring the learned task to the real robot. Transfer learning requires some amount of fine-tuning on the real robot. For tasks which involve complex (non-linear) dynamics, the fine-tuning itself may take a substantial amount of time. In order to reduce the amount of fine-tuning we propose to learn robustified controllers in simulation. Robustified controllers are learned by exploiting the ability to change simulation parameters (both appearance and dynamics) for successive training episodes. An additional benefit for this approach is that it alleviates the precise determination of physics parameters for the simulator, which is a non-trivial task. We demonstrate our proposed approach on a real setup in which a robot aims to solve a maze puzzle, which involves complex dynamics due to static friction and potentially large accelerations. We show that the amount of fine-tuning in transfer learning for a robustified controller is substantially reduced compared to a non-robustified controller.

machine learning, reinforcement learning, simulation, (15 more...)

arXiv.org Machine Learning

1809.0472

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.83)

Add feedback

Reinforcement Learning in Topology-based Representation for Human Body Movement with Whole Arm Manipulation

Yuan, Weihao, Hang, Kaiyu, Song, Haoran, Kragic, Danica, Wang, Michael Y., Stork, Johannes A.

arXiv.org Artificial IntelligenceSep-12-2018

Moving a human body or a large and bulky object can require the strength of whole arm manipulation (WAM). This type of manipulation places the load on the robot's arms and relies on global properties of the interaction to succeed---rather than local contacts such as grasping or non-prehensile pushing. In this paper, we learn to generate motions that enable WAM for holding and transporting of humans in certain rescue or patient care scenarios. We model the task as a reinforcement learning problem in order to provide a behavior that can directly respond to external perturbation and human motion. For this, we represent global properties of the robot-human interaction with topology-based coordinates that are computed from arm and torso positions. These coordinates also allow transferring the learned policy to other body shapes and sizes. For training and evaluation, we simulate a dynamic sea rescue scenario and show in quantitative experiments that the policy can solve unseen scenarios with differently-shaped humans, floating humans, or with perception noise. Our qualitative experiments show the subsequent transporting after holding is achieved and we demonstrate that the policy can be directly transferred to a real world setting.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1809.04322

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Multi-task Deep Reinforcement Learning with PopArt

Hessel, Matteo, Soyer, Hubert, Espeholt, Lasse, Czarnecki, Wojciech, Schmitt, Simon, van Hasselt, Hado

arXiv.org Machine LearningSep-12-2018

The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at the time, each new task requiring to train a brand new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state of the art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state of the art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1809.04474

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Re-purposing Compact Neuronal Circuit Policies to Govern Reinforcement Learning Tasks

Hasani, Ramin M., Lechner, Mathias, Amini, Alexander, Rus, Daniela, Grosu, Radu

arXiv.org Machine LearningSep-11-2018

We propose an effective method for creating interpretable control agents, by re-purposing the function of a biological neural circuit model, to govern simulated and real world reinforcement learning (RL) test-beds. Inspired by the structure of the nervous system of the soil-worm, C. elegans, we introduce Neuronal Circuit Policies (NCPs) as a novel recurrent neural network instance with liquid time-constants, universal approximation capabilities and interpretable dynamics. We theoretically show that they can approximate any finite simulation time of a given continuous n-dimensional dynamical system, with n output units and some hidden units. We model instances of the policies and learn their synaptic and neuronal parameters to control standard RL tasks and demonstrate its application for autonomous parking of a real rover robot on a predefined trajectory. For reconfiguration of the purpose of the neural circuit, we adopt a search-based RL algorithm. We show that our neuronal circuit policies perform as good as deep neural network policies with the advantage of realizing interpretable dynamics at the cell-level. We theoretically find bounds for the time-varying dynamics of the circuits, and introduce a novel way to reason about networks' dynamics.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1809.04423

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Flatland: a Lightweight First-Person 2-D Environment for Reinforcement Learning

Caselles-Dupré, Hugo, Annabi, Louis, Hagen, Oksana, Garcia-Ortiz, Michael, Filliat, David

arXiv.org Machine LearningSep-10-2018

Flatland is a simple, lightweight environment for fast prototyping and testing of reinforcement learning agents. It is of lower complexity compared to similar 3D platforms (e.g. DeepMind Lab or VizDoom), but emulates physical properties of the real world, such as continuity, multi-modal partially-observable states with first-person view and coherent physics. We propose to use it as an intermediary benchmark for problems related to Lifelong Learning. Flatland is highly customizable and offers a wide range of task difficulty to extensively evaluate the properties of artificial agents. We experiment with three reinforcement learning baseline agents and show that they can rapidly solve a navigation task in Flatland. A video of an agent acting in Flatland is available here: https://youtu.be/I5y6Y2ZypdA.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1809.0051

Country: Europe > Sweden > Skåne County > Malmö (0.05)

Genre: Research Report (0.83)

Industry: Education > Educational Setting > Continuing Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning

Wang, Weixun, Jin, Junqi, Hao, Jianye, Chen, Chunjie, Yu, Chuan, Zhang, Weinan, Wang, Jun, Wang, Yixi, Li, Han, Xu, Jian, Gai, Kun

arXiv.org Machine LearningSep-10-2018

For online advertising in e-commerce, the traditional problem is to assign the right ad to the right user on fixed ad slots. In this paper, we investigate the problem of advertising with adaptive exposure, in which the number of ad slots and their locations can dynamically change over time based on their relative scores with recommendation products. In order to maintain user retention and long-term revenue, there are two types of constraints that need to be met in exposure: query-level and day-level constraints. We model this problem as constrained markov decision process with per-state constraint (psCMDP) and propose a constrained two-level reinforcement learning to decouple the original advertising exposure optimization problem into two relatively independent sub-optimization problems. We also propose a constrained hindsight experience replay mechanism to accelerate the policy training process. Experimental results show that our method can improve the advertising revenue while satisfying different levels of constraints under the real-world datasets. Besides, the proposal of constrained hindsight experience replay mechanism can significantly improve the training speed and the stability of policy performance.

constraint, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1809.03149

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Marketing (1.00)
Information Technology > Services (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States

Wolf, Peter, Kurzer, Karl, Wingert, Tobias, Kuhnt, Florian, Zöllner, J. Marius

arXiv.org Machine LearningSep-10-2018

Personal use of this material is permitted. Abstract-- Making the right decision in traffic is a challenging task that is highly dependent on individual preferences as well as the surrounding environment. Therefore it is hard to model solely based on expert knowledge. In this work we use Deep Reinforcement Learning to learn maneuver decisions based on a compact semantic state representation. This ensures a consistent model of the environment across scenarios as well as a behavior adaptation function, enabling online changes of desired behaviors without retraining. The input for the neural network is a simulated object list similar to that of Radar or Lidar sensors, superimposed by a relational semantic scene description. The state as well as the reward are extended by a behavior adaptation function and a parameterization respectively. With little expert knowledge and a set of mid-level actions, it can be seen that the agent is capable to adhere to traffic rules and learns to drive safely in a variety of situations. While sensors are improving at a staggering pace and actuators as well as control theory are well up to par to the challenging task of autonomous driving, it is yet to be seen how a robot can devise decisions that navigate it safely in a heterogeneous environment that is partially made up by humans who not always take rational decisions or have known cost functions.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1809.03214

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback