AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

OpenAI's Dactyl improves Dexterity of Robotic Hands without Human Input

#artificialintelligence | Aug-2-2018, 13:09:52 GMT

OpenAI has trained a human-like robot hand to manipulate physical objects with unprecedented dexterity. Their system, called Dactyl, is trained entirely in simulation and transfers its knowledge to reality, adapting to real-world physics. Dactyl learns from scratch using the same general-purpose reinforcement learning algorithm and code as OpenAI Five. The results show that it's possible to train agents in simulation and have them solve real-world tasks, without physically-accurate modeling of the world. Dactyl is a system for manipulating objects using a Shadow Dexterous Hand.

Artificial intelligence system designs drugs from scratch

#artificialintelligence | Aug-1-2018, 20:01:11 GMT

An artificial-intelligence approach created at the University of North Carolina at Chapel Hill Eshelman School of Pharmacy can teach itself to design new drug molecules from scratch and has the potential to dramatically accelerate the design of new drug candidates. The system is called Reinforcement Learning for Structural Evolution, known as ReLeaSE, and is an algorithm and computer program that comprises two neural networks which can be thought of as a teacher and a student. The teacher knows the syntax and linguistic rules behind the vocabulary of chemical structures for about 1.7 million known biologically active molecules. By working with the teacher, the student learns over time and becomes better at proposing molecules that are likely to be useful as new medicines. The University has applied for a patent for the technology, and the team published a proof-of-concept study in the journal Science Advances last week.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Country: North America > United States > North Carolina (0.28)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education > Educational Setting > Higher Education (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Deep Reinforcement Learning for Distributed Dynamic Power Allocation in Wireless Networks

Nasir, Yasar Sinan, Guo, Dongning

arXiv.org Machine Learning | Aug-1-2018

This work demonstrates the potential of deep reinforcement learning techniques for transmit power control in emerging and future wireless networks. Various techniques have been proposed in the literature to find near-optimal power allocations, often by solving a challenging optimization problem. Most of these algorithms are not scalable to large networks in real-world scenarios because of their computational complexity and instantaneous cross-cell channel state information (CSI) requirement. In this paper, a model-free distributed dynamic power allocation scheme is developed based on deep reinforcement learning. Each transmitter collects CSI and quality of service (QoS) information from several neighbors and adapts its own transmit power accordingly. The objective is to maximize a weighted sum-rate utility function, which can be particularized to achieve maximum sum-rate or proportionally fair scheduling (with weights that are changing over time). Both random variations and delays in the CSI are inherently addressed using deep Q-learning. For a typical network architecture, the proposed algorithm is shown to achieve near-optimal power allocation in real time based on delayed CSI measurements available to the agents. This work indicates that deep reinforcement learning based radio resource management can be very fast and deliver highly competitive performance, especially in practical scenarios where the system model is inaccurate and CSI delay is non-negligible.
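The abstract does not spell out the utility function. As a minimal sketch, assuming the common textbook form in which each link's rate is the Shannon rate log2(1 + SINR), a weighted sum-rate could be computed as below. The function name, the gain-matrix layout, and the example numbers are illustrative assumptions, not taken from the paper.

```python
import math


def weighted_sum_rate(gains, powers, weights, noise_power):
    """Weighted sum of per-link Shannon rates in bits/s/Hz.

    Assumed layout (hypothetical): gains[i][j] is the channel gain from
    transmitter j to receiver i, so gains[i][i] is the direct link.
    """
    num_links = len(powers)
    total = 0.0
    for i in range(num_links):
        signal = gains[i][i] * powers[i]
        # Interference is the power received from every other transmitter.
        interference = sum(
            gains[i][j] * powers[j] for j in range(num_links) if j != i
        )
        sinr = signal / (interference + noise_power)
        total += weights[i] * math.log2(1.0 + sinr)
    return total


# Two links with no cross-link interference (made-up numbers).
example_utility = weighted_sum_rate(
    gains=[[1.0, 0.0], [0.0, 1.0]],
    powers=[1.0, 1.0],
    weights=[1.0, 1.0],
    noise_power=1.0,
)
```

Setting every weight to 1 gives the plain sum-rate; time-varying weights correspond to the proportionally fair case mentioned in the abstract.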

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1808.0049

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Telecommunications (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Robbins-Monro conditions for persistent exploration learning strategies

Rokhlin, Dmitry B.

arXiv.org Machine Learning | Aug-1-2018

We formulate simple assumptions implying the Robbins-Monro conditions for the $Q$-learning algorithm with a local learning rate that depends on the number of visits to a particular state-action pair (local clock) and the number of iterations (global clock). It is assumed that the Markov decision process is communicating and that the learning policy ensures persistent exploration. Restrictions are imposed on the functional dependence of the learning rate on the local and global clocks. The result partially confirms the conjecture of Bradtke (1994).
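For reference, the conditions named in the title can be written out in their standard textbook form. The notation below ($\alpha_t$, $\gamma$, $r_t$) is assumed for illustration and is not taken from the paper, which works with learning rates that depend on both clocks.

```latex
% Q-learning update with a state-action-dependent learning rate
\[
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t)
  + \alpha_t(s_t, a_t)\Bigl[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \Bigr]
\]

% Robbins-Monro conditions, required for every state-action pair (s, a)
\[
\sum_{t=0}^{\infty} \alpha_t(s, a) = \infty,
\qquad
\sum_{t=0}^{\infty} \alpha_t^2(s, a) < \infty
\]
```

A familiar example is a rate of $1/n$, where $n$ counts visits to the pair $(s, a)$: the harmonic series diverges while the sum of $1/n^2$ converges, so both conditions hold as long as every pair is visited infinitely often.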

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1808.00245

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(5 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.51)

Experience, Imitation and Reflection; Confucius' Conjecture and Machine Learning

Dooraki, Amir Ramezani

arXiv.org Artificial Intelligence | Aug-1-2018

Abstract: Artificial intelligence has recently made great advances, driven by the emergence of new processing power and machine learning methods. That said, the learning capability of artificial intelligence is still in its infancy compared to the learning capability of humans and many animals. Many current artificial intelligence applications can only operate in highly orchestrated, specific environments, with an extensive training set that exactly describes the conditions that will occur at execution time. With that in mind, and considering the several existing machine learning methods, the question arises: 'What are some of the best ways for a machine to learn?' Regarding human learning methods, Confucius' point of view is that they are experience, imitation and reflection. This paper explores and discusses these three ways of learning and their implementation in machines by looking at how they happen in minds.

Keywords: Artificial Intelligence · Supervised Learning · Reinforcement Learning · Unsupervised Learning · Machine Imagination · Machine Learning · Cognitive Development

1 Introduction. How minds work, or in other words how a human brain thinks, with the goal of implementing it in machines, is a long-standing question in artificial intelligence.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1808.00222

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)
(3 more...)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Dexterous In-Hand Manipulation

OpenAI

arXiv.org Artificial Intelligence | Aug-1-2018

We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object's appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM
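The randomization step described above can be illustrated with a minimal sketch: before each simulated episode, physical and visual parameters are drawn afresh from fixed ranges. The parameter names and ranges below are hypothetical placeholders, not the values used by OpenAI.

```python
import random

# Hypothetical parameter ranges, chosen only for illustration.
PARAM_RANGES = {
    "friction_coefficient": (0.5, 1.5),
    "object_mass_kg": (0.03, 0.3),
    "object_hue": (0.0, 1.0),
}


def sample_environment_params(rng):
    """Draw one independent uniform sample per randomized parameter."""
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in PARAM_RANGES.items()
    }


# A seeded generator keeps the sketch reproducible.
rng = random.Random(0)
episode_params = [sample_environment_params(rng) for _ in range(3)]
```

A policy trained across many such draws never sees one fixed simulator, which is the intuition behind transferring to a physical robot whose exact parameters are unknown.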

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1808.00177

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(19 more...)

Genre: Research Report > New Finding (0.68)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
(2 more...)

Count-Based Exploration with the Successor Representation

Machado, Marlos C., Bellemare, Marc G., Bowling, Michael

arXiv.org Artificial Intelligence | Jul-30-2018

The problem of exploration in reinforcement learning is well-understood in the tabular case and many sample-efficient algorithms are known. Nevertheless, it is often unclear how the algorithms in the tabular setting can be extended to tasks with large state-spaces where generalization is required. Recent promising developments generally depend on problem-specific density models or handcrafted features. In this paper we introduce a simple approach for exploration that allows us to develop theoretically justified algorithms in the tabular case but that also gives us intuitions for new algorithms applicable to settings where function approximation is required. Our approach and its underlying theory are based on the substochastic successor representation, a concept we develop here. While the traditional successor representation is a representation that defines state generalization by the similarity of successor states, the substochastic successor representation is also able to implicitly count the number of times each state (or feature) has been observed. This extension connects two until now disjoint areas of research. We show in traditional tabular domains (RiverSwim and SixArms) that our algorithm empirically performs as well as other sample-efficient algorithms. We then describe a deep reinforcement learning algorithm inspired by these ideas and show that it matches the performance of recent pseudo-count-based methods in hard exploration Atari 2600 games.
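For background, the traditional successor representation mentioned above can be learned in the tabular case with a temporal-difference update. The sketch below shows that standard rule, not the substochastic variant the paper develops, and it assumes a hypothetical deterministic ring of states purely for illustration.

```python
def learn_successor_representation(num_states, gamma=0.9, step_size=0.5,
                                   num_steps=4000):
    """Tabular TD learning of the successor representation (SR).

    psi[s][j] estimates the expected discounted number of visits to
    state j when starting from state s.
    """
    psi = [[0.0] * num_states for _ in range(num_states)]
    state = 0
    for _ in range(num_steps):
        # Hypothetical dynamics: always step to the next state on a ring.
        next_state = (state + 1) % num_states
        for j in range(num_states):
            indicator = 1.0 if j == state else 0.0
            # TD target: visit now, plus discounted SR of the successor.
            target = indicator + gamma * psi[next_state][j]
            psi[state][j] += step_size * (target - psi[state][j])
        state = next_state
    return psi


psi = learn_successor_representation(num_states=4)
```

Each row of the learned matrix sums to approximately 1 / (1 - gamma), the total discounted occupancy, and states reached sooner receive larger entries than states reached later.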

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1807.11622

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Active Object Perceiver: Recognition-guided Policy Learning for Object Searching on Mobile Robots

Ye, Xin, Lin, Zhe, Li, Haoxiang, Zheng, Shibin, Yang, Yezhou

arXiv.org Artificial Intelligence | Jul-30-2018

Developing an autonomous mobile robot that can reliably search for, locate and reach an arbitrary object in an indoor environment is both fascinating and extremely challenging, which motivates multi-disciplinary research ideas across robotics, computational perception, and machine learning. In practice, a solution to this task will have a wide range of robotics applications, such as an assistant robot that searches for survivors in an unknown disaster environment for first responders, or an elderly care-giving robot that locates and/or retrieves objects of interest for its clients. Solving this challenge has the potential to kick off the next phase of our human lifestyle revolution, which aims to raise people's living standards and enrich people's everyday life. We fully acknowledge that studies approaching the problem have a long history. Tracing back to the 1970s and 1980s, when the concept coined "active perception" was widely explored, this "robot with vision that finds object" task was one of the major motivating tasks to show that "vision is active" [1]. As stated in a recent survey article [2], two primary aspects of "active perception" are: 1) from an intelligent control point of view, it is about intelligent control strategies applied to the perception process [3], and 2) from a computational perception point of view, it is about manipulating the perception constraints to improve the quality of

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1807.11174

Country:

North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Visual Analogies between Atari Games for Studying Transfer Learning in RL

Sobol, Doron, Wolf, Lior, Taigman, Yaniv

arXiv.org Machine Learning | Jul-29-2018

In this work, we ask the following question: Can visual analogies, learned in an unsupervised way, be used in order to transfer knowledge between pairs of games and even play one game using an agent trained for another game? We attempt to answer this research question by creating visual analogies between a pair of games: a source game and a target game. For example, given a video frame in the target game, we map it to an analogous state in the source game and then attempt to play using a trained policy learned for the source game. We demonstrate convincing visual mapping between four pairs of games (eight mappings), which are used to evaluate three transfer learning approaches.

machine learning, reinforcement learning, source game, (15 more...)

arXiv.org Machine Learning

1807.11074

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.64)

Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration

Li, Tingguang, Pan, Jin, Zhu, Delong, Meng, Max Q.-H.

arXiv.org Artificial Intelligence | Jul-29-2018

To achieve scenario intelligence, humans must transfer knowledge to robots by developing goal-oriented algorithms, which are sometimes insensitive to dynamically changing environments. While deep reinforcement learning has achieved significant success recently, it is still extremely difficult to deploy directly on real robots. In this paper, we propose a hybrid structure named Option-Interruption, in which human knowledge is embedded into a hierarchical reinforcement learning framework. Our architecture has two key components: options, represented by existing human-designed methods, which can significantly speed up the training process; and an interruption mechanism, based on learnable termination functions, which enables our system to respond quickly to the external environment. To implement this architecture, we derive a set of update rules based on policy gradient methods and present a complete training process. In the experiments, our method is evaluated on a four-room navigation and exploration task, which shows the efficiency and flexibility of our framework.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1807.1115

Country: Asia > China > Hong Kong (0.05)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
