Reinforcement Learning
OpenAI's Dactyl improves Dexterity of Robotic Hands without Human Input
OpenAI has trained a human-like robot hand to manipulate physical objects with unprecedented dexterity. Their system, called Dactyl, is trained entirely in simulation and transfers its knowledge to reality, adapting to real-world physics. Dactyl learns from scratch using the same general-purpose reinforcement learning algorithm and code as OpenAI Five. The results show that it's possible to train agents in simulation and have them solve real-world tasks, without physically-accurate modeling of the world. Dactyl is a system for manipulating objects using a Shadow Dexterous Hand.
Artificial intelligence system designs drugs from scratch
An artificial-intelligence approach created at the University of North Carolina at Chapel Hill Eshelman School of Pharmacy can teach itself to design new drug molecules from scratch and has the potential to dramatically accelerate the design of new drug candidates. The system is called Reinforcement Learning for Structural Evolution, known as ReLeaSE, and is an algorithm and computer program that comprises two neural networks which can be thought of as a teacher and a student. The teacher knows the syntax and linguistic rules behind the vocabulary of chemical structures for about 1.7 million known biologically active molecules. By working with the teacher, the student learns over time and becomes better at proposing molecules that are likely to be useful as new medicines. The University has applied for a patent for the technology, and the team published a proof-of-concept study in the journal Science Advances last week.
Deep Reinforcement Learning for Distributed Dynamic Power Allocation in Wireless Networks
Nasir, Yasar Sinan, Guo, Dongning
This work demonstrates the potential of deep reinforcement learning techniques for transmit power control in emerging and future wireless networks. Various techniques have been proposed in the literature to find near-optimal power allocations, often by solving a challenging optimization problem. Most of these algorithms are not scalable to large networks in real-world scenarios because of their computational complexity and instantaneous cross-cell channel state information (CSI) requirement. In this paper, a model-free distributed dynamic power allocation scheme is developed based on deep reinforcement learning. Each transmitter collects CSI and quality of service (QoS) information from several neighbors and adapts its own transmit power accordingly. The objective is to maximize a weighted sum-rate utility function, which can be particularized to achieve maximum sum-rate or proportionally fair scheduling (with weights that are changing over time). Both random variations and delays in the CSI are inherently addressed using deep Q-learning. For a typical network architecture, the proposed algorithm is shown to achieve near-optimal power allocation in real time based on delayed CSI measurements available to the agents. This work indicates that deep reinforcement learning based radio resource management can be very fast and deliver highly competitive performance, especially in practical scenarios where the system model is inaccurate and CSI delay is non-negligible.
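To make the weighted sum-rate objective concrete, here is a minimal sketch. It is not taken from the paper: it assumes a Shannon-rate model, log2(1 + SINR), and the names `gains`, `powers`, `weights` and `noise` are hypothetical. Here `gains[i][j]` stands for the channel gain from transmitter j to receiver i.

```python
import math

def sinr(powers, gains, noise, i):
    """SINR at receiver i: desired signal over interference plus noise."""
    signal = gains[i][i] * powers[i]
    interference = sum(
        gains[i][j] * powers[j] for j in range(len(powers)) if j != i
    )
    return signal / (interference + noise)

def weighted_sum_rate(powers, gains, weights, noise=1.0):
    """Sum over links of weight * log2(1 + SINR), in bits/s/Hz (assumed rate model)."""
    return sum(
        w * math.log2(1.0 + sinr(powers, gains, noise, i))
        for i, w in enumerate(weights)
    )

# Two interfering links; all values are made up for illustration.
gains = [[1.0, 0.1], [0.2, 1.0]]
utility = weighted_sum_rate([1.0, 1.0], gains, weights=[1.0, 1.0])
```

Setting all weights to 1 gives the plain sum-rate. Weights that change over time, as the abstract mentions for proportionally fair scheduling, would be passed in as a different `weights` list at each step.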
Robbins-Monro conditions for persistent exploration learning strategies
We formulate simple assumptions, implying the Robbins-Monro conditions for the $Q$-learning algorithm with the local learning rate, depending on the number of visits of a particular state-action pair (local clock) and the number of iterations (global clock). It is assumed that the Markov decision process is communicating and the learning policy ensures persistent exploration. The restrictions are imposed on the functional dependence of the learning rate on the local and global clocks. The result partially confirms the conjecture of Bradtke (1994).
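As a rough illustration of the setting, the sketch below runs tabular Q-learning on a toy two-state communicating MDP with an epsilon-greedy policy, so every action keeps a positive probability. The learning rate shown, 1 / n^omega with 1/2 < omega <= 1 on the local clock n, is one standard choice for which the rates sum to infinity while their squares sum to a finite value. It is an assumption for illustration, not the class of rates studied in the paper. The MDP, `learning_rate` and `q_learning` are hypothetical.

```python
import random

def learning_rate(local_n, global_t, omega=0.75):
    """Illustrative rate 1 / n^omega on the local clock (assumed form).

    global_t is accepted so that schedules depending on the global clock
    can be plugged in; this particular example does not use it.
    """
    return 1.0 / (local_n ** omega)

def q_learning(steps=20000, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    state = 0
    for t in range(1, steps + 1):
        # Epsilon-greedy: each action has probability >= epsilon / n_actions.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        # Toy MDP: action 0 stays, action 1 switches state;
        # reward 1 only for staying in state 1.
        next_state = state if action == 0 else 1 - state
        reward = 1.0 if (state == 1 and action == 0) else 0.0
        visits[state][action] += 1  # local clock of this state-action pair
        alpha = learning_rate(visits[state][action], t)
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q, visits

q_values, visit_counts = q_learning()
```

The `visit_counts` table plays the role of the local clocks and the loop index `t` that of the global clock.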
Experience, Imitation and Reflection; Confucius' Conjecture and Machine Learning
Artificial intelligence has recently made great advances, driven by the emergence of new processing power and machine learning methods. Even so, the learning capability of artificial intelligence is still in its infancy compared to that of humans and many animals. Many current artificial intelligence applications can only operate in highly orchestrated, specific environments, with an extensive training set that exactly describes the conditions that will occur at execution time. With that in mind, and considering the several existing machine learning methods, the question arises: 'What are some of the best ways for a machine to learn?' Regarding human learning, Confucius' view is that it happens by experience, imitation and reflection. This paper explores and discusses these three ways of learning and their implementations in machines by looking at how they happen in minds. Keywords: Artificial Intelligence · Supervised Learning · Reinforcement Learning · Unsupervised Learning · Machine Imagination · Machine Learning · Cognitive Development. 1 Introduction: How minds work, or in other words how a human brain thinks, with the goal of implementing it in machines, is a long-standing question in artificial intelligence.
Learning Dexterous In-Hand Manipulation
We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object's appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM
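The randomization of physical properties described above can be sketched as drawing a fresh set of simulator parameters at the start of every training episode. The parameter names and ranges below are hypothetical placeholders, not the values used for the Shadow Dexterous Hand experiments.

```python
import random
from dataclasses import dataclass

@dataclass
class EpisodeParams:
    friction: float       # friction coefficient (hypothetical range)
    object_mass: float    # kilograms (hypothetical range)
    object_rgb: tuple     # randomized appearance, each channel in [0, 1)

def sample_episode_params(rng):
    """Draw one randomized set of simulator parameters for an episode."""
    return EpisodeParams(
        friction=rng.uniform(0.5, 1.5),
        object_mass=rng.uniform(0.03, 0.3),
        object_rgb=tuple(rng.random() for _ in range(3)),
    )

rng = random.Random(0)
# One draw per episode; a policy trained across many draws cannot
# overfit to a single setting of the simulator.
episodes = [sample_episode_params(rng) for _ in range(3)]
```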
Count-Based Exploration with the Successor Representation
Machado, Marlos C., Bellemare, Marc G., Bowling, Michael
The problem of exploration in reinforcement learning is well-understood in the tabular case and many sample-efficient algorithms are known. Nevertheless, it is often unclear how the algorithms in the tabular setting can be extended to tasks with large state-spaces where generalization is required. Recent promising developments generally depend on problem-specific density models or handcrafted features. In this paper we introduce a simple approach for exploration that allows us to develop theoretically justified algorithms in the tabular case and that also gives us intuitions for new algorithms applicable to settings where function approximation is required. Our approach and its underlying theory are based on the substochastic successor representation, a concept we develop here. While the traditional successor representation is a representation that defines state generalization by the similarity of successor states, the substochastic successor representation is also able to implicitly count the number of times each state (or feature) has been observed. This extension connects two until now disjoint areas of research. We show in traditional tabular domains (RiverSwim and SixArms) that our algorithm empirically performs as well as other sample-efficient algorithms. We then describe a deep reinforcement learning algorithm inspired by these ideas and show that it matches the performance of recent pseudo-count-based methods in hard exploration Atari 2600 games.
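The following sketch shows one way a successor representation can carry visit counts implicitly. It assumes the substochastic transition estimate divides each observed transition count by the state's visit count plus one, so that rarely visited rows lose more probability mass. That estimate, and the inverse row sum used as a bonus, are illustrative assumptions and may differ from the paper's exact definitions. All function names are hypothetical.

```python
def successor_representation(transition, gamma, iters=500):
    """Approximate sum_t gamma^t P^t by iterating Psi <- I + gamma * P @ Psi."""
    n = len(transition)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    psi = [row[:] for row in identity]
    for _ in range(iters):
        psi = [
            [
                identity[i][j]
                + gamma * sum(transition[i][k] * psi[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return psi

def substochastic_transitions(counts):
    """Empirical transitions with one extra visit added to each denominator."""
    return [[c / (sum(row) + 1.0) for c in row] for row in counts]

def exploration_bonus(counts, gamma):
    """Inverse row sum of the substochastic SR (illustrative bonus)."""
    psi = successor_representation(substochastic_transitions(counts), gamma)
    return [1.0 / sum(row) for row in psi]

# counts[s][s2] = number of observed transitions s -> s2 (made-up data):
# state 0 was left nine times, state 1 only once.
bonus = exploration_bonus([[0, 9], [1, 0]], gamma=0.9)
```

With a proper stochastic matrix every row of the representation sums to 1 / (1 - gamma). The substochastic rows sum to less, and the shortfall shrinks as a state's count grows, which is what lets the row norm act as a stand-in for a count.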
Active Object Perceiver: Recognition-guided Policy Learning for Object Searching on Mobile Robots
Ye, Xin, Lin, Zhe, Li, Haoxiang, Zheng, Shibin, Yang, Yezhou
Developing an autonomous mobile robot which can reliably search for, locate and reach an arbitrary object in an indoor environment is both fascinating and extremely challenging, which motivates multi-disciplinary research ideas across robotics, computational perception, and machine learning. In practice, a solution to this task will have a wide range of robotics applications, such as an assistant robot that searches for survivors in an unknown disaster environment on behalf of first responders, or an elderly care-giving robot that locates and/or retrieves objects of interest for its clients. Solving this challenge has the potential to kick off the next phase of a lifestyle revolution that aims to raise people's living standards and enrich their everyday lives. We fully acknowledge that studies approaching the problem have a long history. Tracing back to the 1970s and 1980s, when the concept coined as "active perception" was widely explored, this "robot with vision that finds object" task was one of the major motivating tasks to show that "vision is active" [1]. As stated in a recent survey article [2], two primary aspects of "active perception" are 1) from an intelligent control point of view, it is about intelligent control strategies applied to the perception process [3], and 2) from a computational perception point of view, it is about manipulating the perception constraints to improve the quality of
Visual Analogies between Atari Games for Studying Transfer Learning in RL
Sobol, Doron, Wolf, Lior, Taigman, Yaniv
In this work, we ask the following question: Can visual analogies, learned in an unsupervised way, be used in order to transfer knowledge between pairs of games and even play one game using an agent trained for another game? We attempt to answer this research question by creating visual analogies between a pair of games: a source game and a target game. For example, given a video frame in the target game, we map it to an analogous state in the source game and then attempt to play using a trained policy learned for the source game. We demonstrate convincing visual mapping between four pairs of games (eight mappings), which are used to evaluate three transfer learning approaches.
Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration
Li, Tingguang, Pan, Jin, Zhu, Delong, Meng, Max Q. -H.
To achieve scenario intelligence, humans must transfer knowledge to robots by developing goal-oriented algorithms, which are sometimes insensitive to dynamically changing environments. While deep reinforcement learning has achieved significant success recently, it is still extremely difficult to deploy directly on real robots. In this paper, we propose a hybrid structure named Option-Interruption in which human knowledge is embedded into a hierarchical reinforcement learning framework. Our architecture has two key components: options, represented by existing human-designed methods, which can significantly speed up the training process, and an interruption mechanism, based on learnable termination functions, which enables our system to respond quickly to the external environment. To implement this architecture, we derive a set of update rules based on policy gradient methods and present a complete training process. In the experiments, our method is evaluated on the Four-room navigation and exploration task, which shows the efficiency and flexibility of our framework.
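The interplay between options and an interruption mechanism can be sketched as follows. This assumes a sigmoid termination function over a linear score of state features, which is a common parameterization but not necessarily the one used in the paper. `thetas`, `pick_option` and the feature vectors are hypothetical, and the policy-gradient updates of the termination parameters are omitted.

```python
import math
import random

def termination_probability(theta, features):
    """Sigmoid of a linear score: chance that the running option is interrupted."""
    score = sum(t * f for t, f in zip(theta, features))
    return 1.0 / (1.0 + math.exp(-score))

def next_option(current, features, thetas, pick_option, rng):
    """Keep the current option unless its termination function fires."""
    if current is None:
        return pick_option(features)
    if rng.random() < termination_probability(thetas[current], features):
        # Interruption: control returns to the high-level policy.
        return pick_option(features)
    return current

rng = random.Random(0)
# Made-up termination parameters for two hand-designed options.
thetas = {"follow_wall": [-2.0, 0.5], "go_to_goal": [0.3, -1.0]}
option = next_option(None, [1.0, 0.0], thetas, lambda f: "follow_wall", rng)
```

In a full system `pick_option` would be the learned high-level policy and each option a human-designed controller.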