Reinforcement Learning
Learning Improvement Heuristics for Solving the Travelling Salesman Problem
Wu, Yaoxin, Song, Wen, Cao, Zhiguang, Zhang, Jie, Lim, Andrew
Recent studies in using deep learning to solve the Travelling Salesman Problem (TSP) focus on construction heuristics, the solutions of which may still be far from optimality. To improve solution quality, additional procedures such as sampling or beam search are required. However, they are still based on the same construction policy, which is less effective in refining a solution. In this paper, we propose to directly learn the improvement heuristics for solving TSP based on deep reinforcement learning. We first present a reinforcement learning formulation for the improvement heuristic, where the policy guides selection of the next solution. Then, we propose a deep architecture as the policy network based on self-attention. Extensive experiments show that improvement policies learned by our approach yield better results than state-of-the-art methods, even from random initial solutions. Moreover, the learned policies are more effective than the traditional handcrafted ones, and robust to different initial solutions of either high or poor quality.
1 Introduction
The Travelling Salesman Problem (TSP) is a typical combinatorial optimization problem that has extensive applications in the real world. The problem statement is straightforward: given a set of locations, find a shortest tour for the salesman that visits each location exactly once and returns to the original one. Although it has been widely studied for decades, achieving satisfactory performance is still challenging due to its NP-hardness.
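To make the idea of an improvement heuristic concrete, here is a minimal sketch of a traditional handcrafted one: 2-opt with a best-improvement rule. It is not the learned policy from the paper. The function names (`tour_length`, `two_opt_step`, `improve`) and the choice of 2-opt are illustrative assumptions. In a learned approach, the policy would presumably replace the exhaustive search inside `two_opt_step` when choosing the next solution.

```python
import math


def tour_length(tour, coords):
    """Total length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def two_opt_step(tour, coords):
    """Return the best 2-opt neighbour, or the same tour object if none is shorter."""
    best, best_len = tour, tour_length(tour, coords)
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 1, n):
            # A 2-opt move reverses the segment tour[i..j].
            cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            cand_len = tour_length(cand, coords)
            if cand_len < best_len - 1e-12:
                best, best_len = cand, cand_len
    return best


def improve(tour, coords, max_steps=100):
    """Apply 2-opt steps until a local optimum or the step limit is reached."""
    for _ in range(max_steps):
        nxt = two_opt_step(tour, coords)
        if nxt is tour:  # no improving neighbour was found
            break
        tour = nxt
    return tour
```

Starting from any permutation of the locations, `improve` returns a tour that no single segment reversal can shorten. Such a tour is a local optimum and not necessarily the global one.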
r/MachineLearning - [D] I'm a Reinforcement Learning researcher and I'm leaving academia.
I definitely understand that, though I am a bit surprised to see MSR papers being referenced often in an RL workshop. I don't quite remember any RL papers of theirs that weren't strongly theoretical (other than that HRL paper by Hoang et Daume). I'm also a PhD student who until recently was doing RL, but I have become tired of working in a field that's moving this fast and seems to be taking a 'fast science' approach. It feels almost impossible to have any sort of real impact. I'm also just bored with reading RL papers: about 10% of the time I spend reading them goes to understanding the novelty, and the rest to studying their experimentation to get a feel for whether it's worth anything.
Marginalized State Distribution Entropy Regularization in Policy Optimization
Islam, Riashat, Ahmed, Zafarali, Precup, Doina
Entropy regularization is used to get improved optimization performance in reinforcement learning tasks. A common form of regularization is to maximize policy entropy to avoid premature convergence and lead to more stochastic policies for exploration through action space. However, this does not ensure exploration in the state space. In this work, we instead consider the distribution of discounted weighting of states, and propose to maximize the entropy of a lower bound approximation to the weighting of a state, based on latent space state representation. We propose entropy regularization based on the marginal state distribution, to encourage the policy to have a more uniform distribution over the state space for exploration. Our approach based on marginal state distribution achieves superior state space coverage on complex gridworld domains, which translates into empirical gains in sparse reward 3D maze navigation and continuous control domains compared to entropy regularization with stochastic policies.
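The contrast between policy entropy and state-distribution entropy can be illustrated with a small count-based sketch. It only computes the entropy of an empirical discounted state distribution from one trajectory of discrete states. It does not implement the lower-bound approximation or the latent state representation that the abstract describes. The function names and the normalization over a finite trajectory are assumptions made for illustration.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def discounted_state_distribution(states, gamma=0.99):
    """Empirical discounted visitation weights of a trajectory, normalized to sum to 1."""
    weights = defaultdict(float)
    for t, s in enumerate(states):
        weights[s] += gamma ** t  # earlier visits carry more weight
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def state_entropy(states, gamma=0.99):
    """Entropy of the empirical discounted state distribution."""
    return entropy(discounted_state_distribution(states, gamma).values())
```

A trajectory that stays in one state has zero state entropy no matter how random the actions were. This is the gap between action-space and state-space exploration that the abstract points to.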
IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Luo, Michael, Yao, Jiahao, Liaw, Richard, Liang, Eric, Stoica, Ion
The practical usage of reinforcement learning agents is often bottlenecked by the duration of training time. To accelerate training, practitioners often turn to distributed reinforcement learning architectures to parallelize and accelerate the training process. However, modern methods for scalable reinforcement learning (RL) often trade off between the throughput of samples that an RL agent can learn from (sample throughput) and the quality of learning from each sample (sample efficiency). In these scalable RL architectures, as one increases sample throughput (i.e. increasing parallelization in IMPALA), sample efficiency drops significantly. To address this, we propose a new distributed reinforcement learning algorithm, IMPACT. IMPACT extends IMPALA with three changes: a target network for stabilizing the surrogate objective, a circular buffer, and truncated importance sampling. In discrete action-space environments, we show that IMPACT attains higher reward and, simultaneously, achieves up to a 30% decrease in training wall-time compared with IMPALA. For continuous control environments, IMPACT trains faster than existing scalable agents while preserving the sample efficiency of synchronous PPO.
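Two of the three changes named in the abstract, the circular buffer and truncated importance sampling, can be sketched generically. This is not IMPACT's actual implementation. The class `CircularBuffer`, the function `truncated_is_weights`, and the default clip value of 1.0 are illustrative assumptions, and the target network is left out.

```python
import math
from collections import deque


def truncated_is_weights(target_logp, behavior_logp, clip=1.0):
    """Importance ratios pi_target / pi_behavior, truncated from above at `clip`."""
    return [min(clip, math.exp(t - b)) for t, b in zip(target_logp, behavior_logp)]


class CircularBuffer:
    """Fixed-capacity buffer: adding to a full buffer evicts the oldest batch."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def add(self, batch):
        self._items.append(batch)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)
```

Truncation caps the weight of samples whose behavior policy has drifted far from the target policy. This bounds the variance of the update at the cost of some bias.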
SMiRL: Surprise Minimizing RL in Dynamic Environments
Berseth, Glen, Geng, Daniel, Devin, Coline, Finn, Chelsea, Jayaraman, Dinesh, Levine, Sergey
All living organisms struggle against the forces of nature to carve out niches where they can maintain homeostasis. We propose that such a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents. We formalize this idea into an unsupervised reinforcement learning method called surprise minimizing RL (SMiRL). SMiRL trains an agent with the objective of maximizing the probability of observed states under a model trained on previously seen states. The resulting agents can acquire proactive behaviors that seek out and maintain stable conditions, such as balancing and damage avoidance, that are closely tied to an environment's prevailing sources of entropy, such as wind, earthquakes, and other agents. We demonstrate that our surprise minimizing agents can successfully play Tetris, Doom, control a humanoid to avoid falls and navigate to escape enemy agents, without any task-specific reward supervision. We further show that SMiRL can be used together with a standard task reward to accelerate reward-driven learning.
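The objective described above, maximizing the probability of observed states under a model fit to previously seen states, can be sketched with a very simple density model. The independent per-dimension Gaussian, the variance floor `min_var`, the zero reward for the first state, and the names `RunningGaussian` and `smirl_reward` are all assumptions made for illustration. The abstract does not specify the model class.

```python
import math


class RunningGaussian:
    """Independent Gaussian per state dimension, fit to all states seen so far."""

    def __init__(self, dim, min_var=1e-2):
        self.n = 0
        self.total = [0.0] * dim
        self.total_sq = [0.0] * dim
        self.min_var = min_var  # variance floor to keep log-densities finite

    def update(self, state):
        self.n += 1
        for i, x in enumerate(state):
            self.total[i] += x
            self.total_sq[i] += x * x

    def log_prob(self, state):
        logp = 0.0
        for i, x in enumerate(state):
            mean = self.total[i] / self.n
            var = max(self.total_sq[i] / self.n - mean * mean, self.min_var)
            logp += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        return logp


def smirl_reward(model, state):
    """Reward = log-likelihood of the state under the model, then update the model."""
    reward = model.log_prob(state) if model.n > 0 else 0.0  # no history yet
    model.update(state)
    return reward
```

States close to what the agent has already seen receive higher reward than surprising ones, so an agent maximizing this signal is pushed toward stable, familiar conditions.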
What Can Learned Intrinsic Rewards Capture?
Zheng, Zeyu, Oh, Junhyuk, Hessel, Matteo, Xu, Zhongwen, Kroiss, Manuel, van Hasselt, Hado, Silver, David, Singh, Satinder
Reinforcement learning agents can include different components, such as policies, value functions, state representations, and environment models. Any or all of these can be the loci of knowledge, i.e., structures where knowledge, whether given or learned, can be deposited and reused. The objective of an agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. As far as the learning algorithm is concerned, these rewards are typically given and immutable. In this paper we instead consider the proposition that the reward function itself may be a good locus of knowledge. This is consistent with a common use, in the literature, of hand-designed intrinsic rewards to improve the learning dynamics of an agent. We adopt the multi-lifetime setting of the Optimal Rewards Framework, and propose to meta-learn an intrinsic reward function from experience that allows agents to maximise their extrinsic rewards accumulated until the end of their lifetimes. Rewards as a locus of knowledge provide guidance on "what" the agent should strive to do rather than "how" the agent should behave; the latter is more directly captured in policies or value functions for example. Thus, our focus here is on demonstrating the following: (1) that it is feasible to meta-learn good reward functions, (2) that the learned reward functions can capture interesting kinds of "what" knowledge, and (3) that because of the indirectness of this form of knowledge the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment. Reinforcement learning agents can store knowledge in their policies, value functions, state representations, and models of the environment dynamics. These components can be the loci of knowledge in the sense that they are structures in which knowledge, either learned from experience by the agent's algorithm or given by the agent-designer, can be deposited and reused.
Efficient Robotic Task Generalization Using Deep Model Fusion Reinforcement Learning
Wang, Tianying, Zhang, Hao, Toh, Wei Qi, Zhu, Hongyuan, Tan, Cheston, Wu, Yan, Liu, Yong, Jing, Wei
Learning-based methods have been used to program robotic tasks in recent years. However, extensive training is usually required not only for the initial task learning but also for generalizing the learned model to the same task but in different environments. In this paper, we propose a novel Deep Reinforcement Learning algorithm for efficient task generalization and environment adaptation in the robotic task learning problem. The proposed method is able to efficiently generalize the previously learned task by model fusion to solve the environment adaptation problem. The proposed Deep Model Fusion (DMF) method reuses and combines the previously trained model to improve the learning efficiency and results. Besides, we also introduce a Multi-objective Guided Reward (MGR) shaping technique to further improve training efficiency. The proposed method was benchmarked with previous methods in various environments to validate its effectiveness.
Humanoid robot rolls 360 after 25 minutes, using reinforcement learning model
Robot being trained for 500 iterations to learn to control inclination of torso. This is done by selecting goal inclinations the robot attempts to reach, while creating a network of postures to move between. In the final evaluation (0:49), goals are selected manually, to force the robot to roll around 360 degrees to reach them.
Using Reinforcement Learning to Design a Better Rocket Engine
In this blog, I'll discuss how I worked collaboratively with various domain experts, using reinforcement learning to develop innovative solutions in rocket engine development. In doing so, I'll demonstrate the application of ML techniques to the manufacturing industry and the role of the Machine Learning Product Manager. Machine learning (ML) has had an incredible impact across industries with numerous applications such as personalized TV recommendations and dynamic price models in your rideshare app. Because it is such a core component to the success of companies in the tech industry, advances in ML research and applications are developing at an astonishing rate. For industries outside of tech, ML can be utilized to personalize a user's experience, automate laborious tasks and optimize subjective decision making.
(PDF) Deep Reinforcement Learning applied to Fluid Mechanics: materials from the 2019 Flow/Interface School on Machine Learning and Data Driven Methods