AITopics

1912.05784

Country:

Asia > Singapore > Central Region > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

#artificialintelligence Dec-11-2019, 11:40:03 GMT

r/MachineLearning - [D] I'm a Reinforcement Learning researcher and I'm leaving academia.

I definitely understand that, though I am a bit surprised to see MSR papers referenced so often in an RL workshop. I don't remember any RL papers of theirs that weren't strongly theoretical (other than that HRL paper by Hoang and Daumé). I'm also a PhD student who until recently was doing RL, but I have grown tired of working in a field that moves this fast and seems to be taking a 'fast science' approach. It feels almost impossible to have any real impact. I'm also just bored with reading RL papers: about 10% of the time I spend reading them goes to understanding the novelty, and the rest to studying their experiments to get a feel for whether the work is worth anything.

academia, machinelearning, reinforcement learning researcher

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Islam, Riashat, Ahmed, Zafarali, Precup, Doina

Marginalized State Distribution Entropy Regularization in Policy Optimization

arXiv.org Machine Learning Dec-11-2019

Entropy regularization is used to improve optimization performance in reinforcement learning tasks. A common form of regularization is to maximize policy entropy, which avoids premature convergence and yields more stochastic policies that explore the action space. However, this does not ensure exploration in the state space. In this work, we instead consider the distribution of the discounted weighting of states, and propose to maximize the entropy of a lower-bound approximation to the weighting of a state, based on a latent-space state representation. We propose entropy regularization based on the marginal state distribution, to encourage the policy to have a more uniform distribution over the state space for exploration. Our approach achieves superior state-space coverage on complex gridworld domains, which translates into empirical gains in sparse-reward 3D maze navigation and continuous control domains compared with entropy regularization with stochastic policies.
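The gap between action entropy and state entropy can be illustrated in a toy tabular setting. The sketch below is not the paper's method (which uses a latent-space lower bound); it assumes a hypothetical 5-state chain with "left" and "right" actions, computes the discounted state distribution d_pi(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s) by truncating the sum, and compares it with the policy's action entropy. A uniformly random policy has maximal action entropy, yet its state visitation stays skewed toward the start state.

```python
import math

def chain_step(s, a, n_states):
    """Hypothetical deterministic chain: action 0 moves left, action 1 moves right (walls at both ends)."""
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

def discounted_state_distribution(policy, n_states, start, gamma, horizon=2000):
    """Approximate d_pi(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s), truncating the sum at `horizon`."""
    dist = [0.0] * n_states          # Pr(s_t = s) at the current step t
    dist[start] = 1.0
    d = [0.0] * n_states             # accumulated discounted visitation
    weight = 1.0 - gamma             # (1 - gamma) * gamma^t
    for _ in range(horizon):
        for s in range(n_states):
            d[s] += weight * dist[s]
        nxt = [0.0] * n_states
        for s in range(n_states):
            for a, p_a in enumerate(policy[s]):
                nxt[chain_step(s, a, n_states)] += dist[s] * p_a
        dist = nxt
        weight *= gamma
    return d

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(x * math.log(x) for x in p if x > 0)

n = 5
uniform = [[0.5, 0.5] for _ in range(n)]
d = discounted_state_distribution(uniform, n, start=0, gamma=0.9)
action_entropy = sum(entropy(pi_s) for pi_s in uniform) / n  # log 2, the maximum for two actions
state_entropy = entropy(d)  # below log 5: visitation remains skewed toward the start state
```

A state-entropy regularizer would reward raising `state_entropy` toward log 5, whereas a policy-entropy regularizer is already saturated here.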

entropy, regularization, state distribution

arXiv.org Machine Learning

1912.05128

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine Learning Dec-11-2019

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

Luo, Michael, Yao, Jiahao, Liaw, Richard, Liang, Eric, Stoica, Ion

The practical usage of reinforcement learning agents is often bottlenecked by training time. To accelerate training, practitioners often turn to distributed reinforcement learning architectures that parallelize the training process. However, modern methods for scalable reinforcement learning (RL) often trade off between the throughput of samples that an RL agent can learn from (sample throughput) and the quality of learning from each sample (sample efficiency). In these scalable RL architectures, as sample throughput increases (e.g., by increasing parallelization in IMPALA), sample efficiency drops significantly. To address this, we propose a new distributed reinforcement learning algorithm, IMPACT. IMPACT extends IMPALA with three changes: a target network for stabilizing the surrogate objective, a circular buffer, and truncated importance sampling. In discrete action-space environments, we show that IMPACT attains higher reward and simultaneously achieves up to a 30% decrease in training wall-time compared with IMPALA. For continuous control environments, IMPACT trains faster than existing scalable agents while preserving the sample efficiency of synchronous PPO.
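Two of the ingredients named above, truncated importance sampling and a surrogate objective stabilized by a target network, can be sketched per sample as follows. This is an assumption-laden illustration, not IMPACT's reference implementation: it assumes a PPO-style clipped ratio taken against the target network (rather than the behavior policy), scaled by a truncated weight min(rho_bar, pi_target / mu) that corrects for stale actor data. The names `eps` and `rho_bar` are hypothetical hyperparameters.

```python
import math

def truncated_is_weight(logp_target, logp_behavior, rho_bar=1.0):
    """Truncated importance weight min(rho_bar, pi_target(a|s) / mu(a|s))."""
    return min(rho_bar, math.exp(logp_target - logp_behavior))

def clipped_target_surrogate(logp_new, logp_target, logp_behavior, advantage,
                             eps=0.2, rho_bar=1.0):
    """Per-sample surrogate to maximise (hypothetical combination for illustration).

    The probability ratio is measured against a slowly updated target network,
    clipped PPO-style to [1 - eps, 1 + eps], and scaled by a truncated importance
    weight between the target network and the (possibly stale) behaviour policy.
    """
    ratio = math.exp(logp_new - logp_target)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    rho = truncated_is_weight(logp_target, logp_behavior, rho_bar)
    return rho * min(ratio * advantage, clipped * advantage)
```

Because the weight is truncated at `rho_bar`, a single very off-policy sample cannot dominate the update, and clipping against the target network bounds how far one learner step can move the policy.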

IMPACT, IMPALA, sample efficiency

arXiv.org Machine Learning

1912.00167

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Dec-11-2019

SMiRL: Surprise Minimizing RL in Dynamic Environments

Berseth, Glen, Geng, Daniel, Devin, Coline, Finn, Chelsea, Jayaraman, Dinesh, Levine, Sergey

All living organisms struggle against the forces of nature to carve out niches where they can maintain homeostasis. We propose that such a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents. We formalize this idea into an unsupervised reinforcement learning method called surprise minimizing RL (SMiRL). SMiRL trains an agent with the objective of maximizing the probability of observed states under a model trained on previously seen states. The resulting agents can acquire proactive behaviors that seek out and maintain stable conditions, such as balancing and damage avoidance, that are closely tied to an environment's prevailing sources of entropy, such as wind, earthquakes, and other agents. We demonstrate that our surprise minimizing agents can successfully play Tetris and Doom, and control a humanoid to avoid falls and navigate to escape enemy agents, without any task-specific reward supervision. We further show that SMiRL can be used together with a standard task reward to accelerate reward-driven learning.
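The SMiRL reward, the log-probability of the current state under a density model fit to previously seen states, can be sketched in a few lines. This sketch assumes binary state features (for example, occupied cells of a Tetris board) and an independent Bernoulli model per feature with add-one smoothing; the density model used in the paper may differ by environment, and the class and function names are hypothetical.

```python
import math

class BernoulliStateModel:
    """Independent Bernoulli density over binary state features, fit to states seen so far.
    Add-one (Laplace) smoothing keeps every probability strictly between 0 and 1."""

    def __init__(self, dim):
        self.ones = [1.0] * dim   # smoothed count of 1s per feature
        self.count = 2.0          # smoothed number of observed states

    def log_prob(self, state):
        lp = 0.0
        for i, x in enumerate(state):
            p1 = self.ones[i] / self.count
            lp += math.log(p1 if x else 1.0 - p1)
        return lp

    def update(self, state):
        for i, x in enumerate(state):
            self.ones[i] += x
        self.count += 1.0

def smirl_rewards(states):
    """Reward each state by how unsurprising it is under the model of earlier states, then update the model."""
    model = BernoulliStateModel(len(states[0]))
    rewards = []
    for s in states:
        rewards.append(model.log_prob(s))
        model.update(s)
    return rewards
```

Revisiting the same state yields steadily higher reward, which is the pressure toward stable, predictable conditions described above; a sudden novel state would receive a low reward.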

agent, arxiv preprint arxiv, smirl

1912.0551

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.93)
Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

arXiv.org Artificial Intelligence Dec-11-2019

What Can Learned Intrinsic Rewards Capture?

Zheng, Zeyu, Oh, Junhyuk, Hessel, Matteo, Xu, Zhongwen, Kroiss, Manuel, van Hasselt, Hado, Silver, David, Singh, Satinder

Reinforcement learning agents can include different components, such as policies, value functions, state representations, and environment models. Any or all of these can be the loci of knowledge, i.e., structures where knowledge, whether given or learned, can be deposited and reused. The objective of an agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. As far as the learning algorithm is concerned, these rewards are typically given and immutable. In this paper we instead consider the proposition that the reward function itself may be a good locus of knowledge. This is consistent with the common use, in the literature, of hand-designed intrinsic rewards to improve the learning dynamics of an agent. We adopt the multi-lifetime setting of the Optimal Rewards Framework, and propose to meta-learn an intrinsic reward function from experience that allows agents to maximise the extrinsic rewards they accumulate until the end of their lifetimes. Rewards as a locus of knowledge provide guidance on "what" the agent should strive to do rather than "how" the agent should behave; the latter is more directly captured in, for example, policies or value functions. Thus, our focus here is on demonstrating the following: (1) that it is feasible to meta-learn good reward functions, (2) that the learned reward functions can capture interesting kinds of "what" knowledge, and (3) that because of the indirectness of this form of knowledge, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment.
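The multi-lifetime structure can be illustrated with a deliberately small example: an inner agent learns only from an intrinsic reward, while an outer loop scores each candidate intrinsic reward by the extrinsic return the agent collects over whole lifetimes on freshly sampled tasks. The paper meta-learns the reward; here a plain grid search over one hypothetical parameter `eta` stands in for that, and the task (Bernoulli bandits) and the bonus form r_in = r_ex + eta / N(a) are assumptions chosen for brevity.

```python
import random

def run_lifetime(arm_means, eta, lifetime, rng):
    """One agent lifetime on a Bernoulli bandit. The agent greedily follows its running
    average of *intrinsic* reward; the function returns the total *extrinsic* reward."""
    k = len(arm_means)
    counts = [0] * k
    values = [0.0] * k  # running average of intrinsic reward per arm
    total_extrinsic = 0.0
    for _ in range(lifetime):
        untried = [a for a in range(k) if counts[a] == 0]
        a = untried[0] if untried else max(range(k), key=lambda i: values[i])
        r_ex = 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] += 1
        r_in = r_ex + eta / counts[a]  # hypothetical parameterised intrinsic reward
        values[a] += (r_in - values[a]) / counts[a]
        total_extrinsic += r_ex
    return total_extrinsic

def search_intrinsic_reward(candidates, n_lifetimes=200, lifetime=100, n_arms=5, seed=0):
    """Outer loop: pick the eta whose agents accumulate the most extrinsic reward per lifetime."""
    task_rng = random.Random(seed)
    tasks = [[task_rng.random() for _ in range(n_arms)] for _ in range(n_lifetimes)]
    best_eta, best_score = None, float("-inf")
    for eta in candidates:
        run_rng = random.Random(seed + 1)  # same outcome randomness for every candidate
        score = sum(run_lifetime(t, eta, lifetime, run_rng) for t in tasks) / n_lifetimes
        if score > best_score:
            best_eta, best_score = eta, score
    return best_eta, best_score

eta, score = search_intrinsic_reward([0.0, 0.5, 1.0, 2.0])
```

The selected `eta` encodes "what" to do (try arms that have rarely been pulled) rather than "how" to act, which is the kind of knowledge the abstract attributes to learned rewards.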

agent, intrinsic reward, knowledge

1912.055

Country: North America > United States > Michigan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence Dec-11-2019

Efficient Robotic Task Generalization Using Deep Model Fusion Reinforcement Learning

Wang, Tianying, Zhang, Hao, Toh, Wei Qi, Zhu, Hongyuan, Tan, Cheston, Wu, Yan, Liu, Yong, Jing, Wei

Learning-based methods have been used to program robotic tasks in recent years. However, extensive training is usually required not only for the initial task learning but also for generalizing the learned model to the same task in different environments. In this paper, we propose a novel Deep Reinforcement Learning algorithm for efficient task generalization and environment adaptation in the robotic task learning problem. The proposed method efficiently generalizes a previously learned task by model fusion to solve the environment adaptation problem. The proposed Deep Model Fusion (DMF) method reuses and combines previously trained models to improve learning efficiency and results. Besides, we also introduce a Multi-objective Guided Reward (MGR) shaping technique to further improve training efficiency. The proposed method was benchmarked against previous methods in various environments to validate its effectiveness.
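The abstract does not specify the objectives behind Multi-objective Guided Reward shaping. As a hedged illustration of the general idea of combining several objectives into one shaped reward, the sketch below assumes a hypothetical reaching task with three terms (progress toward the goal, remaining distance, and a collision penalty) and hypothetical weights; it is not the paper's reward.

```python
import math

def multi_objective_reward(ee_pos, goal_pos, prev_dist, collided, weights=(1.0, 0.1, 5.0)):
    """Hypothetical shaped reward for a reaching task: a weighted sum of several objectives.
    Returns the reward and the new end-effector-to-goal distance (for the next call)."""
    w_progress, w_distance, w_collision = weights
    dist = math.dist(ee_pos, goal_pos)
    r = w_progress * (prev_dist - dist)   # reward moving closer to the goal this step
    r -= w_distance * dist                # dense penalty on the remaining distance
    if collided:
        r -= w_collision                  # discourage contact with obstacles
    return r, dist
```

For example, moving the end effector from 2.0 to 1.0 units from the goal without a collision yields 1.0 * (2.0 - 1.0) - 0.1 * 1.0 = 0.9.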

application, objective, task generalization

1912.05205

Country:

Asia > Singapore (0.05)
North America > Puerto Rico (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Dec-10-2019, 16:13:50 GMT

Humanoid robot rolls 360 after 25 minutes, using reinforcement learning model

The robot is trained for 500 iterations to learn to control the inclination of its torso. This is done by selecting goal inclinations that the robot attempts to reach, while building a network of postures to move between. In the final evaluation (0:49), goals are selected manually to force the robot to roll around 360 degrees to reach them.
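Moving between postures to reach a goal inclination can be pictured as a search over the posture network. The sketch below is a hypothetical illustration, not the video's implementation: it assumes each posture is labeled with a torso inclination in degrees, picks the posture closest to the goal (accounting for wrap-around at 360 degrees), and finds a route to it with breadth-first search.

```python
from collections import deque

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def plan_to_inclination(edges, inclination, start, goal_deg):
    """Breadth-first search through a posture network to the posture whose torso
    inclination is closest to goal_deg. Returns the postures to pass through, or None."""
    target = min(inclination, key=lambda p: angle_diff(inclination[p], goal_deg))
    parent = {start: None}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            path = []
            while p is not None:
                path.append(p)
                p = parent[p]
            return path[::-1]
        for q in edges.get(p, []):
            if q not in parent:
                parent[q] = p
                queue.append(q)
    return None  # target posture not reachable from start

# Hypothetical posture network in which the robot can only roll in one direction.
inclination = {"upright": 0.0, "lean": 90.0, "prone": 180.0, "side": 270.0}
edges = {"upright": ["lean"], "lean": ["prone"], "prone": ["side"], "side": ["upright"]}
path = plan_to_inclination(edges, inclination, "upright", 260.0)
```

Because the only available edges roll forward, reaching a goal near 270 degrees forces the plan through the full roll rather than a short tilt backwards.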

humanoid robot roll 360, inclination, reinforcement

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

#artificialintelligence Dec-10-2019, 10:55:58 GMT

Using Reinforcement Learning to Design a Better Rocket Engine

In this blog, I'll discuss how I worked collaboratively with various domain experts, using reinforcement learning to develop innovative solutions in rocket engine development. In doing so, I'll demonstrate the application of ML techniques to the manufacturing industry and the role of the Machine Learning Product Manager. Machine learning (ML) has had an incredible impact across industries, with applications such as personalized TV recommendations and dynamic pricing models in your rideshare app. Because it is such a core component of the success of companies in the tech industry, advances in ML research and applications are developing at an astonishing rate. For industries outside of tech, ML can be used to personalize a user's experience, automate laborious tasks, and optimize subjective decision making.

algorithm, control loop, engineer

Country: Asia > Middle East > Jordan (0.05)

Genre: Research Report (0.35)

Industry: Information Technology (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.74)

#artificialintelligence Dec-10-2019, 07:15:26 GMT

Deep Reinforcement Learning applied to Fluid Mechanics: materials from the 2019 Flow/Interface School on Machine Learning and Data Driven Methods


deep reinforcement learning, flow interface school, learning and data driven method

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)