AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
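
The trial-and-error idea in the quotation can be illustrated with a small sketch. It is a minimal example, not taken from Sutton and Barto's text: the function name `run_bandit`, the three reward means, and the epsilon-greedy rule are all assumptions chosen for illustration. The learner is never told which action is best and has to estimate each action's reward by trying it.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a stationary multi-armed bandit.

    true_means is a hypothetical list of mean rewards, hidden from the learner.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running estimate of each action's mean reward
    counts = [0] * n_actions       # how many times each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            # exploit: pick the action with the highest current estimate
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 1.0)  # noisy numerical reward
        counts[action] += 1
        # incremental sample-average update of the estimate
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 1.0])
best_action = max(range(len(counts)), key=lambda a: counts[a])
print("most-tried action:", best_action)
```

With enough steps, the action with the highest mean reward should end up being chosen most often, even though its identity was never given to the learner.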


A Deep Recurrent Q Network towards Self-adapting Distributed Microservices architecture

Magableh, Basel

arXiv.org Artificial Intelligence Jan-14-2019

One desired aspect of microservices architecture is the ability to self-adapt its own architecture and behaviour in response to changes in the operational environment. To achieve the desired high levels of self-adaptability, this research implements the distributed microservices architectures model, as informed by the MAPE-K model. The proposed architecture employs multiple adaptation agents supported by a centralised controller that can observe the environment and execute a suitable adaptation action. The adaptation planning is managed by a deep recurrent Q-network (DRQN). It is argued that such integration between DRQN and MDP agents in a MAPE-K model offers distributed microservice architecture with self-adaptability and high levels of availability and scalability. Integrating DRQN into the adaptation process improves the effectiveness of the adaptation and reduces any adaptation risks, including resources over-provisioning and thrashing. The performance of DRQN is evaluated against deep Q-learning and policy gradient algorithms including: i) deep Q-network (DQN), ii) dueling deep Q-network (DDQN), iii) a policy gradient neural network (PGNN), and iv) deep deterministic policy gradient (DDPG). The DRQN implementation in this paper manages to outperform the above-mentioned algorithms in terms of higher total reward, shorter adaptation time, lower error rates, and faster convergence and training times. We strongly believe that DRQN is more suitable for driving the adaptation in distributed services-oriented architecture and offers better performance than other dynamic decision-making algorithms.

Index Terms: Service oriented architecture, self-adaptive architectures, reinforcement learning, Q-learning algorithms, deep Q-Learning networks, recurrent Q-learning networks, policy approximation, multi agents environment.

I. INTRODUCTION
Self-adaptability refers to the ability of service oriented architecture (SOA) to modify its own structure and behaviour in response to changes in the operating environment [1]. High levels of self-adaptability present the challenges of self-organising, self-tuning, and self-healing the architecture against an interruption. Moreover, because of the services' pervasiveness, and in order to make any adaptation strategy effective and successful, adaptation actions must be considered in conjunction with [...] so that the performed action meets the adaptation goals, objectives, and the desired architecture quality attributes [2]-[4].
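
The algorithms named in the abstract (DQN, DDQN, DRQN) approximate an action-value function with a neural network. The underlying value update can be sketched in tabular form. The example below is not the paper's implementation: the five-state chain environment, the function name `q_learning`, and all hyperparameters are hypothetical, and a lookup table stands in for the network.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: action 1 moves right, action 0 moves left.

    The last state is terminal and entering it yields reward 1.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    terminal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != terminal:
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                # exploit, breaking ties at random so early episodes still move
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            if action == 1:
                next_state = min(state + 1, terminal)
            else:
                next_state = max(state - 1, 0)
            reward = 1.0 if next_state == terminal else 0.0
            # one-step temporal-difference target; q[terminal] is never
            # updated, so it contributes zero future value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
print("greedy action per non-terminal state:", greedy_policy)
```

In a deep variant, the table lookup `q[state][action]` would be replaced by a network's output, and a recurrent layer (as in DRQN) would let that output depend on a history of observations rather than a single state.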

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1901.04011

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
(3 more...)

Genre:

Research Report (0.82)
Overview (0.68)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning

#artificialintelligence Jan-13-2019, 10:52:16 GMT

Wang, D., and Liu, Q. Learning to draw samples: With application to amortized MLE for generative adversarial learning.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.46)


This AI system can design RNA

#artificialintelligence Jan-12-2019, 21:32:11 GMT

RNA, or ribonucleic acid, is present in all living cells. It acts as a messenger, carrying instructions from DNA (deoxyribonucleic acid) that dictate how proteins in the body are synthesized. When it doesn't work as it should, it can severely affect neurological, cardiovascular, and muscular regulatory processes, resulting in effects like tumors, insulin resistance, and motor skill impairment. That's why researchers at the University of Freiburg's Department of Computer Science developed an AI system -- LEARNA -- that can learn to design RNA molecules for study. It's described in a new paper ("Learning to Design RNA") published this week on the preprint server arXiv.org.

rna design problem, state-of-the-art performance, target structure, (13 more...)

#artificialintelligence

Country: Europe > Germany > Baden-Württemberg > Freiburg (0.26)

Genre: Research Report (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)


MIT Deep Learning

#artificialintelligence Jan-12-2019, 18:02:51 GMT

This page is a collection of MIT courses and lectures on deep learning, deep reinforcement learning, autonomous vehicles, and artificial intelligence taught by Lex Fridman. New lectures will be up in January. I am teaching 3 courses this January. There will be a lecture every day at 3-4:30pm for 4 weeks (Mon, Jan 7 to Fri, Feb 1). Location is room 54-100 (directions).

mit deep learning

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)


Adaptive Guidance with Reinforcement Meta-Learning

Gaudet, Brian, Linares, Richard

arXiv.org Artificial Intelligence Jan-12-2019

This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt in real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a recurrent policy in four difficult tasks with unknown but highly variable dynamics. These tasks include a safe Mars landing with random engine failure and a landing on an asteroid with unknown environmental dynamics. We also demonstrate the ability of a recurrent policy to navigate using only Doppler radar altimeter returns, thus integrating guidance and navigation.

INTRODUCTION
Many space missions take place in environments with complex and time-varying dynamics that may be incompletely modeled during the mission design phase. For example, during an orbital refueling mission, the inertia tensor of each of the two spacecraft will change significantly as fuel is transferred from one spacecraft to the other, which can make the combined system difficult to control. The wet mass of an exoatmospheric kill vehicle (EKV) consists largely of fuel, and as this is depleted with divert thrusts, the center of mass changes, and the divert thrusts are no longer orthogonal to the EKV's velocity vector, which wastes fuel and impacts performance. Future missions to asteroids might be undertaken before the asteroid's gravitational field, rotational velocity, and local solar radiation pressure are accurately modeled.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1901.04473

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre:

Research Report (0.50)
Workflow (0.48)

Industry: Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Improving Coordination in Multi-Agent Deep Reinforcement Learning through Memory-driven Communication

Pesce, Emanuele, Montana, Giovanni

arXiv.org Machine Learning Jan-12-2019

Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with a task requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables the concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance, and illustrate how different communication patterns can emerge for different tasks.

agent, cooperative navigation, learning, (12 more...)

arXiv.org Machine Learning

1901.03887

Country:

Europe > United Kingdom (0.04)
North America > United States > Montana (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Reinforcement learning's foundational flaw

#artificialintelligence Jan-11-2019, 17:21:00 GMT

In this essay, we are going to address the limitations of one of the core fields of AI. In the process, we will encounter a fun allegory, a set of methods of incorporating prior knowledge and instruction into deep learning, and a radical conclusion. The first part, which you're reading right now, will set up what RL is and why it (or at least a particular version of it we shall name 'pure RL' and soon define) is fundamentally flawed. It will contain some explanation that can be skipped by AI practitioners -- but be sure to stick around for the discussion of recent non pure-RL work we shall argue represents the fix to pure RL's foundational flaw. But for now, let us start with a fun allegory.

machine learning, pure rl, reinforcement learning, (15 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)


On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator

Cai, Qi, Hong, Mingyi, Chen, Yongxin, Wang, Zhaoran

arXiv.org Machine Learning Jan-11-2019

Imitation learning is a paradigm that learns from expert demonstration to perform a task. The most straightforward approach of imitation learning is behavioral cloning (Pomerleau, 1991), which learns from expert trajectories to predict the expert action at any state. Despite its simplicity, behavioral cloning ignores the accumulation of prediction error over time. Consequently, although the learned policy closely resembles the expert policy at a given point in time, their trajectories may diverge in the long term. To remedy the issue of error accumulation, inverse reinforcement learning (Russell, 1998; Ng and Russell, 2000; Abbeel and Ng, 2004; Ratliff et al., 2006; Ziebart et al., 2008; Ho and Ermon, 2016) jointly learns a reward function and the corresponding optimal policy, such that the expected cumulative
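
The behavioral-cloning baseline described above, and the way its errors compound along a trajectory, can be sketched on a scalar linear system. This is plain behavioral cloning, not the method analysed in the paper. The dynamics x_{t+1} = a*x_t + b*u_t, the expert gain 0.05 (standing in for an LQR expert), the noise level, and the helper names `fit_gain` and `rollout` are all assumptions made for illustration.

```python
import random

def fit_gain(states, actions):
    """Least-squares fit of the linear policy u = -k * x to scalar demonstrations."""
    return -sum(x * u for x, u in zip(states, actions)) / sum(x * x for x in states)

def rollout(gain, x0=1.0, a=1.0, b=1.0, horizon=60):
    """Simulate x_{t+1} = a*x_t + b*u_t under the policy u_t = -gain * x_t."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(a * xs[-1] + b * (-gain * xs[-1]))
    return xs

rng = random.Random(0)
expert_gain = 0.05  # hypothetical expert feedback gain
# Noisy expert demonstrations: (state, action) pairs
states = [rng.uniform(-1.0, 1.0) for _ in range(200)]
actions = [-expert_gain * x + rng.gauss(0.0, 0.05) for x in states]
cloned_gain = fit_gain(states, actions)

expert_traj = rollout(expert_gain)
cloned_traj = rollout(cloned_gain)
# Action error of the cloned policy at the initial state x0 = 1.0
one_step_error = abs(cloned_gain - expert_gain) * 1.0
# Largest gap between the two state trajectories over the horizon
max_gap = max(abs(e - c) for e, c in zip(expert_traj, cloned_traj))
print("one-step action error:", one_step_error)
print("max trajectory gap:", max_gap)
```

The cloned gain can be close to the expert's at any single state, while the gap between the two state trajectories grows over several steps before the stable dynamics pull both back toward zero.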

condition 4, convergence, lemma 4, (13 more...)

arXiv.org Machine Learning

1901.03674

Country:

North America > United States > Minnesota (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)


An investigation of model-free planning

Guez, Arthur, Mirza, Mehdi, Gregor, Karol, Kabra, Rishabh, Racanière, Sébastien, Weber, Théophane, Raposo, David, Santoro, Adam, Orseau, Laurent, Eccles, Tom, Wayne, Greg, Silver, David, Lillicrap, Timothy

arXiv.org Machine Learning Jan-11-2019

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods has been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

architecture, drc, (13 more...)

arXiv.org Machine Learning

1901.03559

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Some Considerations on Learning to Explore via Meta-Reinforcement Learning

Stadie, Bradly C., Yang, Ge, Houthooft, Rein, Chen, Xi, Duan, Yan, Wu, Yuhuai, Abbeel, Pieter, Sutskever, Ilya

arXiv.org Artificial Intelligence Jan-11-2019

We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call 'Krazy World' and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1803.01118

Country: North America > Canada (0.46)

Genre: Research Report (0.50)

Industry: Education (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
