

Reinforcement Learning


A Deep Recurrent Q Network towards Self-adapting Distributed Microservices architecture

arXiv.org Artificial Intelligence

One desired aspect of microservices architecture is the ability to self-adapt its own architecture and behaviour in response to changes in the operational environment. To achieve high levels of self-adaptability, this research implements a distributed microservices architecture model informed by the MAPE-K model. The proposed architecture employs multiple adaptation agents supported by a centralised controller that can observe the environment and execute a suitable adaptation action. Adaptation planning is managed by a deep recurrent Q-network (DRQN). It is argued that such integration between DRQN and MDP agents in a MAPE-K model gives a distributed microservices architecture self-adaptability and high levels of availability and scalability. Integrating DRQN into the adaptation process improves the effectiveness of adaptation and reduces adaptation risks, including resource over-provisioning and thrashing. The performance of DRQN is evaluated against deep Q-learning and policy gradient algorithms: i) deep Q-network (DQN), ii) dueling deep Q-network (DDQN), iii) a policy gradient neural network (PGNN), and iv) deep deterministic policy gradient (DDPG). The DRQN implementation in this paper outperforms these algorithms, achieving a higher total reward, shorter adaptation time, lower error rates, and faster convergence and training times. We strongly believe that DRQN is better suited to driving adaptation in distributed service-oriented architectures and offers better performance than other dynamic decision-making algorithms.

Index Terms: service-oriented architecture, self-adaptive architectures, reinforcement learning, Q-learning algorithms, deep Q-learning networks, recurrent Q-learning networks, policy approximation, multi-agent environment.

I. INTRODUCTION

Self-adaptability refers to the ability of a service-oriented architecture (SOA) to modify its own structure and behaviour in response to changes in the operating environment [1]. Achieving high levels of self-adaptability raises the challenges of self-organising, self-tuning, and self-healing the architecture in the face of interruptions. Moreover, because services are pervasive, an adaptation strategy can only be effective and successful if adaptation actions are considered in conjunction with […], so that each performed action meets the adaptation goals and objectives and the desired architecture quality attributes [2]-[4].
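The abstract describes the adaptation loop only at a high level. Below is a minimal sketch of a MAPE-K cycle driven by a value-based learner, assuming a single microservice whose state is (load, replicas) and three hypothetical actions. It uses plain tabular Q-learning as a stand-in, not the paper's DRQN: a DRQN replaces the lookup table with a recurrent network (typically an LSTM) that estimates Q-values from a history of observations. The reward, which penalises under-provisioning more heavily than over-provisioning, and all names here are assumptions for illustration.

```python
import random

# Hypothetical adaptation actions for one microservice (not taken from the paper).
ACTIONS = ("scale_down", "no_op", "scale_up")
MAX_REPLICAS = 5

def reward(load, replicas):
    """Hypothetical reward: under-provisioning (SLA risk) costs more than over-provisioning."""
    if replicas < load:               # assume one replica is needed per unit of load
        return -2.0 * (load - replicas)
    return -0.5 * (replicas - load)

def execute(replicas, action):
    """Execute phase: apply the action, clamped to [1, MAX_REPLICAS]."""
    delta = ACTIONS.index(action) - 1  # -1, 0 or +1 replica
    return min(max(replicas + delta, 1), MAX_REPLICAS)

def train(episodes=3000, steps=10, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # Knowledge: (load, replicas) -> list of Q-values, one per action
    for _ in range(episodes):
        load, replicas = rng.randint(1, MAX_REPLICAS), rng.randint(1, MAX_REPLICAS)
        for _ in range(steps):
            state = (load, replicas)                   # Monitor + Analyse
            values = q.setdefault(state, [0.0] * len(ACTIONS))
            if rng.random() < epsilon:                 # Plan: epsilon-greedy choice
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=values.__getitem__)
            replicas = execute(replicas, ACTIONS[a])   # Execute
            r = reward(load, replicas)
            nxt = q.setdefault((load, replicas), [0.0] * len(ACTIONS))
            # One-step Q-learning: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))
            values[a] += alpha * (r + gamma * max(nxt) - values[a])
    return q

def plan(q, load, replicas):
    """Greedy adaptation decision from the learned Q-values."""
    values = q.get((load, replicas), [0.0] * len(ACTIONS))
    return ACTIONS[max(range(len(ACTIONS)), key=values.__getitem__)]

q_table = train()
print(plan(q_table, 4, 1), plan(q_table, 1, 4), plan(q_table, 3, 3))
```

Swapping the dictionary for a recurrent Q-network that reads a window of past observations is, roughly, what turns this loop into a DRQN-style planner; the Monitor, Plan, and Execute steps stay the same.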



This AI system can design RNA

#artificialintelligence

RNA, or ribonucleic acid, is present in all living cells. It acts as a messenger, carrying instructions from DNA (deoxyribonucleic acid) that dictate how proteins in the body are synthesized. When it doesn't work as it should, it can severely affect neurological, cardiovascular, and muscular regulatory processes, resulting in effects like tumors, insulin resistance, and motor skill impairment. That's why researchers at the University of Freiburg's Department of Computer Science developed an AI system, LEARNA, that can learn to design RNA molecules for study. It's described in a new paper ("Learning to Design RNA") published this week on the preprint server arXiv.org.


MIT Deep Learning

#artificialintelligence

This page is a collection of MIT courses and lectures on deep learning, deep reinforcement learning, autonomous vehicles, and artificial intelligence taught by Lex Fridman. New lectures will be up in January. I am teaching 3 courses this January. There will be a lecture every day at 3-4:30pm for 4 weeks (Mon, Jan 7 to Fri, Feb 1). Location is room 54-100.


Adaptive Guidance with Reinforcement Meta-Learning

arXiv.org Artificial Intelligence

This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt in real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a recurrent policy in four difficult tasks with unknown but highly variable dynamics. These tasks include a safe Mars landing with random engine failure and a landing on an asteroid with unknown environmental dynamics. We also demonstrate the ability of a recurrent policy to navigate using only Doppler radar altimeter returns, thus integrating guidance and navigation.

INTRODUCTION

Many space missions take place in environments with complex and time-varying dynamics that may be incompletely modeled during the mission design phase. For example, during an orbital refueling mission, the inertia tensor of each of the two spacecraft will change significantly as fuel is transferred from one spacecraft to the other, which can make the combined system difficult to control. The wet mass of an exoatmospheric kill vehicle (EKV) consists largely of fuel; as this fuel is depleted by divert thrusts, the center of mass changes and the divert thrusts are no longer orthogonal to the EKV's velocity vector, which wastes fuel and degrades performance. Future missions to asteroids might be undertaken before the asteroid's gravitational field, rotational velocity, and local solar radiation pressure are accurately modeled.


Improving Coordination in Multi-Agent Deep Reinforcement Learning through Memory-driven Communication

arXiv.org Machine Learning

Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with a task requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables the concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance, and illustrate how different communication patterns can emerge for different tasks.
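The abstract does not spell out how the memory device works. A minimal sketch, assuming a shared real-valued memory vector and a gated write rule similar in spirit to LSTM gating; the class name, the gate formulation, and the fixed gate values are all hypothetical. In the framework described above, the read and write operations are learned end to end with deep deterministic policy gradients rather than hard-coded as they are here.

```python
class SharedMemory:
    """Hypothetical shared memory device: a fixed-size vector every agent can read and write."""

    def __init__(self, size):
        self.cells = [0.0] * size

    def read(self):
        # Every agent sees the same shared representation.
        return list(self.cells)

    def write(self, candidate, gate):
        # Gated write: cell i keeps (1 - gate[i]) of its old value and takes
        # gate[i] of candidate[i]. In a learned system both vectors would be
        # produced by the writing agent's network; here they are fixed inputs.
        if len(candidate) != len(self.cells) or len(gate) != len(self.cells):
            raise ValueError("candidate and gate must match the memory size")
        self.cells = [(1.0 - g) * m + g * c
                      for m, c, g in zip(self.cells, candidate, gate)]


memory = SharedMemory(3)
memory.write([1.0, 2.0, 3.0], gate=[1.0, 0.0, 0.5])   # agent 1 writes
memory.write([0.0, 4.0, 1.0], gate=[0.5, 1.0, 0.0])   # agent 2 writes
print(memory.read())  # [0.5, 4.0, 1.5]
```

A gate of 0 leaves a cell untouched and a gate of 1 overwrites it, so agents can preserve information written by others while updating only the cells they choose.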


Reinforcement learning's foundational flaw

#artificialintelligence

In this essay, we are going to address the limitations of one of the core fields of AI. In the process, we will encounter a fun allegory, a set of methods for incorporating prior knowledge and instruction into deep learning, and a radical conclusion.[1] The first part, which you're reading right now, will set up what RL is and why it (or at least a particular version of it, which we shall name 'pure RL' and soon define) is fundamentally flawed. It will contain some explanation that AI practitioners can skip, but be sure to stick around for the discussion of recent non-pure-RL work that we shall argue represents the fix to pure RL's foundational flaw. But for now, let us start with a fun allegory.


On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator

arXiv.org Machine Learning

Imitation learning is a paradigm that learns from expert demonstrations to perform a task. The most straightforward approach to imitation learning is behavioral cloning (Pomerleau, 1991), which learns from expert trajectories to predict the expert action at any state. Despite its simplicity, behavioral cloning has a key weakness: it ignores the accumulation of prediction error over time. Consequently, although the learned policy closely resembles the expert policy at any given point in time, their trajectories may diverge in the long term. To remedy the issue of error accumulation, inverse reinforcement learning (Russell, 1998; Ng and Russell, 2000; Abbeel and Ng, 2004; Ratliff et al., 2006; Ziebart et al., 2008; Ho and Ermon, 2016) jointly learns a reward function and the corresponding optimal policy, such that the expected cumulative
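To make behavioral cloning concrete in the linear-quadratic setting named in the title, here is a minimal scalar sketch: an expert that follows the LQR state-feedback law u = -k x (with k obtained by iterating the discrete-time Riccati equation), noisy demonstrations, and a least-squares fit of k. The system parameters, noise level, and helper names are assumptions for illustration, and this is not the paper's analysis; it only shows the supervised fitting step, not the long-horizon error accumulation discussed above.

```python
import random

def lqr_gain(a, b, q, r, iters=500):
    """Scalar discrete-time LQR gain k (policy u = -k x), via Riccati iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def demonstrate(k, a, b, x0, steps, noise, rng):
    """Roll out the expert u = -k x (plus Gaussian action noise) on x' = a x + b u."""
    xs, us, x = [], [], x0
    for _ in range(steps):
        u = -k * x + rng.gauss(0.0, noise)
        xs.append(x)
        us.append(u)
        x = a * x + b * u
    return xs, us

def clone(xs, us):
    """Behavioral cloning: least-squares fit of k in u = -k x."""
    return -sum(x * u for x, u in zip(xs, us)) / sum(x * x for x in xs)

a, b = 1.2, 1.0                      # open-loop unstable, since |a| > 1
k_expert = lqr_gain(a, b, q=1.0, r=1.0)
rng = random.Random(0)
xs, us = [], []
for _ in range(20):                  # 20 short expert trajectories
    x_traj, u_traj = demonstrate(k_expert, a, b, rng.uniform(-5, 5), 10, 0.05, rng)
    xs += x_traj
    us += u_traj
k_cloned = clone(xs, us)
print(round(k_expert, 3), round(k_cloned, 3), abs(a - b * k_cloned) < 1)
```

The last value printed checks that the cloned gain still stabilises the closed loop (|a - b k| < 1); with a small fitting error that holds here, but the stability margin is exactly what can erode when per-step errors compound.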


An investigation of model-free planning

arXiv.org Machine Learning

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods that learn how to plan has been proposed; these methods provide the structure for planning via an inductive bias in the function approximator (such as a tree-structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state of the art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.


Some Considerations on Learning to Explore via Meta-Reinforcement Learning

arXiv.org Artificial Intelligence

We consider the problem of exploration in meta-reinforcement learning. Two new meta-reinforcement learning algorithms are suggested: E-MAML and E-RL². Results are presented on a novel environment we call 'Krazy World' and on a set of maze environments. We show that E-MAML and E-RL² deliver better performance on tasks where exploration is important.