 Reinforcement Learning


Autonomous Control of a Line Follower Robot Using a Q-Learning Controller

arXiv.org Machine Learning

In this paper, a MIMO simulated annealing (SA) based Q-learning method is proposed to control a line follower robot. The conventional controller for these types of robots is the proportional (P) controller. Considering the unknown mechanical characteristics of the robot and uncertainties such as friction and slippery surfaces, system modeling and controller design can be extremely challenging. The mathematical model of the robot is presented in this paper, and a simulator is designed based on this model. Basic Q-learning methods are based on pure exploitation, and epsilon-greedy methods, which help exploration, can harm controller performance after learning is complete by exploring nonoptimal actions. The simulated annealing based Q-learning method tackles this drawback by decreasing the exploration rate as learning progresses. Simulation and experimental results are provided to evaluate the effectiveness of the proposed controller.
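
The abstract does not spell out the annealing rule, so the following is only a minimal sketch of one common way to combine simulated annealing with tabular Q-learning: a random candidate action is accepted over the greedy one with a Metropolis-style probability, and the temperature is cooled so that exploration fades as learning proceeds. The function names, the cooling rate, and the tiny Q-table are assumptions made for illustration, not the authors' controller.

```python
import math
import random

def sa_select_action(q_row, temperature, rng):
    """Choose an action using a Metropolis-style acceptance test (assumed scheme)."""
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    # Accept the random candidate with probability exp((Q_cand - Q_greedy) / T).
    # High T: almost always accept (explore). Low T: almost always stay greedy.
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a table stored as a list of lists."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])

def cool(temperature, rate=0.95, floor=1e-3):
    """Geometric cooling schedule with a lower bound (hypothetical parameters)."""
    return max(floor, temperature * rate)

if __name__ == "__main__":
    rng = random.Random(0)
    q_row = [0.0, 1.0, 0.5]
    hot = {sa_select_action(q_row, 1e9, rng) for _ in range(1000)}
    cold = {sa_select_action(q_row, 1e-3, rng) for _ in range(1000)}
    print("actions tried when hot:", sorted(hot))
    print("actions tried when cold:", sorted(cold))
```

Unlike a fixed epsilon-greedy rule, the acceptance probability here shrinks both with the temperature and with how much worse the candidate action looks, so late in training the controller effectively acts greedily.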


Exploration Based Language Learning for Text-Based Games

arXiv.org Artificial Intelligence

This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, and language generation by artificial agents. Moreover, they provide a learning environment in which these skills can be acquired through interactions with an environment rather than using fixed corpora. One aspect that makes these games particularly challenging for learning agents is the combinatorially large action space. Existing methods for solving text-based games are limited to games that are either very simple or have an action space restricted to a predetermined set of admissible actions. In this work, we propose to use the exploration approach of Go-Explore for solving text-based games. More specifically, in an initial exploration phase, we first extract trajectories with high rewards, after which we train a policy to solve the game by imitating these trajectories. Our experiments show that this approach outperforms existing solutions in solving text-based games, and it is more sample efficient in terms of the number of interactions with the environment. Moreover, we show that the learned policy can generalize better than existing solutions to unseen games without using any restriction on the action space.


Graph Constrained Reinforcement Learning for Natural Language Action Spaces

arXiv.org Artificial Intelligence

Interactive Fiction games are text-based simulations in which an agent interacts with the world purely through natural language. They are ideal environments for studying how to extend reinforcement learning agents to meet the challenges of natural language understanding, partial observability, and action generation in combinatorially-large text-based action spaces. We present KG-A2C, an agent that builds a dynamic knowledge graph while exploring and generates actions using a template-based action space. We contend that the dual uses of the knowledge graph to reason about game state and to constrain natural language generation are the keys to scalable exploration of combinatorially large natural language actions. Results across a wide variety of IF games show that KG-A2C outperforms current IF agents despite the exponential increase in action space size.


What's a Good Prediction? Issues in Evaluating General Value Functions Through Error

arXiv.org Artificial Intelligence

Constructing and maintaining knowledge of the world is a central problem for artificial intelligence research. Approaches to constructing an agent's knowledge using predictions have received increased amounts of interest in recent years. A particularly promising collection of research centres itself around architectures that formulate predictions as General Value Functions (GVFs), an approach commonly referred to as predictive knowledge. A pernicious challenge for predictive knowledge architectures is determining what to predict. In this paper, we argue that evaluation methods (i.e., return error and RUPEE) are not well suited for the challenges of determining what to predict. As a primary contribution, we provide extended examples that evaluate predictions in terms of how they are used in further prediction tasks: a key motivation of predictive knowledge systems. We demonstrate that simply because a GVF's error is low, it does not necessarily follow that the prediction is useful as a cumulant. We suggest evaluating 1) the relevance of a GVF's features to the prediction task at hand, and 2) GVFs by how they are used. To determine feature relevance, we generalize AutoStep to GTD, producing a step-size learning method suited to the life-long continual learning settings that predictive knowledge architectures are commonly deployed in. This paper contributes a first look into evaluation of predictions through their use, an integral component of predictive knowledge which is as of yet unexplored.
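
For readers new to the terminology, a GVF predicts the discounted sum of an arbitrary signal (the cumulant) rather than of reward. The sketch below uses plain on-policy linear TD(0), not the GTD or AutoStep methods named in the abstract, and the constant cumulant and single feature are assumptions chosen so the result is easy to check by hand.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def td0_gvf_update(weights, features, next_features, cumulant, gamma, alpha):
    """One linear TD(0) step toward the discounted sum of the cumulant."""
    td_error = cumulant + gamma * dot(weights, next_features) - dot(weights, features)
    return [w + alpha * td_error * x for w, x in zip(weights, features)]

def learn_constant_cumulant(steps=500, gamma=0.5, alpha=0.1):
    """Toy setting (hypothetical): one state, one feature, cumulant fixed at 1."""
    weights = [0.0]
    features = [1.0]
    for _ in range(steps):
        weights = td0_gvf_update(weights, features, features, 1.0, gamma, alpha)
    return weights

if __name__ == "__main__":
    # The true answer is 1 / (1 - gamma) = 2 for gamma = 0.5.
    print(learn_constant_cumulant())
```

In this toy case the error against the true value of 2 shrinks toward zero, which is exactly the kind of error-based evaluation the abstract argues is insufficient on its own: a low error says nothing about whether the prediction is useful as an input to further predictions.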


Reinforcement Learning 101

#artificialintelligence

Reinforcement Learning (RL) is one of the hottest research topics in the field of modern Artificial Intelligence and its popularity is only growing. Let's look at 5 useful things one needs to know to get started with RL. RL is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences. Though both supervised and reinforcement learning use a mapping between input and output, unlike supervised learning, where the feedback provided to the agent is the correct set of actions for performing a task, reinforcement learning uses rewards and punishments as signals for positive and negative behavior. Compared to unsupervised learning, reinforcement learning differs in its goals.


The Brain Predicts Reward Like an AI, Says New DeepMind Research

#artificialintelligence

The idea of reinforcement learning, or learning based on reward, has been around for so long it's easy to forget we don't really know how it works. If DeepMind's new bombshell paper in Nature is any indication, a common approach in AI, one that's led to humanity's defeat in the game of Go against machines, may have the answer. We all subconsciously learn complex behaviors in response to positive and negative feedback, but how that works in the brain remains a century-long mystery. By examining a powerful variant of reinforcement learning, dubbed distributional reinforcement learning, that outperforms original methods, the team suggests that the brain may simultaneously represent multiple predicted futures in parallel. Each future is assigned a different probability, or chance of actually occurring, based on reward.
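
One way to picture a population that holds several reward predictions at once is to give each predictor a different balance between positive and negative prediction errors. The sketch below is an illustrative assumption rather than the rule from the paper: each predictor scales positive errors by `tau` and negative errors by `1 - tau`, so predictors with high `tau` settle on optimistic estimates and those with low `tau` on pessimistic ones. The 50/50 reward, the learning rate, and the names are all hypothetical.

```python
import random

def update_population(values, taus, reward, lr=0.05):
    """Apply one asymmetric update to every predictor in the population."""
    for i, tau in enumerate(taus):
        delta = reward - values[i]  # reward prediction error
        # Optimistic predictors (tau near 1) weight positive errors more;
        # pessimistic ones (tau near 0) weight negative errors more.
        scale = tau if delta > 0 else 1.0 - tau
        values[i] += lr * scale * delta
    return values

def simulate(taus, steps=5000, seed=0):
    """Feed the same random reward stream to all predictors."""
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(steps):
        reward = rng.choice([0.0, 1.0])  # hypothetical 50/50 reward
        update_population(values, taus, reward)
    return values

if __name__ == "__main__":
    # Pessimistic, balanced, and optimistic predictors.
    print(simulate([0.1, 0.5, 0.9]))
```

Although every predictor sees the same rewards, the population ends up spread out from low to high estimates, which together say more about the reward distribution than a single average would.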


Marwa Yousif Hassan on LinkedIn: The Brain Predicts Reward Like an AI, Says New DeepMind Research

#artificialintelligence

"In #distributional #Reinforcement_Learning, the #AI algorithm predicts a full spectrum of future rewards: some are more optimistic and amplify their reward signals when the reward is larger than expected; others more pessimistic, lowering their reward signals when it's smaller than predicted." "Partnering with Harvard, the teams tested out their idea in the brains of mice. In contrast to neuroscience canon, the team said, reward neurons didn't act as one. Rather than collectively encoding for a single expected outcome, they were each 'tuned' to a different prediction, with some expecting a larger amount of reward, and others less hopeful, predicting smaller volumes." "We found that reward neurons in the brain were each tuned to different levels of pessimism or optimism. If they were a choir, they wouldn't all be singing the same note, but harmonizing." "In other words, they seemed to operate on very similar principles to distributional reinforcement learning, a powerful method in #AI." https://lnkd.in/grTTXeA


On Solving Cooperative MARL Problems with a Few Good Experiences

arXiv.org Artificial Intelligence

Cooperative Multi-agent Reinforcement Learning (MARL) is crucial for cooperative decentralized decision learning in many domains such as search and rescue, drone surveillance, package delivery and fire fighting problems. In these domains, a key challenge is learning with a few good experiences, i.e., positive reinforcements are obtained only in a few situations (e.g., on extinguishing a fire or tracking a crime or delivering a package) and in most other situations there is zero or negative reinforcement. Learning decisions with a few good experiences is extremely challenging in cooperative MARL problems for three reasons. First, compared to the single-agent case, exploration is harder as multiple agents have to be coordinated to receive a good experience. Second, the environment is not stationary as all the agents are learning at the same time (and hence changing policies). Third, the scale of the problem increases significantly with every additional agent. Relevant existing work is extensive and has focussed on dealing with a few good experiences in single-agent RL problems or on scalable approaches for handling non-stationarity in MARL problems. Unfortunately, neither of these approaches (nor their extensions) is able to address the problem of sparse good experiences effectively. Therefore, we provide a novel fictitious self imitation approach that is able to simultaneously handle non-stationarity and sparse good experiences in a scalable manner. Finally, we provide a thorough comparison (experimental or descriptive) against relevant cooperative MARL algorithms to demonstrate the utility of our approach.


Local Policy Optimization for Trajectory-Centric Reinforcement Learning

arXiv.org Machine Learning

The goal of this paper is to present a method for simultaneous trajectory and local stabilizing policy optimization to generate local policies for trajectory-centric model-based reinforcement learning (MBRL). This is motivated by the fact that global policy optimization for non-linear systems could be a very challenging problem both algorithmically and numerically. However, a lot of robotic manipulation tasks are trajectory-centric, and thus do not require a global model or policy. Due to inaccuracies in the learned model estimates, an open-loop trajectory optimization process mostly results in very poor performance when used on the real system. Motivated by these problems, we try to formulate the problem of trajectory optimization and local policy synthesis as a single optimization problem. It is then solved simultaneously as an instance of nonlinear programming. We provide some results for analysis as well as achieved performance of the proposed technique under some simplifying assumptions.


Machine Learning assisted Handover and Resource Management for Cellular Connected Drones

arXiv.org Machine Learning

Enabling cellular connectivity for drones introduces a wide set of challenges and opportunities. Communication of cellular-connected drones is influenced by 3-dimensional mobility and line-of-sight channel characteristics, which result in a higher number of handovers with increasing altitude. Our cell planning simulations of coexisting aerial and terrestrial users indicate that the severe interference from drones to base stations is a major challenge for uplink communications of terrestrial users. Here, we first present the major challenges in the coexistence of terrestrial and drone communications by considering real geographical network data for Stockholm. Then, we derive analytical models for the key performance indicators (KPIs), including communications delay and interference over cellular networks, and formulate the handover and radio resource management (H-RRM) optimization problem. Afterwards, we transform this problem into a machine learning problem, and propose a deep reinforcement learning solution to solve the H-RRM problem. In particular, heat maps of handover decisions at different drone altitudes and speeds are presented, which motivate a revision of the legacy handover schemes and a redefinition of the boundaries of cells in the sky.

I. INTRODUCTION

Commercial drone applications have attracted profound interest in recent years in a wide set of use-cases, including area monitoring, surveillance, and delivery [1].