 Reinforcement Learning


Reinforcement Learning for Assignment problem

arXiv.org Artificial Intelligence

On-demand services such as ride sharing [1], coordination of multiple robots [2], and user serving in MIMO networks [3] rely on management strategies to meet customer quality of service (QoS) requirements. The problem of shared resource utilization is very common in wireless networks [4] and is becoming more important as more devices are connected with the development of IoT and 5G. Such systems usually have multiple concurrent users awaiting service and fewer worker resources available, along with switching costs when moving from one user to the next (for example, a taxi driver's trip from the drop-off of one user to the pick-up point of the next). Real-world systems are dynamic in nature: cause-and-effect information is not given, and only system behavior and QoS are observed. Previous works developed algorithmic or classical scheduling methods in which QoS is maintained using some sort of priority index, such as Proportional Fair [5], [3] or MLWDF [6]. This work focuses on reinforcement learning applied to a general formulation of the user scheduling problem. A Q-learning based method is presented for maximizing customer QoS and compared to analytical strategies. The Q-learning approach is shown to improve QoS by up to TODO% compared to baseline scenarios.
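
The abstract above names Q-learning but does not spell out the update. The following is a minimal sketch of one tabular Q-learning step. It is not the paper's implementation. The state encoding (index of the user served last), the action set (which user to serve next), and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bootstrap from the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy setting: state = user served last, action = user to serve next.
actions = [0, 1, 2]
q = defaultdict(float)  # every unseen (state, action) pair starts at 0.0
new_value = q_update(q, state=0, action=1, reward=1.0, next_state=1, actions=actions)
```

In a scheduling setting the reward would typically encode the observed QoS minus any switching cost, but that choice is left open here because the abstract does not specify it.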


Exploiting Multiple Intelligent Reflecting Surfaces in Multi-Cell Uplink MIMO Communications

arXiv.org Artificial Intelligence

Applications of intelligent reflecting surfaces (IRSs) in wireless networks have attracted significant attention recently. Most of the relevant literature is focused on the single cell setting where a single IRS is deployed, while static and perfect channel state information (CSI) is assumed. In this work, we develop a novel methodology for multi-IRS-assisted multi-cell networks in the uplink. We formulate the sum-rate maximization problem aiming to jointly optimize the IRS reflect beamformers, base station (BS) combiners, and user equipment (UE) transmit powers. In this optimization, we consider the scenario in which (i) channels are dynamic and (ii) only partial CSI is available at each BS; specifically, scalar effective channels of local UEs and some of the interfering UEs. In casting this as a sequential decision making problem, we propose a multi-agent deep reinforcement learning algorithm to solve it, where each BS acts as an independent agent in charge of tuning the local UEs' transmit powers, the local IRS reflect beamformer, and its combiners. We introduce an efficient message passing scheme that requires limited information exchange among the neighboring BSs to cope with the non-stationarity caused by the coupling of actions taken by multiple BSs. Our numerical simulations show that our method obtains substantial improvement in average data rate compared to several baseline approaches, e.g., fixed UE transmit power and maximum ratio combining.
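
To make the sum-rate objective concrete, here is a small sketch that evaluates the standard sum of log2(1 + SINR) over users, treating interference as noise. It does not reproduce the paper's system model. The matrix `gains` is a hypothetical set of scalar effective power gains that is assumed to already fold in the IRS reflect beamformers and BS combiners, and `noise_power` is likewise an assumed parameter.

```python
import math


def sum_rate(powers, gains, noise_power=1.0):
    """Sum over users of log2(1 + SINR).

    gains[k][j] is the (assumed) effective power gain from user j
    as seen at the receiver that serves user k.
    """
    total = 0.0
    for k, p_k in enumerate(powers):
        signal = p_k * gains[k][k]
        interference = sum(
            powers[j] * gains[k][j] for j in range(len(powers)) if j != k
        )
        total += math.log2(1.0 + signal / (noise_power + interference))
    return total


# Two users with no cross-interference: rates are log2(2) and log2(4).
gains = [[1.0, 0.0], [0.0, 3.0]]
rate = sum_rate([1.0, 1.0], gains)
```

Raising one user's transmit power increases its own term but also enlarges the interference in every other term, which is the coupling between agents that the abstract refers to.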


Artificial General Intelligence: A technology with more Cons than Pros

#artificialintelligence

It is not every day that humans face questions such as what will happen if technology exceeds the human thought process, or what will happen if machines become conscious or develop a conscience so that they can make decisions equivalent to those of humans. However, scientists and researchers are looking for an alternative that can perform tasks which traditional artificial intelligence and its subfields cannot. Termed Artificial General Intelligence, this cutting-edge technology has been acknowledged by scientists and researchers since the inception of artificial intelligence. Artificial general intelligence will be the technology that pairs its general intelligence with deep reinforcement learning.


Off-policy vs On-Policy vs Offline Reinforcement Learning Demystified!

#artificialintelligence

In this article, we will try to understand where on-policy learning, off-policy learning and offline learning algorithms fundamentally differ. Though there is a fair amount of intimidating jargon in reinforcement learning theory, it is all based on simple ideas. Reinforcement learning is a subfield of machine learning that teaches an agent how to choose an action from its action space as it interacts with an environment, in order to maximize rewards over time. Complex enough? Let's break this definition down for better understanding.
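
One common way to see the on-policy versus off-policy distinction is to compare the bootstrap targets of SARSA and Q-learning. The sketch below is an illustration under that assumption and is not taken from the article. The Q-table `q`, the state name `s1`, and the action names are hypothetical.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    # On-policy: bootstrap from the action the behaviour policy actually takes next.
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.9):
    # Off-policy: bootstrap from the greedy action, whatever the behaviour policy does.
    return reward + gamma * max(q[next_state].values())


# Hypothetical Q-table with a single next state and two actions.
q = {"s1": {"left": 1.0, "right": 3.0}}
on_policy = sarsa_target(q, reward=0.0, next_state="s1", next_action="left")
off_policy = q_learning_target(q, reward=0.0, next_state="s1")
```

Offline learning can reuse an off-policy target of this kind, with the difference that the transitions come from a fixed logged dataset and the agent collects no new experience.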


Bayesian Robust Optimization for Imitation Learning

arXiv.org Machine Learning

One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk. Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms. Code is available at https://github.com/dsbrown1331/broil.
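
The trade-off between expected return and conditional value at risk can be sketched numerically. The code below is an illustrative approximation and not the BROIL implementation from the linked repository. It assumes the policy's return has been evaluated under a handful of equally likely reward hypotheses, defines CVaR as the mean of the worst `tail_fraction` share of those samples, and blends the two terms with a weight `lam`; all three names are assumptions.

```python
import math


def cvar_worst_tail(returns, tail_fraction):
    """Mean of the worst `tail_fraction` share of equally likely return samples."""
    k = max(1, math.ceil(tail_fraction * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


def blended_objective(returns, lam, tail_fraction):
    """lam * expected return + (1 - lam) * CVaR over the worst tail."""
    expected = sum(returns) / len(returns)
    return lam * expected + (1.0 - lam) * cvar_worst_tail(returns, tail_fraction)


# Returns of one policy under four hypothetical reward samples.
returns = [-4.0, 0.0, 2.0, 6.0]
risk_neutral = blended_objective(returns, lam=1.0, tail_fraction=0.25)
risk_averse = blended_objective(returns, lam=0.0, tail_fraction=0.25)
balanced = blended_objective(returns, lam=0.5, tail_fraction=0.25)
```

Setting `lam` to 1 recovers a purely return-maximizing score and setting it to 0 scores the policy only by its worst-case tail, which mirrors the interpolation the abstract describes.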


Artificial Intelligence for Robotics with NLP using Python

#artificialintelligence

Artificial Intelligence (AI) for Unmanned Ground Vehicle (UGV) with Natural Language Processing using Python Masterclass. This course covers topics such as Python basic concepts, Python advanced concepts, the NumPy library, Unmanned Ground Vehicles (UGV), Artificial Intelligence (AI), machine learning and its types, unsupervised learning, reinforcement learning, speech recognition, and Natural Language Processing (NLP). This is the best course for anyone who wants to start a career in Artificial Intelligence.


A Few Shot Adaptation of Visual Navigation Skills to New Observations using Meta-Learning

arXiv.org Artificial Intelligence

Target-driven visual navigation is a challenging problem that requires a robot to find the goal using only visual inputs. Many researchers have demonstrated promising results using deep reinforcement learning (deep RL) on various robotic platforms, but typical end-to-end learning is known for its poor extrapolation capability to new scenarios. Therefore, learning a navigation policy for a new robot with a new sensor configuration or a new target still remains a challenging problem. In this paper, we introduce a learning algorithm that enables rapid adaptation to new sensor configurations or target objects with a few shots. We design a policy architecture with latent features between perception and inference networks and quickly adapt the perception network via meta-learning while freezing the inference network. Our experiments show that our algorithm adapts the learned navigation policy with only three shots for unseen situations with different sensor configurations or different target colors. We also analyze the proposed algorithm by investigating various hyperparameters.


The Value Equivalence Principle for Model-Based Reinforcement Learning

arXiv.org Artificial Intelligence

Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning. As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates. We propose a formulation of the model learning problem based on the value equivalence principle and analyze how the set of feasible solutions is impacted by the choice of policies and functions. Specifically, we show that, as we augment the set of policies and functions considered, the class of value equivalent models shrinks, until eventually collapsing to a single point corresponding to a model that perfectly describes the environment. In many problems, directly modelling state-to-state transitions may be both difficult and unnecessary. By leveraging the value-equivalence principle one may find simpler models without compromising performance, saving computation and memory. We illustrate the benefits of value-equivalent model learning with experiments comparing it against more traditional counterparts like maximum likelihood estimation. More generally, we argue that the principle of value equivalence underlies a number of recent empirical successes in RL, such as Value Iteration Networks, the Predictron, Value Prediction Networks, TreeQN, and MuZero, and provides a first theoretical underpinning of those results.
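
The definition of value equivalence can be checked directly in a small tabular case. The sketch below is a toy illustration, not the paper's method. It assumes a model is a pair `(rewards, transitions)` of nested lists, a policy is a list of per-state action probabilities, and a fixed discount `gamma`; these representations are choices made here for brevity.

```python
def bellman_update(model, policy, v, gamma=0.9):
    """Return T_pi v for a tabular model given as (rewards, transitions)."""
    rewards, transitions = model
    n_states = len(v)
    updated = []
    for s in range(n_states):
        value = 0.0
        for a, prob_a in enumerate(policy[s]):
            expected_next = sum(transitions[s][a][s2] * v[s2] for s2 in range(n_states))
            value += prob_a * (rewards[s][a] + gamma * expected_next)
        updated.append(value)
    return updated


def value_equivalent(model_a, model_b, policies, functions, tol=1e-9):
    """True if both models give the same Bellman update for every (policy, v) pair."""
    for policy in policies:
        for v in functions:
            ua = bellman_update(model_a, policy, v)
            ub = bellman_update(model_b, policy, v)
            if any(abs(x - y) > tol for x, y in zip(ua, ub)):
                return False
    return True


# Two states, one action; the models disagree on where state 0 leads.
rewards = [[0.0], [0.0]]
model_a = (rewards, [[[1.0, 0.0]], [[0.0, 1.0]]])  # state 0 stays in state 0
model_b = (rewards, [[[0.0, 1.0]], [[0.0, 1.0]]])  # state 0 moves to state 1
policy = [[1.0], [1.0]]

small_set = value_equivalent(model_a, model_b, [policy], [[1.0, 1.0]])
larger_set = value_equivalent(model_a, model_b, [policy], [[1.0, 1.0], [1.0, 0.0]])
```

With only the constant function in the set, the two different models are value equivalent; adding a second function separates them, which is a small instance of the class of equivalent models shrinking as the set of functions grows.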


Reinforcement learning is supervised learning on optimized data

AIHub

The two most common perspectives on reinforcement learning (RL) are optimization and dynamic programming. Methods that compute the gradients of the non-differentiable expected reward objective, such as the REINFORCE trick, are commonly grouped into the optimization perspective, whereas methods that employ TD-learning or Q-learning are dynamic programming methods. While these methods have shown considerable success in recent years, they are still quite challenging to apply to new problems. In contrast, deep supervised learning has been extremely successful, and we may hence ask: Can we use supervised learning to perform RL? In this blog post we discuss a mental model for RL, based on the idea that RL can be viewed as doing supervised learning on the "good data".
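
One simple reading of "supervised learning on the good data" is filtered imitation: keep only the highest-return trajectories and fit a policy to their state-action pairs. The sketch below illustrates that reading with a tabular majority-vote policy. It is an assumption about how the idea could be instantiated and not the blog post's own formulation; the trajectory format and the `keep` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def fit_policy_on_best(trajectories, keep):
    """Keep the `keep` highest-return trajectories, then pick the most frequent action per state."""
    best = sorted(trajectories, key=lambda t: t["return"], reverse=True)[:keep]
    counts = defaultdict(Counter)
    for trajectory in best:
        for state, action in trajectory["steps"]:
            counts[state][action] += 1
    # Majority vote per state is the maximum-likelihood deterministic tabular policy.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical logged experience with per-trajectory returns.
trajectories = [
    {"return": 10.0, "steps": [("s0", "right"), ("s1", "right")]},
    {"return": 8.0, "steps": [("s0", "right"), ("s1", "up")]},
    {"return": 9.0, "steps": [("s0", "right"), ("s1", "right")]},
    {"return": -5.0, "steps": [("s0", "left"), ("s1", "left")]},
]
policy = fit_policy_on_best(trajectories, keep=2)
```

Repeating the loop (collect data with the fitted policy, filter, refit) turns this single supervised step into an iterative procedure, although how the data is selected or reweighted varies between methods.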


Reinforcement Learning with Augmented Data

arXiv.org Machine Learning

Learning from visual observations is a fundamental yet challenging problem in Reinforcement Learning (RL). Although algorithmic advances combined with convolutional neural networks have proved to be a recipe for success, current methods are still lacking on two fronts: (a) data-efficiency of learning and (b) generalization to new environments. To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms. We perform the first extensive study of general data augmentations for RL on both pixel-based and state-based inputs, and introduce two new data augmentations - random translate and random amplitude scale. We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks. RAD sets a new state-of-the-art in terms of data-efficiency and final performance on the DeepMind Control Suite benchmark for pixel-based control as well as OpenAI Gym benchmark for state-based control. We further demonstrate that RAD significantly improves test-time generalization over existing methods on several OpenAI ProcGen benchmarks.
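
Three of the augmentations named above are easy to sketch on plain Python lists. The code is a simplified illustration and not the RAD codebase. It assumes a single-channel image stored as a list of rows and a state stored as a flat list, and the function names, `fill` value, and scale range are assumptions.

```python
import random


def random_crop(image, out_size, rng):
    """Crop a random out_size x out_size window from a 2D image (list of rows)."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - out_size)
    left = rng.randint(0, width - out_size)
    return [row[left:left + out_size] for row in image[top:top + out_size]]


def random_translate(image, out_size, rng, fill=0):
    """Place the image at a random offset inside a larger out_size x out_size canvas."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, out_size - height)
    left = rng.randint(0, out_size - width)
    canvas = [[fill] * out_size for _ in range(out_size)]
    for r in range(height):
        canvas[top + r][left:left + width] = image[r]
    return canvas


def random_amplitude_scale(state, low, high, rng):
    """Multiply a state vector by one scalar drawn uniformly from [low, high]."""
    scale = rng.uniform(low, high)
    return [scale * x for x in state]


rng = random.Random(0)  # seeded so runs are repeatable
image = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped = random_crop(image, 3, rng)
translated = random_translate(image, 6, rng)
scaled = random_amplitude_scale([1.0, -2.0], 0.5, 1.5, rng)
```

In a plug-and-play setup of the kind the abstract describes, such a function would be applied to each observation sampled for training before it reaches the RL algorithm, leaving the algorithm itself unchanged.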