 Reinforcement Learning


Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN

arXiv.org Machine Learning

The recently proposed distributional approach to reinforcement learning (DiRL) is centered on learning the distribution of the reward-to-go, often referred to as the value distribution. In this work, we show that the distributional Bellman equation, which drives DiRL methods, is equivalent to a generative adversarial network (GAN) model. In this formulation, DiRL can be seen as learning a deep generative model of the value distribution, driven by the discrepancy between the distribution of the current value, and the distribution of the sum of current reward and next value. We use this insight to propose a GAN-based approach to DiRL, which leverages the strengths of GANs in learning distributions of high-dimensional data. In particular, we show that our GAN approach can be used for DiRL with multivariate rewards, an important setting which cannot be tackled with prior methods. The multivariate setting also allows us to unify learning the distribution of values and state transitions, and we exploit this idea to devise a novel exploration method that is driven by the discrepancy in estimating both values and states.
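
As a rough illustration of the equation this abstract refers to, the distributional Bellman equation is commonly written as below. The notation is an assumption made for this sketch and does not come from the abstract: $Z^{\pi}$ is the random reward-to-go under a policy $\pi$, $R$ is the (possibly vector-valued) reward, $\gamma$ is a discount factor, $P$ is the transition kernel, and $\stackrel{D}{=}$ denotes equality in distribution.

```latex
Z^{\pi}(s, a) \stackrel{D}{=} R(s, a) + \gamma \, Z^{\pi}(s', a'),
\qquad s' \sim P(\cdot \mid s, a), \quad a' \sim \pi(\cdot \mid s')
```

Read this way, the left-hand side is the distribution of the current value and the right-hand side is the distribution of the sum of current reward and next value; the discrepancy between the two is what the abstract describes as driving the generative model.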


An Efficient Deep Reinforcement Learning Model for Urban Traffic Control

arXiv.org Machine Learning

Urban Traffic Control (UTC) plays an essential role in Intelligent Transportation Systems (ITS) but remains difficult. Since model-based UTC methods may not accurately describe the complex nature of traffic dynamics in all situations, model-free, data-driven UTC methods, especially those based on reinforcement learning (RL), have received increasing interest in the last decade. However, existing DL approaches have not offered an efficient algorithm for the complicated multi-intersection control problem, whose state-action space is vast. To solve this problem, we propose a Deep Reinforcement Learning (DRL) algorithm that combines several tricks to master an appropriate control strategy within an acceptable time. This new algorithm relaxes the fixed traffic demand pattern assumption and reduces human intervention in parameter tuning. Simulation experiments have shown that our method outperforms traditional rule-based approaches and has the potential to handle more complex traffic problems in the real world.


Regret Bounds for Reinforcement Learning via Markov Chain Concentration

arXiv.org Machine Learning

We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic MDPs with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.


Learning to Share and Hide Intentions using Information Regularization

arXiv.org Machine Learning

Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. Typically to do so, one requires access to either a model of or interaction with the other agent(s). Here we show how to learn effective strategies for cooperation and competition in an asymmetric information game with no such model or interaction. Our approach is to encourage an agent to reveal or hide their intentions using an information-theoretic regularizer. We consider both the mutual information between goal and action given state, as well as the mutual information between goal and state. We show how to stochastically optimize these regularizers in a way that is easy to integrate with policy gradient reinforcement learning. Finally, we demonstrate that cooperative (competitive) policies learned with our approach lead to more (less) reward for a second agent in two simple asymmetric information games.
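
For orientation, the two regularizers named in this abstract can be written with the standard definitions of (conditional) mutual information. This is only a sketch: the symbols $S$, $A$, $G$ for state, action, and goal, the weight $\beta$, and the additive form of the objective are assumptions for illustration, not details given in the abstract.

```latex
I(A; G \mid S) = \mathbb{E}_{s, g, a}\!\left[ \log \frac{\pi(a \mid s, g)}{p(a \mid s)} \right],
\qquad
I(S; G) = \mathbb{E}_{s, g}\!\left[ \log \frac{p(s \mid g)}{p(s)} \right]

J_{\beta}(\pi) = \mathbb{E}_{\pi}\!\left[ \text{return} \right] + \beta \, I
```

Under these assumptions, a positive $\beta$ rewards an agent for revealing its goal (the cooperative case), while a negative $\beta$ rewards hiding it (the competitive case).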



Policy Networks vs Value Networks in Reinforcement Learning

#artificialintelligence

In Reinforcement Learning, agents take random decisions in their environment and learn to select the right one out of many to achieve their goal and play at a super-human level. Policy and value networks are used together in algorithms like Monte Carlo Tree Search (MCTS) to perform Reinforcement Learning. Both networks are an integral part of a method called Exploration in the MCTS algorithm. They are also known as policy iteration and value iteration, since they are calculated many times, making it an iterative process. Let's understand why they are so important in Machine Learning and what the difference between them is.


Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems

arXiv.org Machine Learning

In recent years there has been a sharp rise in networking applications, in which significant events need to be classified but only a few training instances are available. These are known as cases of one-shot learning. Examples include analyzing network traffic under zero-day attacks, and computer vision tasks by sensor networks deployed in the field. To handle this challenging task, organizations often use human analysts to classify events under high uncertainty. Existing algorithms use a threshold-based mechanism to decide whether to classify an object automatically or send it to an analyst for deeper inspection. However, this approach leads to a significant waste of resources since it does not take the practical temporal constraints of system resources into account. Our contribution is threefold. First, we develop a novel Deep Reinforcement One-shot Learning (DeROL) framework to address this challenge. The basic idea of the DeROL algorithm is to train a deep-Q network to obtain a policy which is oblivious to the unseen classes in the testing data. Then, in real-time, DeROL maps the current state of the one-shot learning process to operational actions based on the trained deep-Q network, to maximize the objective function. Second, we develop the first open-source software for practical artificially intelligent one-shot classification systems with limited resources for the benefit of researchers in related fields. Third, we present an extensive experimental study using the OMNIGLOT dataset for computer vision tasks and the UNSW-NB15 dataset for intrusion detection tasks that demonstrates the versatility and efficiency of the DeROL framework.


Getting started in AI: 2018 – UX Planet

#artificialintelligence

Although artificial intelligence has been around since the 1950s, it is definitely not too late to get started, whether you are a developer or an enterprise manager. Actually, given the major developments of the past few years, there couldn't be a better time to get started in Artificial Intelligence (AI). AI is redefining the experiences humans have with machines and enabling even richer experiences for end users and entities alike. This entry is a follow-up to the talk I delivered during the Google Cloud Day in Malta on 26 July 2018. Below, I will first set the context and then outline a selection of the latest developments that should motivate you to get started in artificial intelligence.


An intro to Advantage Actor Critic methods: let's play Sonic the Hedgehog!

#artificialintelligence

As we saw in the article about improvements in Deep Q Learning, value-based methods have high variability. To reduce this problem, we spoke about using the advantage function instead of the value function. This function tells us how much better the action taken at a given state is compared to the average. In other words, it calculates the extra reward I get if I take this action, that is, the reward beyond the expected value of that state.
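
To make the advantage function concrete, here is a minimal sketch in plain Python. It assumes the common definition A(s, a) = Q(s, a) - V(s), with V(s) taken as the policy-weighted average of the action values; the function names and the toy numbers are hypothetical and do not come from the article.

```python
def state_value(q_values, policy_probs):
    """Expected value of a state: V(s) = sum over a of pi(a|s) * Q(s, a)."""
    return sum(p * q for p, q in zip(policy_probs, q_values))


def advantages(q_values, policy_probs):
    """Advantage of each action: A(s, a) = Q(s, a) - V(s)."""
    v = state_value(q_values, policy_probs)
    return [q - v for q in q_values]


# Toy example (assumed numbers): three actions in a single state.
q = [1.0, 2.0, 4.0]        # estimated action values Q(s, a)
pi = [0.25, 0.25, 0.5]     # policy probabilities pi(a|s)

v = state_value(q, pi)     # average value of the state under the policy
adv = advantages(q, pi)    # positive entries are better-than-average actions
print(v, adv)
```

Actions with a positive advantage are better than the policy's average behaviour in that state, and the advantages weighted by the policy probabilities sum to zero.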


AI-Equipped Robots Develop Situational Awareness in Earth's Most Uncertain Environment

#artificialintelligence

Researchers at Stevens Institute of Technology in New Jersey have created algorithms to teach robots to adapt to changing conditions related to protecting and preserving underwater infrastructure. Stevens' Brendan Englot leads a group that uses reinforcement learning algorithms trained on sonar data. The group's robots emit high-frequency chirps and measure how long it takes the sound to return after reflecting off surrounding structures, gathering data and acquiring situational awareness while various forces buffet them. The research team recently dispatched a robot to autonomously map a Manhattan pier without a prior model.
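
The echo-timing measurement described above can be illustrated with a short sketch. It assumes a nominal speed of sound in water of about 1500 m/s and a straight out-and-back path; the constant, the function name, and the sample timings are hypothetical and are not taken from the Stevens work.

```python
# Assumed nominal speed of sound in water; the real value varies with
# temperature, salinity, and depth.
SPEED_OF_SOUND_WATER_M_S = 1500.0


def echo_range_m(round_trip_s, speed_m_s=SPEED_OF_SOUND_WATER_M_S):
    """Distance to a reflector from the round-trip time of a sonar chirp.

    The sound travels out and back, so the one-way range is half of
    speed * time.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return speed_m_s * round_trip_s / 2.0


# Toy echo delays in seconds (assumed values).
ranges = [echo_range_m(t) for t in (0.004, 0.02)]
print(ranges)
```

A mapping system would combine many such ranges with the sensor's pose; this sketch covers only the time-to-distance conversion.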