AITopics

The recently proposed distributional approach to reinforcement learning (DiRL) is centered on learning the distribution of the reward-to-go, often referred to as the value distribution. In this work, we show that the distributional Bellman equation, which drives DiRL methods, is equivalent to a generative adversarial network (GAN) model. In this formulation, DiRL can be seen as learning a deep generative model of the value distribution, driven by the discrepancy between the distribution of the current value, and the distribution of the sum of current reward and next value. We use this insight to propose a GAN-based approach to DiRL, which leverages the strengths of GANs in learning distributions of high-dimensional data. In particular, we show that our GAN approach can be used for DiRL with multivariate rewards, an important setting which cannot be tackled with prior methods. The multivariate setting also allows us to unify learning the distribution of values and state transitions, and we exploit this idea to devise a novel exploration method that is driven by the discrepancy in estimating both values and states.
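For reference, the distributional Bellman equation mentioned in the abstract is commonly written as below. The notation is an assumption borrowed from the standard distributional RL literature and does not appear in the abstract itself: $Z^{\pi}(s,a)$ is the random reward-to-go, $R(s,a)$ the random reward, $\gamma$ a discount factor, $P$ the transition kernel, and $\pi$ the policy.

```latex
% Equality in distribution between the current value and
% the sum of the current reward and the discounted next value.
Z^{\pi}(s,a) \overset{D}{=} R(s,a) + \gamma\, Z^{\pi}(S', A'),
\qquad S' \sim P(\cdot \mid s,a), \quad A' \sim \pi(\cdot \mid S')
```

Read this way, the left-hand side plays the role of the distribution being generated and the right-hand side the target distribution it is compared against, which is the discrepancy the abstract describes.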

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1808.0196

Country:

North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

An Efficient Deep Reinforcement Learning Model for Urban Traffic Control

Lin, Yilun, Dai, Xingyuan, Li, Li, Wang, Fei-Yue

Urban Traffic Control (UTC) plays an essential role in Intelligent Transportation Systems (ITS) but remains difficult. Since model-based UTC methods may not accurately describe the complex nature of traffic dynamics in all situations, model-free, data-driven UTC methods, especially reinforcement learning (RL) based UTC methods, have received increasing interest in the last decade. However, existing DL approaches have not offered an efficient algorithm for complicated multi-intersection control problems, whose state-action spaces are vast. To solve this problem, we propose a Deep Reinforcement Learning (DRL) algorithm that combines several tricks to learn an appropriate control strategy within an acceptable time. This new algorithm relaxes the fixed traffic demand pattern assumption and reduces human intervention in parameter tuning. Simulation experiments show that our method outperforms traditional rule-based approaches and has the potential to handle more complex traffic problems in the real world.

controller, machine learning, reinforcement learning, (14 more...)

1808.01876

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Shandong Province > Qingdao (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.93)
Transportation > Infrastructure & Services (0.69)
Transportation > Ground > Road (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Regret Bounds for Reinforcement Learning via Markov Chain Concentration

Ortner, Ronald

We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic MDPs with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1808.01813

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Austria > Styria > Leoben (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)

Strouse, DJ, Kleiman-Weiner, Max, Tenenbaum, Josh, Botvinick, Matt, Schwab, David

Learning to Share and Hide Intentions using Information Regularization

Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. Typically to do so, one requires access to either a model of or interaction with the other agent(s). Here we show how to learn effective strategies for cooperation and competition in an asymmetric information game with no such model or interaction. Our approach is to encourage an agent to reveal or hide their intentions using an information-theoretic regularizer. We consider both the mutual information between goal and action given state, as well as the mutual information between goal and state. We show how to stochastically optimize these regularizers in a way that is easy to integrate with policy gradient reinforcement learning. Finally, we demonstrate that cooperative (competitive) policies learned with our approach lead to more (less) reward for a second agent in two simple asymmetric information games.
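As an illustration of the first regularizer named in the abstract, the mutual information between goal and action given state, here is a minimal sketch for a single fixed state with a tabular policy. The function name `goal_action_information` and the list-based inputs are hypothetical choices made for this example; the paper's own estimator is stochastic and is not reproduced here.

```python
import math


def goal_action_information(goal_probs, policy):
    """Mutual information I(action; goal | state) in nats for ONE fixed state.

    goal_probs[g] is p(goal = g | state).
    policy[g][a]  is pi(action = a | state, goal = g).
    Both are illustrative, hypothetical inputs.
    """
    n_actions = len(policy[0])
    # Marginal action distribution pi(a | state), averaging over goals.
    marginal = [
        sum(pg * row[a] for pg, row in zip(goal_probs, policy))
        for a in range(n_actions)
    ]
    info = 0.0
    for pg, row in zip(goal_probs, policy):
        for a, p in enumerate(row):
            # Terms with zero probability contribute nothing to the sum.
            if pg > 0 and p > 0:
                info += pg * p * math.log(p / marginal[a])
    return info


# A policy whose action fully reveals the goal carries log(2) nats here.
revealing = goal_action_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
# A policy that ignores the goal carries no information about it.
hiding = goal_action_information([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
```

Under this sketch, rewarding a larger value would push an agent toward revealing its goal through its actions, and penalizing it would push the agent toward hiding the goal.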

information, machine learning, reinforcement learning, (17 more...)

1808.02093

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Aug-5-2018, 11:06:15 GMT

Policy Networks vs Value Networks in Reinforcement Learning

In Reinforcement Learning, agents take random decisions in their environment and learn to select the right one out of many to achieve their goal and play at a super-human level. Policy and Value Networks are used together in algorithms like Monte Carlo Tree Search (MCTS) to perform Reinforcement Learning. Both networks are an integral part of a method called Exploration in the MCTS algorithm. They are also known as policy iteration & value iteration since they are calculated many times, making it an iterative process. Let's understand why they are so important in Machine Learning and what the difference between them is.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.61)

Puzanov, Anton, Cohen, Kobi

Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems

arXiv.org Machine Learning Aug-4-2018

In recent years there has been a sharp rise in networking applications, in which significant events need to be classified but only a few training instances are available. These are known as cases of one-shot learning. Examples include analyzing network traffic under zero-day attacks, and computer vision tasks by sensor networks deployed in the field. To handle this challenging task, organizations often use human analysts to classify events under high uncertainty. Existing algorithms use a threshold-based mechanism to decide whether to classify an object automatically or send it to an analyst for deeper inspection. However, this approach leads to a significant waste of resources since it does not take the practical temporal constraints of system resources into account. Our contribution is threefold. First, we develop a novel Deep Reinforcement One-shot Learning (DeROL) framework to address this challenge. The basic idea of the DeROL algorithm is to train a deep-Q network to obtain a policy which is oblivious to the unseen classes in the testing data. Then, in real-time, DeROL maps the current state of the one-shot learning process to operational actions based on the trained deep-Q network, to maximize the objective function. Second, we develop the first open-source software for practical artificially intelligent one-shot classification systems with limited resources for the benefit of researchers in related fields. Third, we present an extensive experimental study using the OMNIGLOT dataset for computer vision tasks and the UNSW-NB15 dataset for intrusion detection tasks that demonstrates the versatility and efficiency of the DeROL framework.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1808.01527

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

#artificialintelligence Aug-3-2018, 07:40:49 GMT

Getting started in AI: 2018 – UX Planet

Although artificial intelligence has been around since the 1950s, it is definitely not too late to get started, whether you are a developer or an enterprise manager. Actually, given the (major) developments in the past few years, there couldn't be a better time to get started in Artificial Intelligence (AI). AI is redefining the experiences humans have with machines and enabling even richer experiences for end users and entities alike. This entry is a follow-up to the talk I delivered during the Google Cloud Day in Malta on 26 July 2018. Below, I will first set the context and then outline a selection of the latest developments that should motivate you to get started in artificial intelligence.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: Europe > Middle East > Malta (0.25)

Industry: Information Technology > Services (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

#artificialintelligence Aug-2-2018, 19:51:39 GMT

An intro to Advantage Actor Critic methods: let's play Sonic the Hedgehog!

As we saw in the article about improvements in Deep Q Learning, value-based methods have high variability. To reduce this problem, we spoke about using the advantage function instead of the value function. This function tells us how much better the action taken at a given state is compared to the average action at that state. In other words, it calculates the extra reward I get if I take this action, that is, the reward beyond the expected value of that state.
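The advantage function described above can be sketched in a few lines. This is a minimal illustration that assumes the action values Q(s, a) and the policy's action probabilities at one state are already known; the function name `advantages` and the numbers are hypothetical, and the article's actor-critic method estimates these quantities with neural networks instead.

```python
def advantages(q_values, action_probs):
    """Return A(s, a) = Q(s, a) - V(s) for every action at one state.

    q_values[a]     is an assumed estimate of Q(s, a).
    action_probs[a] is the policy's probability of taking action a in s.
    """
    # V(s) is the expected action value under the policy.
    state_value = sum(p * q for p, q in zip(action_probs, q_values))
    # The advantage is how far each action sits above or below that average.
    return [q - state_value for q in q_values]


# Two equally likely actions worth 1.0 and 3.0: the state is worth 2.0,
# so the first action is below average and the second is above average.
example = advantages([1.0, 3.0], [0.5, 0.5])
```

A positive entry means the action yields more than the state's expected value, and a negative entry means it yields less.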

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Industry: Leisure & Entertainment > Games > Computer Games (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

#artificialintelligence Aug-2-2018, 16:31:23 GMT

AI-Equipped Robots Develop Situational Awareness in Earth's Most Uncertain Environment

Researchers at Stevens Institute of Technology in New Jersey have created algorithms to teach robots to adapt to changing conditions related to protecting and preserving underwater infrastructure. Stevens' Brendan Englot leads a group that uses reinforcement learning algorithms trained on sonar data. The group's robots emit high-frequency chirps and measure how long it takes the sound to return after reflecting off surrounding structures, gathering data and acquiring situational awareness while various forces buffet them. The research team recently dispatched a robot to autonomously map a Manhattan pier without a prior model.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

Country:

North America > United States > New Jersey (0.54)
North America > United States > Maryland > Montgomery County > Bethesda (0.09)

Industry: Government > Military (0.65)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)