AITopics

doi: 10.18653/v1/D19-1619

1908.10835

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

arXiv.org Artificial Intelligence, Aug-28-2019

STMARL: A Spatio-Temporal Multi-Agent Reinforcement Learning Approach for Traffic Light Control

Wang, Yanan, Xu, Tong, Niu, Xin, Tan, Chang, Chen, Enhong, Xiong, Hui

The development of intelligent traffic light control systems is essential for smart transportation management. While some efforts have been made to optimize individual traffic lights in isolation, related studies have largely ignored two facts: traffic lights at multiple intersections influence one another spatially, and current traffic light control depends temporally on historical traffic status. To that end, in this paper, we propose a novel Spatio-Temporal Multi-Agent Reinforcement Learning (STMARL) framework for effectively capturing the spatio-temporal dependency of multiple related traffic lights and controlling these traffic lights in a coordinated way. Specifically, we first construct a traffic light adjacency graph based on the spatial structure among traffic lights. Then, historical traffic records are integrated with the current traffic status via a Recurrent Neural Network structure. Moreover, based on the temporally dependent traffic information, we design a Graph Neural Network-based model to represent relationships among multiple traffic lights, and the decision for each traffic light is made in a distributed way by the deep Q-learning method. Finally, experimental results on both synthetic and real-world data demonstrate the effectiveness of our STMARL framework, which also provides insight into the influence mechanism among multi-intersection traffic lights.
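The graph-plus-message-passing idea above can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the STMARL model: the distance-threshold rule in `build_adjacency`, the single mean-aggregation layer, the random weights, and the linear Q-value head in `select_phases` are all hypothetical choices, and the recurrent (temporal) part is omitted.

```python
import numpy as np

def build_adjacency(coords, radius):
    """Hypothetical rule: link two traffic lights if they are within `radius` of each other."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= radius).astype(float)
    np.fill_diagonal(adj, 0.0)  # a light is not its own neighbour
    return adj

def message_passing(features, adj, w_self, w_neigh):
    """One graph layer: own features plus the mean of neighbour features, then ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ features) / np.maximum(deg, 1.0)  # isolated lights get a zero message
    return np.maximum(0.0, features @ w_self + neigh_mean @ w_neigh)

def select_phases(hidden, w_q):
    """Each light independently picks the phase with the highest Q-value from a linear head."""
    q_values = hidden @ w_q  # shape: (n_lights, n_phases)
    return q_values.argmax(axis=1)

# Four lights; the fourth is far away and therefore has no neighbours.
coords = [(0, 0), (1, 0), (2, 0), (5, 5)]
adj = build_adjacency(coords, radius=1.0)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))  # hypothetical per-light traffic features
hidden = message_passing(features, adj, rng.normal(size=(3, 8)), rng.normal(size=(3, 8)))
phases = select_phases(hidden, rng.normal(size=(8, 2)))  # two signal phases per light
```

Because the hidden state of each light mixes in its neighbours' features, the per-light argmax still depends on nearby intersections, which is the sense in which decisions stay distributed but coordinated.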

artificial intelligence, machine learning, reinforcement learning

1908.10577

Country: North America > United States (0.30)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

#artificialintelligence, Aug-27-2019, 15:37:21 GMT

DeepMind details OpenSpiel, a collection of AI training tools for video games

Reinforcement learning, the AI training technique that has given rise to systems capable of defeating world poker champions and guiding self-driving cars, isn't the simplest thing in the world to wrangle. That's particularly true in the gaming domain, where cutting-edge approaches sometimes require bespoke tools that aren't publicly available. In a paper recently published on the preprint server arXiv.org, DeepMind researchers detail OpenSpiel. At its core, it's a collection of environments and algorithms for research in general reinforcement learning and search and planning in games, with tools to analyze learning dynamics and other common evaluation metrics. "The purpose of OpenSpiel is to promote general multiagent reinforcement learning across many different game types, in a similar way as general game-playing but with a heavy emphasis on learning and not in competition form," wrote the researchers.

deepmind detail openspiel, openspiel, reinforcement

#artificialintelligence

Genre: Research Report (0.73)

Industry: Leisure & Entertainment > Games > Computer Games (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.73)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Gomez, Cesar A., Wang, Xianbin, Shami, Abdallah

Intelligent Active Queue Management Using Explicit Congestion Notification

arXiv.org Machine Learning, Aug-27-2019

As more end devices get connected, the Internet will become more congested. Various congestion control techniques have been developed at either the transport or the network layer. Active Queue Management (AQM) is a paradigm that aims to mitigate congestion at the network layer through active buffer control to avoid overflow. However, finding the right parameters for an AQM scheme is challenging due to the complexity and dynamics of networks. On the other hand, the Explicit Congestion Notification (ECN) mechanism makes incipient congestion at the network layer visible to the transport layer. In this work, we propose to exploit ECN information to improve AQM algorithms by applying machine learning techniques. Our intelligent method uses an artificial neural network to predict congestion and an AQM parameter tuner based on reinforcement learning. The evaluation results show that our solution can enhance the performance of deployed AQM using the existing TCP congestion control mechanisms. Thanks to the proliferation of smart devices and the paradigm of the Internet of Things (IoT), the demand for connections to the Internet is growing dramatically.
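The abstract does not name the AQM scheme being tuned, so as a hedged illustration here is classic RED (Random Early Detection) with ECN marking: packets from ECN-capable transports receive a Congestion Experienced (CE) mark instead of being dropped. Thresholds such as `min_th`, `max_th` and `max_p` are the kind of knobs a reinforcement-learning tuner could adjust; the class name `RedEcnQueue` and its defaults are illustrative assumptions, and RED's inter-mark `count` correction is omitted for brevity.

```python
import random
from dataclasses import dataclass

@dataclass
class RedEcnQueue:
    min_th: float = 5.0    # below this average queue length, never mark
    max_th: float = 15.0   # at or above this average, always mark or drop
    max_p: float = 0.1     # marking probability approached just below max_th
    weight: float = 0.002  # EWMA weight for the average queue length
    avg: float = 0.0

    def mark_probability(self, queue_len: float) -> float:
        # Update the exponentially weighted moving average of the queue length.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def on_arrival(self, queue_len: float, ecn_capable: bool, rng=random) -> str:
        p = self.mark_probability(queue_len)
        if rng.random() >= p:
            return "enqueue"
        # Signal congestion: ECN-capable flows get a CE mark, others lose the packet.
        return "mark_ce" if ecn_capable else "drop"

queue = RedEcnQueue()
decision = queue.on_arrival(queue_len=12, ecn_capable=True)
```

Marking rather than dropping is what lets the transport layer react to incipient congestion without retransmissions, which is the signal the paper proposes to feed into its learning components.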

congestion, machine learning, reinforcement learning

arXiv.org Machine Learning

1909.08386

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Telecommunications (0.68)
Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Assefa, Beakal Gizachew, Ozkasap, Oznur

HyMER: A Hybrid Machine Learning Framework for Energy Efficient Routing in SDN

arXiv.org Machine Learning, Aug-27-2019

Combining the network programmability of SDN with the pattern-discovery capabilities of machine learning has been applied to security, traffic classification, QoS prediction, and network performance, and has attracted the attention of researchers. In this work, we propose HyMER: a novel hybrid machine learning framework for traffic-aware, energy-efficient routing in SDN with supervised and reinforcement learning components. The supervised learning component consists of feature extraction, training, and testing. The reinforcement learning component learns from existing data or from scratch by iteratively interacting with the network environment. The framework is developed on the POX controller and evaluated on Mininet using the Abilene, GEANT, and Nobel-Germany real-world topologies and dynamic traffic traces. Experimental results show that the supervised component achieves up to 70% feature size reduction and more than 80% accuracy in parameter prediction. The refine heuristics algorithm increases prediction accuracy to 100% with a 14X to 25X speedup compared to the brute-force method. The reinforcement learning module converges within 100 to 275 iterations, and converges twice as fast when applied on top of the supervised component. Moreover, HyMER achieves up to 10 watts of power saving per switch, 30% link saving, and a 2-hop decrease in average path length.
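HyMER's reinforcement learning module is not specified in detail in this abstract, so the following only sketches the general idea of learning routes by iteratively refining value estimates: a Q-routing-style table where `q[x][y]` estimates the cost of reaching the destination from switch `x` when forwarding via neighbour `y`. The `link_cost` function is a hypothetical stand-in for an energy-aware metric (for example, hop cost plus a power penalty), and the four-node topology is a toy, not Abilene, GEANT or Nobel-Germany.

```python
def q_routing(graph, dest, link_cost, sweeps=100, alpha=0.5):
    """Tabular Q-routing-style estimates: q[x][y] ~ cost from x to dest when forwarding to y."""
    q = {x: {y: 0.0 for y in nbrs} for x, nbrs in graph.items()}
    for _ in range(sweeps):
        for x, nbrs in graph.items():
            if x == dest:
                continue
            for y in nbrs:
                # Cost of the next hop plus y's best current estimate of the remaining cost.
                remaining = 0.0 if y == dest else min(q[y].values())
                target = link_cost(x, y) + remaining
                q[x][y] += alpha * (target - q[x][y])
    return q

def greedy_path(q, src, dest, max_hops=20):
    """Follow the lowest-estimated-cost neighbour from src until dest (or a hop limit)."""
    path = [src]
    while path[-1] != dest and len(path) <= max_hops:
        x = path[-1]
        path.append(min(q[x], key=q[x].get))
    return path

# Toy topology: A-B-D is cheap; A-C-D uses a power-hungry link C-D (hypothetical costs).
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
costs = {
    frozenset(("A", "B")): 1.0,
    frozenset(("B", "D")): 1.0,
    frozenset(("A", "C")): 1.0,
    frozenset(("C", "D")): 3.0,
}
q = q_routing(graph, "D", lambda x, y: costs[frozenset((x, y))])
path = greedy_path(q, "A", "D")
```

Starting the estimates at zero and taking repeated Bellman-style updates makes them rise toward the true path costs, so the greedy route settles on A, B, D here.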

artificial intelligence, machine learning, reinforcement learning

arXiv.org Machine Learning

1909.08074

Country: Europe > Germany (0.27)

Genre: Research Report > New Finding (0.66)

Industry:

Telecommunications (0.94)
Transportation (0.69)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Abbasi-Yadkori, Yasin, Lazic, Nevena, Szepesvari, Csaba, Weisz, Gellert

Exploration-Enhanced POLITEX

arXiv.org Machine Learning, Aug-27-2019

We study algorithms for average-cost reinforcement learning problems with value function approximation. Our starting point is the recently proposed POLITEX algorithm, a version of policy iteration where the policy produced in each iteration is near-optimal in hindsight for the sum of all past value function estimates. POLITEX has sublinear regret guarantees in uniformly-mixing MDPs when the value estimation error can be controlled, which can be satisfied if all policies sufficiently explore the environment. Unfortunately, this assumption is often unrealistic. Motivated by the rapid growth of interest in developing policies that learn to explore their environment in the absence of rewards (also known as no-reward learning), we replace the assumption that all policies explore the environment with the assumption that a single, sufficiently exploring policy is available beforehand. The main contribution of the paper is a modification of POLITEX that incorporates such an exploration policy in a way that yields a regret guarantee similar to the previous one, without requiring that all policies explore the environment. In addition to the novel theoretical guarantees, we demonstrate the benefits of our scheme on environments that are difficult to explore using simple schemes such as dithering. While our solution may not achieve the best possible regret, it is the first result showing how to control the regret in the presence of function approximation errors on problems where exploration is nontrivial. Our approach can also be seen as a way of reducing the problem of minimizing the regret to that of learning a good exploration policy. We believe that modular approaches like ours can be highly beneficial in tackling harder control problems.
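As we understand the original POLITEX formulation, being "near-optimal in hindsight for the sum of all past value function estimates" is realised by an exponential-weights (Boltzmann) policy over the accumulated action-value estimates; since the setting is average-cost, lower values are better, hence the negative sign. The sketch below shows only that base policy update for a finite state-action space. The exploration-policy modification that is this paper's contribution is not reproduced, and `politex_policy` and the learning rate `eta` are illustrative names and values.

```python
import numpy as np

def politex_policy(q_estimates, eta):
    """pi_k(a|s) proportional to exp(-eta * sum_{i<k} Q_i(s, a)), with Q_i as cost estimates.

    q_estimates: list of arrays, each of shape (n_states, n_actions).
    Returns an array of shape (n_states, n_actions) whose rows are probability distributions.
    """
    total = np.sum(q_estimates, axis=0)
    logits = -eta * total
    logits -= logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Two past phases of cost estimates for 1 state and 2 actions: action 1 is consistently cheaper.
past_q = [np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]])]
pi = politex_policy(past_q, eta=1.0)
rng = np.random.default_rng(0)
action = rng.choice(2, p=pi[0])
```

Summing all past estimates, rather than using only the latest one, is what gives the expert-prediction style stability that the regret analysis relies on.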

exploration policy, machine learning, reinforcement learning

arXiv.org Machine Learning

1908.10479

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ensemble-Based Deep Reinforcement Learning for Chatbots

Cuayáhuitl, Heriberto, Lee, Donghyeon, Ryu, Seonghan, Cho, Yongjin, Choi, Sungja, Indurthi, Satish, Yu, Seunghak, Choi, Hyungtak, Hwang, Inchul, Kim, Jihie

Such an agent is typically characterised by: (i) a finite set of states $S = \{s_i\}$ that describe all possible situations in the environment; (ii) a finite set of actions $A = \{a_j\}$ to move the environment from one situation to another; (iii) a state transition function $T(s, a, s')$ that specifies the next state $s'$ for having taken action $a$ in the current state $s$; (iv) a reward function $R(s, a, s')$ that specifies a numerical value given to the agent for taking action $a$ in state $s$ and transitioning to state $s'$; and (v) a policy $\pi: S \rightarrow A$ that defines a mapping from states to actions [2, 30]. The goal of a reinforcement learning agent is to find an optimal policy by maximising its cumulative discounted reward, defined as $Q^*(s, a) = \max_{\pi} \mathbb{E}[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots \mid s_t = s, a_t = a, \pi]$, where the function $Q^*$ represents the maximum sum of rewards $r_t$ discounted by factor $\gamma$ at each time step. While a reinforcement learning agent takes actions with probability $\Pr(a \mid s)$ during training, it selects the best action at test time according to $\pi^*(s) = \arg\max_{a \in A} Q^*(s, a)$. A deep reinforcement learning agent approximates $Q^*$ using a multi-layer neural network [31]. The Q function is parameterised as $Q(s, a; \theta)$, where $\theta$ are the parameters or weights of the neural network (a recurrent neural network in our case). Estimating these weights requires a dataset of learning experiences $D = \{e_1, \dots, e_N\}$ (also referred to as 'experience replay memory'), where every experience is described as a tuple $e_t = (s_t, a_t, r_t, s_{t+1})$. Inducing a Q function consists of applying Q-learning updates over minibatches of experience $MB = \{(s, a, r, s') \sim U(D)\}$ drawn uniformly at random from the full dataset $D$. This process is implemented in learning algorithms using Deep Q-Networks (DQN) such as those described in [31, 32, 33], and the following section describes a DQN-based algorithm for human-chatbot interaction.
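The experience-replay loop described above can be sketched with a NumPy table standing in for the network Q(s, a; θ). This is a hedged illustration, not the authors' chatbot agent: the chain environment `chain_step`, the random behaviour policy, and all hyperparameters are hypothetical. What it does show is the procedure itself: store tuples (s_t, a_t, r_t, s_{t+1}) in a replay memory D, draw minibatches uniformly at random, and move each Q(s, a) toward r + γ max_a' Q(s', a').

```python
import numpy as np
from collections import deque

class ReplayMemory:
    """Experience replay memory D holding tuples (s, a, r, s_next, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng):
        # Uniform sampling U(D), with replacement.
        idx = rng.integers(0, len(self.buffer), size=batch_size)
        return [self.buffer[int(i)] for i in idx]

def q_learning_minibatch(q, batch, gamma=0.9, lr=0.1):
    """Move Q(s, a) toward r + gamma * max_a' Q(s', a') for every sampled experience."""
    for s, a, r, s_next, done in batch:
        target = r if done else r + gamma * q[s_next].max()
        q[s, a] += lr * (target - q[s, a])

def chain_step(s, a, n_states=4):
    """Toy chain: action 1 moves right, action 0 moves left; reaching the end pays 1."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

def train(episodes=200, n_states=4, batch_size=8, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))  # table standing in for Q(s, a; theta)
    memory = ReplayMemory(capacity=1000)
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 50:
            a = int(rng.integers(2))  # random behaviour policy; Q-learning is off-policy
            s_next, r, done = chain_step(s, a, n_states)
            memory.add(s, a, r, s_next, done)
            s, steps = s_next, steps + 1
            q_learning_minibatch(q, memory.sample(batch_size, rng))
    return q

q = train()
policy = q.argmax(axis=1)  # greedy test-time policy: argmax_a Q(s, a)
```

Sampling from the memory instead of learning only from the latest transition breaks the correlation between consecutive updates, which is the main reason DQN-style learners use replay.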

artificial intelligence, machine learning, reinforcement learning

doi: 10.1016/j.neucom.2019.08.007

1908.10422

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (0.47)
Leisure & Entertainment (0.47)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards

Cuayáhuitl, Heriberto, Lee, Donghyeon, Ryu, Seonghan, Choi, Sungja, Hwang, Inchul, Kim, Jihie

Training chatbots using the reinforcement learning paradigm is challenging due to high-dimensional states, infinite action spaces, and the difficulty of specifying the reward function. We address these problems using clustered actions instead of infinite actions, and a simple but promising reward function based on human-likeness scores derived from human-human dialogue data. We train Deep Reinforcement Learning (DRL) agents on chitchat data in raw text, without any manual annotations. Experimental results using different splits of training data show the following. First, our agents learn reasonable policies in the environments they become familiar with, but their performance drops substantially when they are exposed to a test set of unseen dialogues. Second, choosing a sentence embedding size of 100 or 300 dimensions makes no significant difference on test data. Third, our proposed human-likeness rewards are reasonable for training chatbots as long as they use lengthy dialogue histories of at least 10 sentences.
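Replacing an infinite action space with clustered actions can be illustrated with plain k-means over sentence embeddings: each cluster index becomes one discrete action. The clustering method, embedding model and response-selection rule used in the paper are not given here, so `kmeans` and `respond` are a hypothetical sketch; in particular, answering with the member sentence closest to the chosen centroid is an assumption, and the 2-D embeddings are toy stand-ins for the 100 to 300 dimensional ones mentioned above.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    labels = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    return centroids, labels

def respond(action, centroids, embeddings, sentences):
    """Hypothetical rule: answer with the sentence whose embedding is closest to the chosen centroid."""
    dists = ((embeddings - centroids[action]) ** 2).sum(axis=1)
    return sentences[int(dists.argmin())]

sentences = ["hi", "hello", "hey there", "see you", "bye", "goodbye"]
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])  # toy 2-D "embeddings"
centroids, labels = kmeans(embeddings, k=2)
reply = respond(int(labels[4]), centroids, embeddings, sentences)
```

With k clusters the agent's Q-network only needs k outputs, regardless of how many distinct sentences exist in the training data.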

machine learning, natural language, reinforcement learning

1908.10331

Country: Asia > South Korea (0.15)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.68)
Media > Film (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Gerken, Andreas, Spranger, Michael

Continuous Value Iteration (CVI) Reinforcement Learning and Imaginary Experience Replay (IER) for learning multi-goal, continuous action and state space controllers

Andreas Gerken and Michael Spranger, Sony Computer Science Laboratories Inc., Tokyo, Japan

Abstract -- This paper presents a novel model-free Reinforcement Learning algorithm for learning behavior in continuous action, state, and goal spaces. The algorithm approximates optimal value functions using nonparametric estimators. It is able to efficiently learn to reach multiple arbitrary goals in deterministic and nondeterministic environments. To improve generalization in the goal space, we propose a novel sample augmentation technique. Using these methods, robots learn better controllers, and learn them faster. We benchmark the proposed algorithms using simulation and a real-world voltage-controlled robot that learns to maneuver in a non-observable Cartesian task space.

INTRODUCTION

Learning to control one's body is a crucial skill for any embodied agent. A common way of framing the problem of learning to control an agent is Reinforcement Learning (RL). RL poses the problem in terms of actions that an agent can perform, observed states of the world, and a reward function that pays out a treat or punishes the agent depending on its performance. The aim of an optimal RL controller is to maximize the collected rewards. Reinforcement Learning has been studied widely and applied to different domains of learning and control.
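The CVI algorithm and its Imaginary Experience Replay are not specified in this excerpt, so the following is only a generic sketch of the underlying idea of approximating a value function with a nonparametric estimator: fitted Q-iteration with k-nearest-neighbour regression over logged (state, action, reward, next state) samples. The 1-D task (drive the state toward 0), the small candidate-action set used to approximate the max over a continuous action space, the helper names, and all hyperparameters are assumptions; goals are left out for brevity.

```python
import numpy as np

def knn_predict(train_x, train_y, queries, k):
    """Average the targets of the k nearest training points for each query row."""
    d = np.linalg.norm(queries[:, None, :] - train_x[None, :, :], axis=-1)  # (n_queries, n_train)
    nearest = np.argsort(d, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def fitted_q_iteration(s, a, r, s_next, candidate_actions, gamma=0.9, iters=30, k=5):
    """Repeatedly regress Q(s, a) onto r + gamma * max_a' Q(s', a') with kNN."""
    x = np.column_stack([s, a])
    q = np.zeros(len(s))
    for _ in range(iters):
        # Approximate the max over continuous actions by checking a few candidates.
        next_q = np.array([
            knn_predict(x, q, np.column_stack([s_next, np.full_like(s_next, c)]), k)
            for c in candidate_actions
        ])
        q = r + gamma * next_q.max(axis=0)
    return x, q

def greedy_action(x, q, state, candidate_actions, k=5):
    queries = np.array([[state, c] for c in candidate_actions])
    return candidate_actions[int(knn_predict(x, q, queries, k).argmax())]

# Logged random experience for a toy 1-D task: reward is highest at state 0.
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, 400)
a = rng.uniform(-0.2, 0.2, 400)
s_next = np.clip(s + a, -1.0, 1.0)
r = -np.abs(s_next)
x, q = fitted_q_iteration(s, a, r, s_next, candidate_actions=[-0.2, 0.0, 0.2])
```

A kNN regressor averages stored targets, so each iteration stays bounded; the price is that evaluating Q needs the whole sample set, which is one reason practical nonparametric methods add indexing or subsampling.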

cvi, machine learning, reinforcement learning

doi: 10.1109/ICRA.2019.8794347

1908.10255

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.54)

Genre: Research Report > Promising Solution (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Research on Autonomous Maneuvering Decision of UCAV based on Approximate Dynamic Programming

Hu, Zhencai, Gao, Peng, Wang, Fei

Unmanned aircraft systems can perform more dangerous and difficult missions than manned aircraft systems. In highly complicated and changeable tasks such as air combat, the maneuvering decision mechanism must sense the combat situation accurately and determine the optimal strategy in real time. This paper presents a formulation of a 3-D one-on-one air combat maneuvering problem and an approximate dynamic programming approach for computing an optimal policy for autonomous maneuvering decision-making. The aircraft learns combat strategies with a Reinforcement Learning method, sensing the environment, taking available maneuvering actions, and receiving feedback reward signals. To address the curse of dimensionality in air combat, the proposed method is implemented through feature selection, trajectory sampling, function approximation, and Bellman backup operations in an air combat simulation environment. This approximate dynamic programming approach responds quickly to rapidly changing tactical situations and learns long-term plans without any explicitly coded air combat rule base.
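The combination of sampled states, function approximation and Bellman backups described above is the core loop of approximate value iteration. Below is a minimal sketch on a toy 1-D "close the distance" problem rather than 3-D air combat; the dynamics `step`, the `reward`, the two-element feature vector, and the fixed set of sampled states (used in place of trajectory sampling) are all hypothetical, and a known simulator model is assumed.

```python
import numpy as np

def approximate_value_iteration(states, actions, step, reward, features, gamma=0.9, iters=50):
    """Fit V(s) ~ features(s) @ w by alternating Bellman backups and least-squares projection."""
    phi = np.array([features(s) for s in states])
    w = np.zeros(phi.shape[1])
    for _ in range(iters):
        # Bellman backup at every sampled state, using the simulator and current weights.
        targets = np.array([
            max(reward(s, a) + gamma * features(step(s, a)) @ w for a in actions)
            for s in states
        ])
        # Project the backed-up values onto the span of the features.
        w, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return w

def greedy(s, actions, step, reward, features, w, gamma=0.9):
    """One-step lookahead policy with respect to the fitted value function."""
    return max(actions, key=lambda a: reward(s, a) + gamma * features(step(s, a)) @ w)

# Toy problem: relative distance x in [-5, 5]; the agent wants x close to 0.
actions = [-1, 0, 1]
step = lambda x, a: int(np.clip(x + a, -5, 5))
reward = lambda x, a: -abs(step(x, a))
features = lambda x: np.array([1.0, abs(x)])
states = list(range(-5, 6))
w = approximate_value_iteration(states, actions, step, reward, features)
```

Only the weight vector is stored, not a value per state, which is what makes this style of method attractive when the real state space is far too large to enumerate.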

artificial intelligence, machine learning, reinforcement learning

1908.1001

Country: Asia > China (0.14)

Genre: Research Report (0.40)

Industry: Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)