AITopics

We present a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process(MDP). In this setting, the agent must maximize discounted cumulative reward while constraining the probability of entering unsafe states, defined using a safety function being within some tolerance. The safety values of all states are not known a priori, and we probabilistically model them via aGaussian Process (GP) prior. As such, properly behaving in such an environment requires balancing a three-way trade-off of exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves the knowledge on the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using the real Martian terrain data.

artificial intelligence, reinforcement learning, reward function, (18 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States (0.28)

Genre: Research Report (0.66)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Durugkar, Ishan (University of Texas at Austin) | Stone, Peter (University of Texas at Austin)

Adversarial Goal Generation for Intrinsic Motivation

Generally in Reinforcement Learning the goal, or reward signal, is given by the environment and cannot be controlled by the agent. We propose to introduce an intrinsic motivation module that will select a reward function for the agent to learn to achieve. We will use a Universal Value Function Approximator, that takes as input both the state and the parameters of this reward function as the goal to predict the value function (or action-value function) to generalize across these goals. This module will be trained to generate goals such that the agent's learning is maximized. Thus, this is also a method for automatic curriculum learning.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > Texas > Travis County > Austin (0.15)

Industry: Leisure & Entertainment > Sports (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zhou, Kaiyang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) | Qiao, Yu (Queen Mary University of London) | Xiang, Tao (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)

Deep Reinforcement Learning for Unsupervised Video Summarization With Diversity-Representativeness Reward

Video summarization aims to facilitate large-scale video browsing by producing short, concise summaries that are diverse and representative of original videos. In this paper, we formulate video summarization as a sequential decision-making process and develop a deep summarization network (DSN) to summarize videos. DSN predicts for each video frame a probability, which indicates how likely a frame is selected, and then takes actions based on the probability distributions to select frames, forming video summaries. To train our DSN, we propose an end-to-end, reinforcement learning-based framework, where we design a novel reward function that jointly accounts for diversity and representativeness of generated summaries and does not rely on labels or user interactions at all. During training, the reward function judges how diverse and representative the generated summaries are, while DSN strives for earning higher rewards by learning to produce more diverse and more representative summaries. Since labels are not required, our method can be fully unsupervised. Extensive experiments on two benchmark datasets show that our unsupervised method not only outperforms other state-of-the-art unsupervised methods, but also is comparable to or even superior than most of published supervised approaches.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Finite Sample Analyses for TD(0) With Function Approximation

Dalal, Gal (Technion, Israel Institute of Technology) | Szörényi, Balázs (Technion, Israel Institute of Technology ) | Thoppe, Gugan (Duke University) | Mannor, Shie (Technion, Israel Institute of Technology)

TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such results. Existing convergence rates for Temporal Difference (TD) methods apply only to somewhat modified versions, e.g., projected variants or ones where stepsizes depend on unknown problem parameters. Our analyses obviate these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high-probability. The two are obtained via different approaches that use relatively unknown, recently developed stochastic approximation techniques.

lemma 5, stepsize, theorem 3, (16 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)

Mo, Kaixiang (Hong Kong University of Science and Technology) | Zhang, Yu (Hong Kong University of Science and Technology) | Li, Shuangyin (Hong Kong University of Science and Technology) | Li, Jiajun (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology)

Personalizing a Dialogue System With Transfer Reinforcement Learning

It is difficult to train a personalized task-oriented dialogue system because the data collected from each individual is often insufficient. Personalized dialogue systems trained on a small dataset is likely to overfit and make it difficult to adapt to different user needs. One way to solve this problem is to consider a collection of multiple users as a source domain and an individual user as a target domain, and to perform transfer learning from the source domain to the target domain. By following this idea, we propose a PErsonalized Task-oriented diALogue (PETAL) system, a transfer reinforcement learning framework based on POMDP, to construct a personalized dialogue system. The PETAL system first learns common dialogue knowledge from the source domain and then adapts this knowledge to the target domain. The proposed PETAL system can avoid the negative transfer problem by considering differences between the source and target users in a personalized Q-function. Experimental results on a real-world coffee-shopping data and simulation data show that the proposed PETAL system can learn optimal policies for different users, and thus effectively improve the dialogue quality under the personalized setting.

dialogue system, target domain, target user, (15 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
North America > United States (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Brown, Daniel S. (University of Texas at Austin) | Niekum, Scott (University of Texas at Austin)

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting---where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the alpha-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task and achieve tighter and more accurate bounds than a feature count-based baseline. We also give examples of how our proposed bound can be utilized to perform risk-aware policy selection and risk-aware policy improvement. Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.

demonstration, evaluation policy, reward function, (13 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Cellular Network Traffic Scheduling With Deep Reinforcement Learning

Modern mobile networks are facing unprecedented growth in demand due to a new class of traffic from Internet of Things (IoT) devices such as smart wearables and autonomous cars. Future networks must schedule delay-tolerant software updates, data backup, and other transfers from IoT devices while maintaining strict service guarantees for conventional real-time applications such as voice-calling and video. This problem is extremely challenging because conventional traffic is highly dynamic across space and time, so its performance is significantly impacted if all IoT traffic is scheduled immediately when it originates. In this paper, we present a reinforcement learning (RL) based scheduler that can dynamically adapt to traffic variation, and to various reward functions set by network operators, to optimally schedule IoT traffic. Using 4 weeks of real network data from downtown Melbourne, Australia spanning diverse traffic patterns, we demonstrate that our RL scheduler can enable mobile networks to carry 14.7% more data with minimal impact on existing traffic, and outpeforms heuristic schedulers by more than 2x. Our work is a valuable step towards designing autonomous, "self-driving" networks that learn to manage themselves from past data.

forecast, iot traffic, traffic, (15 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Oceania > Australia > Victoria > Melbourne (0.24)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.46)

Industry:

Information Technology > Networks (0.87)
Telecommunications > Networks (0.69)
Transportation > Ground > Road (0.48)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence

We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms on single or multi-agent reinforcement learning, MAgent focuses on supporting the tasks and the applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents' optimal polices, but more importantly, the observation and understanding of individual agent's behaviors and social phenomena emerging from the AI society, including communication languages, leaderships, altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show emerged collective intelligence by learning from scratch.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Industry: Leisure & Entertainment > Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Attention Model From Human for Visuomotor Tasks

Zhang, Luxin (Peking University) | Zhang, Ruohan (The University of Texas at Austin) | Liu, Zhuode (The University of Texas at Austin) | Hayhoe, Mary M. (The University of Texas at Austin) | Ballard, Dana H. (The University of Texas at Austin)

A wealth of information regarding intelligent decision making is conveyed by human gaze and visual attention, hence, modeling and exploiting such information might be a promising way to strengthen algorithms like deep reinforcement learning. We collect high-quality human action and gaze data while playing Atari games. Using these data, we train a deep neural network that can predict human gaze positions and visual attention with high accuracy.

machine learning, reinforcement learning, visual attention, (15 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > Texas (0.16)
Asia > China (0.15)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)

Comparing Reward Shaping, Visual Hints, and Curriculum Learning

Pocius, Rey (Oregon State University) | Isele, David (University of Pennsylvania) | Roberts, Mark (United States Naval Research Laboratory) | Aha, David W. (United States Naval Research Laboratory )

Common approaches to learn complex tasks in reinforcement learning include reward shaping, environmental hints, or a curriculum. Yet few studies examine how they compare to each other, when one might prefer one approach, or how they may complement each other. As a first step in this direction, we compare reward shaping, hints, and curricula for a Deep RL agent in the game of Minecraft. We seek to answer whether reward shaping, visual hints, or the curricula have the most impact on performance, which we measure as the time to reach the target, the distance from the target, the cumulative reward, or the number of actions taken. Our analyses show that performance is most impacted by the curriculum used and visual hints; shaping had less impact. For similar navigation tasks, the results suggest that designing an effective curriculum and providing appropriate hints most improve the performance. Common approaches to learn complex tasks in reinforcement learning include reward shaping, environmental hints, or a curriculum, yet few studies examine how they compare to each other. We compare these approaches for a Deep RL agent in the game of Minecraft and show performance is most impacted by the curriculum used and visual hints; shaping had less impact. For similar navigation tasks, this suggests that designing an effective curriculum with hints most improve the performance.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania (0.15)
North America > United States > Oregon (0.15)

Genre:

Research Report > Experimental Study (0.89)
Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)