AITopics

1910.14055

Country:

Asia > Vietnam > Hanoi > Hanoi (0.04)
North America > Canada (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Daulton, Samuel, Singh, Shaun, Avadhanula, Vashist, Dimmery, Drew, Bakshy, Eytan

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints

arXiv.org Artificial Intelligence, Nov-1-2019

Recent advances in contextual bandit optimization and reinforcement learning have garnered interest in applying these methods to real-world sequential decision-making problems. Real-world applications frequently have constraints with respect to a currently deployed policy. Many of the existing constraint-aware algorithms consider problems with a single objective (the reward) and a constraint on the reward with respect to a baseline policy. However, many important applications involve multiple competing objectives and auxiliary constraints. In this paper, we propose a novel Thompson sampling algorithm for multi-outcome contextual bandit problems with auxiliary constraints. We empirically evaluate our algorithm on a synthetic problem. Lastly, we apply our method to a real-world video transcoding problem and provide a practical way to navigate the trade-off between safety and performance using Bayesian optimization.
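The core idea of Thompson sampling with an auxiliary constraint can be sketched as follows. This is a minimal, non-contextual sketch under stated assumptions, not the paper's algorithm (which handles contexts and multiple outcomes). It assumes Bernoulli reward and auxiliary outcomes with Beta posteriors, and a hypothetical eligibility rule: an arm may be played only if its sampled auxiliary rate is at least the baseline arm's sampled rate minus a tolerance. The baseline's auxiliary posterior is assumed to be sharp because it comes from deployment logs. All function names and numbers are illustrative.

```python
import numpy as np


def constrained_thompson_step(reward_ab, aux_ab, baseline, tolerance, rng):
    """Pick one arm by Thompson sampling subject to an auxiliary constraint.

    reward_ab, aux_ab: (n_arms, 2) arrays of Beta(alpha, beta) posterior
    parameters for the reward and the auxiliary outcome (assumed Bernoulli).
    """
    reward_draws = rng.beta(reward_ab[:, 0], reward_ab[:, 1])
    aux_draws = rng.beta(aux_ab[:, 0], aux_ab[:, 1])
    # Hypothetical rule: stay within `tolerance` of the baseline's sampled rate.
    eligible = aux_draws >= aux_draws[baseline] - tolerance
    # Keep the baseline eligible so there is always a safe fallback.
    eligible[baseline] = True
    return int(np.argmax(np.where(eligible, reward_draws, -np.inf)))


def update(ab, arm, outcome):
    """Conjugate Beta-Bernoulli update for one observed 0/1 outcome."""
    ab[arm, 0] += outcome
    ab[arm, 1] += 1 - outcome


def simulate(steps=2000, seed=0):
    """Toy run: arm 0 is the deployed baseline (all rates are made up)."""
    rng = np.random.default_rng(seed)
    true_reward = np.array([0.3, 0.6, 0.5])
    true_aux = np.array([0.9, 0.5, 0.88])   # arm 1 violates the constraint
    reward_ab = np.ones((3, 2))
    aux_ab = np.ones((3, 2))
    aux_ab[0] = [90.0, 10.0]                # assumed strong prior from baseline logs
    counts = np.zeros(3, dtype=int)
    for _ in range(steps):
        arm = constrained_thompson_step(reward_ab, aux_ab, 0, 0.1, rng)
        counts[arm] += 1
        update(reward_ab, arm, int(rng.random() < true_reward[arm]))
        update(aux_ab, arm, int(rng.random() < true_aux[arm]))
    return counts
```

In this toy run, arm 1 has the highest reward but a much worse auxiliary rate. It is therefore filtered out once its auxiliary posterior concentrates, and most pulls go to arm 2.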

algorithm, constraint, safety constraint

1911.00638

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > Canada (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

John, Indu, Kamanchi, Chandramouli, Bhatnagar, Shalabh

Generalized Speedy Q-learning

arXiv.org Artificial Intelligence, Nov-1-2019

In this paper, we derive a generalization of the Speedy Q-learning (SQL) algorithm, which was proposed in the Reinforcement Learning (RL) literature to address the slow convergence of Watkins' Q-learning. In most RL algorithms, such as Q-learning, the Bellman equation and the Bellman operator play an important role. The Bellman operator can be generalized using the technique of successive relaxation. We use the generalized Bellman operator to derive a simple and efficient family of algorithms called Generalized Speedy Q-learning (GSQL-w) and analyze its finite-time performance. We show that GSQL-w has an improved finite-time performance bound compared to SQL when the relaxation parameter w is greater than 1. This improvement is a consequence of the contraction factor of the generalized Bellman operator being smaller than that of the standard Bellman operator. Numerical experiments are provided to demonstrate the empirical performance of the GSQL-w algorithm.
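The abstract does not spell out the generalized operator. The sketch below assumes the successive-relaxation form T_w Q(s,a) = w·[r(s,a) + γ Σ_{s'} p(s'|s,a) max_b Q(s',b)] + (1 − w)·max_c Q(s,c), where w = 1 recovers the standard operator. Under this assumed form, suppose every state-action pair stays in its own state with probability at least p_min > 0 and 1 < w ≤ 1/(1 − γ·p_min). Then the operator contracts with factor 1 − w + wγ, which is smaller than γ, and its fixed point has the same state values max_a Q(s,a) and the same greedy policy as the standard operator. The code iterates the operator with a known model (value-iteration style), not the sampled GSQL-w update, so it only illustrates the operator. The toy MDP is hypothetical.

```python
import numpy as np


def relaxed_bellman(Q, P, R, gamma, w):
    """Successive-relaxation Bellman operator (assumed form; w = 1 is standard).

    Q: (S, A) action values, P: (S, A, S) transition probabilities,
    R: (S, A) expected rewards.
    """
    V = Q.max(axis=1)                      # max_b Q(s, b) for every state
    standard = R + gamma * (P @ V)         # (S, A): r + gamma * E[max_b Q(s', b)]
    return w * standard + (1.0 - w) * V[:, None]


def iterate(P, R, gamma, w, tol=1e-10, max_iter=10_000):
    """Apply the operator until successive iterates differ by less than tol."""
    Q = np.zeros_like(R)
    for k in range(1, max_iter + 1):
        Q_next = relaxed_bellman(Q, P, R, gamma, w)
        if np.max(np.abs(Q_next - Q)) < tol:
            return Q_next, k
        Q = Q_next
    return Q, max_iter


# Hypothetical 2-state, 2-action MDP; every (s, a) stays in s with prob >= 0.5.
P = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.5, 0.5], [0.2, 0.8]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
GAMMA, P_MIN = 0.9, 0.5
W = 1.5   # 1 < W <= 1 / (1 - GAMMA * P_MIN), i.e. about 1.82 here
```

With γ = 0.9 and w = 1.5, the contraction factor is 1 − 1.5 + 1.35 = 0.85, compared with 0.9 for the standard operator. The relaxed iteration therefore needs fewer sweeps to reach the same tolerance.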

algorithm, q-learning, speedy q-learning

1911.00397

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Workflow (0.46)
Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Nov-1-2019

Situated GAIL: Multitask imitation using task-conditioned adversarial inverse reinforcement learning

Kobayashi, Kyoichiro, Horii, Takato, Iwaki, Ryo, Nagai, Yukie, Asada, Minoru

Generative adversarial imitation learning (GAIL) has attracted increasing attention in the field of robot learning. It enables robots to learn a policy to achieve a task demonstrated by an expert while simultaneously estimating the reward function behind the expert's behaviors. However, this framework is limited to learning a single task with a single reward function. This study proposes an extended framework called situated GAIL (S-GAIL), in which a task variable is introduced to both the discriminator and generator of the GAIL framework. The task variable serves to discriminate between different contexts and to make the framework learn different reward functions and policies for multiple tasks. To achieve early convergence of learning and robustness during reward estimation, we introduce a term to adjust the entropy regularization coefficient in the generator's objective function. Our experiments using two setups (navigation in a discrete grid world and arm reaching in a continuous space) demonstrate that the proposed framework can acquire multiple reward functions and policies more effectively than existing frameworks. The task variable enables our framework to differentiate contexts while sharing common knowledge among multiple tasks.
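The sketch below shows how adding a task variable to the discriminator input lets one discriminator encode several reward functions. It is a simplified illustration, not S-GAIL's networks. It assumes a logistic-regression discriminator, a one-hot task vector, a hypothetical feature map with a shared state term plus task-specific action and bias terms, and the adversarial-IRL-style reward log D − log(1 − D), which equals the discriminator logit. The expert data is synthetic: the expert pushes the action toward +1 in task 0 and toward −1 in task 1.

```python
import numpy as np

N_TASKS = 2


def features(s, a, task):
    """Hypothetical feature map: shared state term plus per-task action/bias terms."""
    c = np.eye(N_TASKS)[task]                    # one-hot task variable, (n, N_TASKS)
    return np.column_stack([s, a[:, None] * c, c])


def train_discriminator(x_expert, x_policy, lr=0.5, steps=2000):
    """Logistic regression D(x) = sigmoid(x @ theta); expert label 1, policy label 0."""
    x = np.vstack([x_expert, x_policy])
    y = np.concatenate([np.ones(len(x_expert)), np.zeros(len(x_policy))])
    theta = np.zeros(x.shape[1])
    for _ in range(steps):
        d = 1.0 / (1.0 + np.exp(-x @ theta))
        theta -= lr * x.T @ (d - y) / len(y)     # gradient of mean binary cross-entropy
    return theta


def reward(theta, s, a, task):
    """Reward log D - log(1 - D), which equals the discriminator logit."""
    return features(s, a, task) @ theta


def demo(n=500, seed=0):
    """Train on synthetic expert vs. policy samples drawn for both tasks."""
    rng = np.random.default_rng(seed)
    task = rng.integers(0, N_TASKS, size=n)
    s = rng.normal(size=n)
    a_expert = np.where(task == 0, 1.0, -1.0) + 0.3 * rng.normal(size=n)
    a_policy = rng.normal(size=n)               # untrained generator: random actions
    return train_discriminator(features(s, a_expert, task),
                               features(s, a_policy, task))
```

After training, the same state-action pair receives different rewards depending on the task variable. Meanwhile, the state weight is shared across tasks, loosely mirroring the "shared common knowledge" described above.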

generator, reward function, s-gail

1911.00238

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.05)
Asia > Middle East > Jordan (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (0.68)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

#artificialintelligence, Oct-31-2019, 11:39:27 GMT

An A.I. has beat humans at yet another of our own games

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged by consensus as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multiagent challenges. Over the course of a decade and numerous competitions [1–3], the best results have been made possible by hand-crafting major elements of the system, simplifying important aspects of the game, or using superhuman capabilities [4]. Even with these modifications, no previous system has come close to rivalling the skill of top players in the full game. We chose to address the challenge of StarCraft using general purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counterstrategies, each represented by deep neural networks [5, 6]. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.

agent, alphastar, gameplay

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.56)

Waytowich, Nicholas, Barton, Sean L., Lawhern, Vernon, Warnell, Garrett

A Narration-based Reward Shaping Approach using Grounded Natural Language Commands

arXiv.org Artificial Intelligence, Oct-31-2019

While deep reinforcement learning techniques have led to agents that can successfully learn to perform a number of tasks that were previously unlearnable, these techniques are still susceptible to the longstanding problem of reward sparsity. This is especially true for tasks such as training an agent to play StarCraft II, a real-time strategy game in which reward is only given at the end of a game, which is usually very long. While this problem can be addressed through reward shaping, such approaches typically require a human expert with specialized knowledge. Inspired by the vision of enabling reward shaping through the more accessible paradigm of natural-language narration, we develop a technique that can provide the benefits of reward shaping using natural-language commands. Our narration-guided RL agent projects sequences of natural-language commands into the same high-dimensional representation space as corresponding goal states. We show that our method improves performance compared to traditional reward-shaping approaches. Additionally, we demonstrate the ability of our method to generalize to unseen natural-language commands.
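Once commands and states live in the same embedding space, similarity between them can serve as a shaping signal. This minimal sketch assumes a command embedding and state embeddings already produced by hypothetical encoders, shown here as fixed vectors. It uses cosine similarity to the command as a potential Φ and applies potential-based shaping F = γΦ(s') − Φ(s) (Ng et al., 1999). The paper's exact shaping rule may differ; this is one standard way to turn such a similarity into a shaped reward.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def shaped_reward(env_reward, state_emb, next_state_emb, command_emb, gamma=0.99):
    """Sparse environment reward plus potential-based shaping toward the command.

    Potential Phi(s) = cosine(state embedding, command embedding); all
    embeddings are assumed to come from hypothetical, pre-trained encoders.
    """
    phi = cosine(state_emb, command_emb)
    phi_next = cosine(next_state_emb, command_emb)
    return env_reward + gamma * phi_next - phi
```

A transition that moves the state embedding toward the commanded goal earns a positive bonus, and moving away earns a negative one, even when the game's own reward is zero. Because the shaping term is potential-based, it leaves the optimal policy of the original reward unchanged as long as the command is held fixed.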

machine learning, natural language, reinforcement learning

1911.00497

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Maryland (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Liu, Iou-Jen, Yeh, Raymond A., Schwing, Alexander G.

PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning

arXiv.org Machine Learning, Oct-31-2019

Single-agent deep reinforcement learning has achieved impressive performance in many domains, including playing Go [1, 2] and Atari games [3, 4]. However, many real-world problems, such as traffic congestion reduction [5, 6], antenna tilt control [7], and dynamic resource allocation [8], are more naturally modeled as multi-agent systems. Unfortunately, directly deploying single-agent reinforcement learning to each agent in a multi-agent system does not result in satisfactory performance [9, 10]. In particular, in multi-agent reinforcement learning [8, 10-19], estimating the value function is challenging because the environment is non-stationary from the perspective of an individual agent [10, 11]. To alleviate this issue, multi-agent deep deterministic policy gradient (MADDPG) [10] recently proposed a centralized critic whose input is the concatenation of all agents' observations and actions.
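The excerpt stops at the motivation: a critic that takes the concatenation of all agents' observations and actions changes its output when the same agents are listed in a different order. The sketch below contrasts such a concatenation critic with a permutation-invariant one built from a shared per-agent encoder followed by mean pooling (a DeepSets-style construction). PIC's own architecture is not described in this excerpt, so the pooling critic is an assumed, simplified stand-in. Weights are random and untrained, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, FEAT, HIDDEN = 3, 4, 8   # FEAT = per-agent observation + action size

# Concatenation ("MLP") critic: one weight matrix over all agents' features.
W_cat = rng.normal(size=(N_AGENTS * FEAT, HIDDEN))
# Permutation-invariant critic: encoder weights shared by every agent.
W_enc = rng.normal(size=(FEAT, HIDDEN))
w_out = rng.normal(size=HIDDEN)


def mlp_critic(x):
    """x: (N_AGENTS, FEAT). Flattening ties each weight to an agent slot."""
    return float(np.tanh(x.reshape(-1) @ W_cat) @ w_out)


def pooled_critic(x):
    """Shared per-agent encoder, then mean over agents, so order cannot matter."""
    h = np.tanh(x @ W_enc)          # (N_AGENTS, HIDDEN)
    return float(h.mean(axis=0) @ w_out)
```

Reordering the rows of `x` leaves `pooled_critic` unchanged (up to floating-point rounding), while `mlp_critic` generally returns a different value. A concatenation critic would have to learn that invariance from data for every one of the N! orderings.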

agent, mlp critic, permutation invariant critic

1911.00025

Country:

North America > United States > Illinois > Champaign County > Champaign (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.30)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Xu, Haitao, McCane, Brendan, Szymanski, Lech

VASE: Variational Assorted Surprise Exploration for Reinforcement Learning

arXiv.org Machine Learning, Oct-31-2019

Exploration in environments with continuous control and sparse rewards remains a key challenge in reinforcement learning (RL). Recently, surprise has been used as an intrinsic reward that encourages systematic and efficient exploration. We introduce a new definition of surprise and its RL implementation, named Variational Assorted Surprise Exploration (VASE). VASE uses a Bayesian neural network as a model of the environment dynamics and is trained using variational inference, alternately updating the accuracy of the agent's model and policy. Our experiments show that in continuous-control, sparse-reward environments, VASE outperforms other surprise-based exploration techniques.
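The abstract does not give VASE's specific "assorted surprise" definition. This sketch therefore shows only the generic idea it builds on: surprise as an intrinsic reward. It assumes a dynamics model that predicts a diagonal Gaussian over the next state, and it measures surprise as the negative log-likelihood of the observed next state. The Bayesian neural network and variational training are omitted, the predicted mean and variance are hypothetical inputs, and the bonus weight `eta` is an assumed hyperparameter.

```python
import numpy as np


def gaussian_surprise(next_state, pred_mean, pred_var):
    """Negative log-likelihood of next_state under a diagonal Gaussian prediction."""
    diff = next_state - pred_mean
    return float(0.5 * np.sum(np.log(2 * np.pi * pred_var) + diff**2 / pred_var))


def augmented_reward(extrinsic, next_state, pred_mean, pred_var, eta=0.1):
    """Sparse extrinsic reward plus a scaled surprise bonus (eta is assumed)."""
    return extrinsic + eta * gaussian_surprise(next_state, pred_mean, pred_var)
```

A transition the model predicted well (observed state at the predicted mean) yields the minimum surprise for that variance, log(2π) per unit-variance dimension pair here. A transition far from the prediction yields a large bonus, which is what drives exploration when the extrinsic reward is zero.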

agent, exploration, neural network

1910.14351

Country:

Oceania > New Zealand > South Island > Otago > Dunedin (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)

arXiv.org Machine Learning, Oct-31-2019

RLINK: Deep Reinforcement Learning for User Identity Linkage

Li, Xiaoxue, Cao, Yanan, Shang, Yanmin, Li, Yangxi, Liu, Yanbing, Tan, Jianlong

User identity linkage is the task of recognizing the identities of the same user across different social networks (SN). Previous works tackle this problem by estimating the pairwise similarity between identities from different SN, predicting the label of identity pairs, or selecting the most relevant identity pair based on the similarity scores. However, most of these methods ignore the results of previously matched identities, which could contribute to the linkage in subsequent matching steps. To address this problem, we convert user identity linkage into a sequential decision problem and propose a reinforcement learning model to optimize the linkage strategy from a global perspective. Our method makes full use of both the social network structure and the history of matched identities, and explores the long-term influence of current matching on subsequent decisions. We conduct experiments on different types of datasets, and the results show that our method achieves better performance than other state-of-the-art methods.
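Casting linkage as a sequential decision problem means each step's options depend on earlier matches. The sketch below shows that structure only. Source identities are linked one at a time, and each step may only choose a target identity that has not already been matched. The greedy highest-similarity rule is a hypothetical stand-in for RLINK's learned policy, and the similarity matrix is made up.

```python
import numpy as np


def sequential_link(similarity):
    """Link each source identity (row) to one unused target identity (column).

    similarity: (n_source, n_target) scores, assuming n_source <= n_target.
    The greedy choice below is a hypothetical stand-in for a learned policy.
    """
    n_source, n_target = similarity.shape
    available = np.ones(n_target, dtype=bool)   # state: targets not yet matched
    links = []
    for i in range(n_source):
        scores = np.where(available, similarity[i], -np.inf)
        j = int(np.argmax(scores))
        available[j] = False                    # this match constrains later steps
        links.append((i, j))
    return links
```

Consider the matrix [[0.9, 0.8], [0.85, 0.1]]. The greedy policy links (0, 0) and is then forced into (1, 1), for a total score of 1.0. The alternative (0, 1), (1, 0) scores 1.65. This long-term effect of an early match on later decisions is what a policy optimized over the whole sequence is meant to capture.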

deep reinforcement learning, identity pair, reinforcement learning

1910.14273

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Beijing > Beijing (0.05)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry: Information Technology > Services (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-31-2019

DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering

Heffetz, Yuval, Vainstein, Roman, Katz, Gilad, Rokach, Lior

Automatic machine learning (AutoML) is an area of research aimed at automating machine learning (ML) activities that currently require human experts. One of the most challenging tasks in this field is the automatic generation of end-to-end ML pipelines: combining multiple types of ML algorithms into a single architecture used for end-to-end analysis of previously unseen data. This task has two challenging aspects: the first is the need to explore a large search space of algorithms and pipeline architectures; the second is the computational cost of training and evaluating multiple pipelines. In this study, we present DeepLine, a reinforcement learning based approach for automatic pipeline generation. Our proposed approach utilizes an efficient representation of the search space and leverages past knowledge gained from previously analyzed datasets to make the problem more tractable. Additionally, we propose a novel hierarchical-actions algorithm that serves as a plugin, mediating the environment-agent interaction in deep reinforcement learning problems. The plugin significantly speeds up the training process of our model. Evaluation on 56 datasets shows that DeepLine outperforms state-of-the-art approaches both in accuracy and in computational cost.
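The abstract does not describe how the hierarchical-actions plugin works. The sketch below shows only the general mechanism it mediates: filtering the action set before the agent acts, so that actions invalid for the current partial pipeline are never selected. It uses flat action masking rather than a hierarchy. The action catalogue and the validity rule (no step after a model, no repeated steps) are hypothetical, and this is not DeepLine's algorithm.

```python
import numpy as np

# Hypothetical action catalogue: (step name, step kind).
ACTIONS = [
    ("standard_scaler", "preprocess"),
    ("pca", "feature"),
    ("select_k_best", "feature"),
    ("logistic_regression", "model"),
    ("random_forest", "model"),
]


def valid_mask(pipeline):
    """Hypothetical validity rule: nothing after a model, no repeated steps."""
    if any(kind == "model" for _, kind in pipeline):
        return np.zeros(len(ACTIONS), dtype=bool)
    used = {name for name, _ in pipeline}
    return np.array([name not in used for name, _ in ACTIONS])


def select_action(q_values, pipeline):
    """Greedy choice restricted to valid actions; None means the pipeline is complete."""
    mask = valid_mask(pipeline)
    if not mask.any():
        return None
    return int(np.argmax(np.where(mask, q_values, -np.inf)))
```

Masking keeps the agent from spending training episodes on pipelines that can never be executed. That is one plausible source of the training speed-up the abstract attributes to filtering, though the paper's exact mechanism is not given here.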

algorithm, pipeline, representation

1911.00061

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)