AITopics

1807.0753

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Asia > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Jul-19-2018

Towards Explainable and Controllable Open Domain Dialogue Generation with Dialogue Acts

Xu, Can, Wu, Wei, Wu, Yu

We study open domain dialogue generation with dialogue acts designed to explain how people engage in social chat. To imitate human behavior, we propose managing the flow of human-machine interactions with the dialogue acts as policies. The policies and response generation are jointly learned from human-human conversations, and the former is further optimized with a reinforcement learning approach. With the dialogue acts, we achieve significant improvement over state-of-the-art methods on response quality for given contexts and dialogue length in both machine-machine simulation and human-machine conversation.
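The abstract does not say which RL algorithm optimizes the dialogue-act policy. As a rough sketch only, assuming a context-free softmax policy over a hypothetical act inventory (`ACTS`) and a plain REINFORCE update (both assumptions, not the authors' method), a policy-gradient step over dialogue acts could look like this:

```python
import math
import random

# Hypothetical dialogue-act inventory; the paper's actual act set is not given here.
ACTS = ["statement", "question", "switch_topic", "end"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_act(logits, rng=random):
    """Sample one dialogue act from the current policy."""
    return rng.choices(ACTS, weights=softmax(logits))[0]


def reinforce_update(logits, episode, lr=0.1, gamma=1.0):
    """One REINFORCE step for a context-free softmax policy over dialogue acts.

    episode: list of (act_index, reward) pairs, one per machine turn.
    Returns the updated logits.
    """
    # Discounted return-to-go for every turn.
    returns, g = [], 0.0
    for _, r in reversed(episode):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()

    # Gradient of log pi(a) with respect to logit k is 1[k == a] - pi(k).
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for (a, _), g in zip(episode, returns):
        for k in range(len(logits)):
            grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
    return [l + lr * gk for l, gk in zip(logits, grad)]
```

In a real system the logits would come from a network conditioned on the dialogue context; the flat list of logits here only keeps the update rule visible.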

machine learning, natural language, reinforcement learning

1807.07255

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Asia > Singapore (0.05)
Asia > Malaysia (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

arXiv.org Artificial Intelligence Jul-18-2018

Backplay: "Man muss immer umkehren"

Resnick, Cinjon, Raileanu, Roberta, Kapoor, Sanyam, Peysakhovich, Alex, Cho, Kyunghyun, Bruna, Joan

A long-standing problem in model-free reinforcement learning (RL) is that it requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to increase the sample efficiency of RL when we have access to demonstrations. Our approach, which we call Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. We perform experiments in a competitive four-player game (Pommerman) and a path-finding maze game. We find that this weak form of guidance provides significant gains in sample complexity, with a stark advantage in sparse reward environments. In some cases, standard RL did not yield any improvement, while Backplay reached success rates greater than 50% and generalized to unseen initial conditions in the same amount of training time. Additionally, we see that agents trained via Backplay can learn policies superior to those of the original demonstration.
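The core of Backplay is a schedule for choosing each episode's starting state along the demonstration. A minimal sketch, assuming a linear schedule and a fixed sampling window (the `window` and `total_epochs` parameters are illustrative, not the paper's settings):

```python
import random


def backplay_start(demo, epoch, total_epochs, window=5, rng=random):
    """Choose where along a single demonstration to start a training episode.

    demo: list of environment states visited by the demonstrator; demo[0] is the
    environment's fixed initial state. Early in training, starts are sampled within
    `window` steps of the end of the demonstration; the sampling window slides
    linearly backwards until, once epoch >= total_epochs, every episode starts
    from demo[0]. Returns (index, state).
    """
    last = len(demo) - 1
    progress = min(epoch / total_epochs, 1.0)
    hi = last - round(progress * last)  # latest start index allowed this epoch
    lo = max(0, hi - window)            # earliest start index allowed this epoch
    idx = rng.randint(lo, hi)
    return idx, demo[idx]
```

The environment would then be reset to the returned state instead of its usual initial state; the paper's actual window boundaries and schedule may differ.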

backplay, machine learning, reinforcement learning

1807.06919

Genre:

Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

Sanborn, Sophia, Bourgin, David D., Chang, Michael, Griffiths, Thomas L.

Representational efficiency outweighs action efficiency in human program induction

arXiv.org Artificial Intelligence Jul-18-2018

The importance of hierarchically structured representations for tractable planning has long been acknowledged. However, the questions of how people discover such abstractions and how to define a set of optimal abstractions remain open. This problem has been explored in cognitive science, in the problem-solving literature, and in computer science, in hierarchical reinforcement learning. Here, we emphasize an algorithmic perspective on learning hierarchical representations in which the objective is to efficiently encode the structure of the problem, or, equivalently, to learn an algorithm with minimal length. We introduce a novel problem-solving paradigm that links problem solving and program induction under the Markov Decision Process (MDP) framework. Using this task, we target the question of whether humans discover hierarchical solutions by maximizing efficiency in the number of actions they generate or by minimizing the complexity of the resulting representation, and find evidence for the primacy of representational efficiency.
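To make the contrast between action efficiency and representational efficiency concrete, here is a toy sketch. It assumes a made-up program language in which each primitive action costs one token and a loop `k*(body)` costs the body plus one token; it illustrates minimal description length and is not the paper's task or encoding.

```python
from functools import lru_cache


def program_length(actions: str) -> int:
    """Length of the shortest program that emits `actions` in a toy language.

    Toy language (assumption): every primitive action is one token, and a loop
    k*(body) repeats `body` k times at a cost of len(body program) + 1 token.
    """
    n = len(actions)

    @lru_cache(maxsize=None)
    def cost(i, j):
        seg = actions[i:j]
        best = j - i  # spell out every action literally
        # Option 1: split into two independently encoded parts.
        for k in range(i + 1, j):
            best = min(best, cost(i, k) + cost(k, j))
        # Option 2: encode the segment as repetitions of a shorter period p.
        length = j - i
        for p in range(1, length // 2 + 1):
            if length % p == 0 and seg == seg[:p] * (length // p):
                best = min(best, cost(i, i + p) + 1)
        return best

    return cost(0, n) if n else 0
```

Under this toy encoding, the 6-action solution `RRUURU` needs a 6-token program, while the 8-action solution `RURURURU` compresses to 3 tokens (`4*(RU)`): it is worse on action efficiency but better on representational efficiency.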

artificial intelligence, machine learning, reinforcement learning

1807.07134

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.56)

Dorfer, Matthias, Henkel, Florian, Widmer, Gerhard

Learning to Listen, Read, and Follow: Score Following as a Reinforcement Learning Game

Score following is the process of tracking a musical performance (audio) with respect to a known symbolic representation (a score). We start this paper by formulating score following as a multimodal Markov Decision Process, the mathematical foundation for sequential decision making. Given this formal definition, we address the score following task with state-of-the-art deep reinforcement learning (RL) algorithms such as synchronous advantage actor-critic (A2C). In particular, we design multimodal RL agents that simultaneously learn to listen to music, read the scores from images of sheet music, and follow the audio along in the sheet, in an end-to-end fashion. All this behavior is learned entirely from scratch, based on a weak and potentially delayed reward signal that indicates to the agent how close it is to the correct position in the score. Besides discussing the theoretical advantages of this learning paradigm, we show in experiments that it is in fact superior to previously proposed methods for score following in raw sheet music images.
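A2C trains its actor and critic from n-step returns and advantages. The sketch below shows that computation together with one possible proximity reward; the linear `proximity_reward` shape and its `tolerance` parameter are assumptions, since the abstract only says the reward indicates how close the agent is to the correct score position.

```python
def proximity_reward(predicted_pos, true_pos, tolerance):
    """Hypothetical reward: 1 at the correct score position, falling linearly to 0 at `tolerance`."""
    return max(0.0, 1.0 - abs(predicted_pos - true_pos) / tolerance)


def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """A2C-style returns and advantages for one rollout segment.

    rewards[t] and values[t] = V(s_t) for t in range(n); bootstrap_value = V(s_n),
    or 0.0 if the episode terminated at step n.
    Returns (returns, advantages) with advantage_t = R_t - V(s_t).
    """
    returns = [0.0] * len(rewards)
    g = bootstrap_value
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # discounted n-step return from step t
        returns[t] = g
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages
```

The critic is regressed toward `returns`, and the actor's log-probabilities are weighted by `advantages`.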

artificial intelligence, machine learning, reinforcement learning

1807.06391

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Hüttenrauch, Maximilian, Šošić, Adrian, Neumann, Gerhard

Deep Reinforcement Learning for Swarm Systems

Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and a locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.
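The mean embedding replaces concatenation with an average of per-agent features, which makes the policy input invariant to agent order and independent of the number of agents. A minimal sketch of the radial-basis-function variant; the RBF `centres`, the `bandwidth`, and returning a zero vector when no neighbours are observed are assumptions:

```python
import math


def rbf_features(x, y, centres, bandwidth):
    """Gaussian RBF activations of one neighbour's relative position (x, y)."""
    return [
        math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * bandwidth ** 2))
        for cx, cy in centres
    ]


def mean_embedding(neighbours, centres, bandwidth=1.0):
    """Empirical mean embedding of the observed neighbours.

    neighbours: list of relative (x, y) positions, of any length and in any order.
    Returns one entry per RBF centre, so the policy input has a fixed size
    no matter how many agents are observed.
    """
    if not neighbours:
        return [0.0] * len(centres)  # assumption: no neighbours -> zero embedding
    feats = [rbf_features(x, y, centres, bandwidth) for x, y in neighbours]
    return [sum(column) / len(feats) for column in zip(*feats)]
```

Reordering the neighbours, or duplicating every one of them, leaves the embedding unchanged, which reflects the two swarm properties listed in the abstract.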

artificial intelligence, machine learning, reinforcement learning

1807.06613

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Lincolnshire > Lincoln (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

De Giacomo, Giuseppe, Iocchi, Luca, Favorito, Marco, Patrizi, Fabio

Reinforcement Learning for LTLf/LDLf Goals

MDPs extended with LTLf/LDLf non-Markovian rewards have recently attracted interest as a way to specify rewards declaratively. In this paper, we discuss how a reinforcement learning agent can learn policies fulfilling LTLf/LDLf goals. In particular, we focus on the case where we have two separate representations of the world: one for the agent, using the (predefined, possibly low-level) features available to it, and one for the goal, expressed in terms of high-level (human-understandable) fluents. We formally define the problem and show how it can be solved. Moreover, we provide experimental evidence that keeping the RL agent's feature space separate from the goal's can work in practice, showing interesting cases where the agent can indeed learn a policy that fulfills the LTLf/LDLf goal using only its features (augmented with additional memory).
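The setup separates the agent's own features from the fluents in which the LTLf/LDLf goal is written. One common way to realize this, sketched below under assumptions, is to run a finite automaton for the goal alongside the environment, reward the agent when the automaton accepts, and append the automaton state to the agent's observation as its extra memory. The goal, the fluent names `a` and `b`, and the reward value are hypothetical, and the automaton is written by hand rather than compiled from a formula.

```python
# Hand-written DFA for the hypothetical goal "eventually `a` holds, and at some
# later step `b` holds". An LTLf/LDLf-to-automaton translation is assumed to
# have produced it. States: 0 = nothing seen yet, 1 = `a` seen, 2 = accepting.
ACCEPTING = {2}


def dfa_step(q, fluents):
    """Advance the goal automaton on the set of high-level fluents that hold now."""
    if q == 0 and "a" in fluents:
        return 1
    if q == 1 and "b" in fluents:
        return 2
    return q


class GoalTracker:
    """Tracks goal progress on the high-level fluents and emits the goal reward.

    The RL agent itself does not see the fluents; it learns on its own low-level
    features plus the automaton state `q`, which serves as additional memory.
    """

    def __init__(self, goal_reward=1.0):
        self.goal_reward = goal_reward
        self.q = 0

    def reset(self):
        self.q = 0
        return self.q

    def step(self, fluents):
        """Return (automaton_state, reward, goal_satisfied) after one world step."""
        prev = self.q
        self.q = dfa_step(self.q, fluents)
        newly_accepted = self.q in ACCEPTING and prev not in ACCEPTING
        reward = self.goal_reward if newly_accepted else 0.0
        return self.q, reward, self.q in ACCEPTING
```

At each step the learner's observation would be the pair (its own features, `q`), so the non-Markovian goal reward becomes Markovian in that augmented state.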

artificial intelligence, machine learning, reinforcement learning

1807.06333

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Russia (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bipedal Walking Robot using Deep Deterministic Policy Gradient

Kumar, Arun, Paul, Navneet, Omkar, S N

Machine learning algorithms have found several applications in the field of robotics and control systems. The control systems community has started to show interest in machine learning algorithms from sub-domains such as supervised learning, imitation learning and reinforcement learning to achieve autonomous control and intelligent decision making. Amongst many complex control problems, stable bipedal walking has been the most challenging. In this paper, we present an architecture to design and simulate a planar bipedal walking robot (BWR) using a realistic robotics simulator, Gazebo. The robot demonstrates successful walking behaviour by learning through trial and error, without any prior knowledge of itself or the world dynamics. The autonomous walking of the BWR is achieved using a reinforcement learning algorithm called Deep Deterministic Policy Gradient (DDPG), one of the algorithms for learning controls in continuous action spaces. After training the model in simulation, it was observed that, with a properly shaped reward function, the robot achieved faster walking or even rendered a running gait with an average speed of 0.83 m/s. The gait pattern of the bipedal walker was compared with the actual human walking pattern. The results show that the bipedal walking pattern had similar characteristics to a human walking pattern. The video presenting our experiment is available at https://goo.gl/NHXKqR.
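DDPG trains its critic toward a bootstrapped target computed with slowly updated target networks. A sketch of those two pieces, with plain Python callables standing in for the target actor and critic; the defaults `gamma=0.99` and `tau=0.001` are typical DDPG values, not settings reported for this robot.

```python
def ddpg_critic_target(reward, next_state, done, target_actor, target_critic, gamma=0.99):
    """Bootstrapped critic target y = r + gamma * Q'(s', mu'(s')); y = r at episode end.

    target_actor: callable s -> action (the slowly updated policy mu').
    target_critic: callable (s, a) -> value (the slowly updated critic Q').
    """
    if done:
        return reward
    return reward + gamma * target_critic(next_state, target_actor(next_state))


def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging of target weights: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]
```

The critic is regressed toward `y`, the actor is updated along the critic's action gradient, and `soft_update` is applied to both target networks after each training step.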

artificial intelligence, machine learning, reinforcement learning

1807.05924

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.60)

Jansen, Nils, Könighofer, Bettina, Junges, Sebastian, Bloem, Roderick

Shielded Decision-Making in MDPs

arXiv.org Artificial Intelligence Jul-16-2018

A prominent problem in artificial intelligence and machine learning is the safe exploration of an environment. In particular, reinforcement learning is a well-known technique to determine optimal policies for complicated dynamic systems, but it suffers from the fact that such policies may induce harmful behavior. We present the concept of a shield that forces decision-making to provably adhere to safety requirements with high probability. Our method exploits the inherent uncertainties in scenarios given by Markov decision processes. We present a method to compute probabilities of decision making regarding temporal logic constraints. We use that information to realize a shield that, when applied to a reinforcement learning algorithm, ensures (near-)optimal behavior both for the safety constraints and for the actual learning objective. In our experiments on the arcade game PAC-MAN, we show that learning efficiency increases: learning needs orders of magnitude fewer episodes. We show tradeoffs between sufficient progress in exploration of the environment and ensuring strict safety.
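A shield restricts the learner to actions whose probability of violating the safety specification is acceptable. The sketch below simplifies the temporal-logic constraint to "do not reach an unsafe state within a finite horizon", computes the minimal violation probability by backward induction over the MDP, and allows actions within an absolute tolerance `delta` of the safest action. The horizon, the tolerance rule, and the toy state names are assumptions; the abstract does not state the paper's exact acceptance criterion.

```python
def action_risk(transitions, risk):
    """Violation probability after taking an action.

    transitions: list of (probability, next_state) pairs.
    risk: dict mapping each state to its violation probability.
    """
    return sum(p * risk[t] for p, t in transitions)


def min_violation_prob(mdp, unsafe, horizon):
    """Per state: minimum over policies of P(reach `unsafe` within `horizon` steps).

    mdp: dict state -> dict action -> list of (probability, next_state).
    States without actions are treated as absorbing and safe.
    """
    states = set(mdp) | set(unsafe)
    for actions in mdp.values():
        for transitions in actions.values():
            states.update(t for _, t in transitions)
    risk = {s: 1.0 if s in unsafe else 0.0 for s in states}
    for _ in range(horizon):
        new_risk = {}
        for s in states:
            if s in unsafe:
                new_risk[s] = 1.0
            else:
                options = [action_risk(tr, risk) for tr in mdp.get(s, {}).values()]
                new_risk[s] = min(options, default=0.0)
        risk = new_risk
    return risk


def shield(mdp, unsafe, horizon, delta):
    """Map each state to the set of actions the shield allows.

    Assumed rule: an action is allowed if its `horizon`-step violation risk is
    at most `delta` above that of the safest action in the same state.
    """
    risk = min_violation_prob(mdp, unsafe, horizon - 1)
    allowed = {}
    for s, actions in mdp.items():
        if s in unsafe or not actions:
            continue
        a_risk = {a: action_risk(tr, risk) for a, tr in actions.items()}
        best = min(a_risk.values())
        allowed[s] = {a for a, r in a_risk.items() if r <= best + delta}
    return allowed
```

The learner then explores only among `allowed[s]`; a larger `delta` permits more exploration at the cost of weaker safety, mirroring the tradeoff mentioned in the abstract.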

logic & formal reasoning, machine learning, reinforcement learning

1807.06096

Country:

Europe > Austria > Styria > Graz (0.24)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games > Computer Games (0.58)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Torabi, Faraz, Warnell, Garrett, Stone, Peter

Generative Adversarial Imitation from Observation

arXiv.org Artificial Intelligence Jul-16-2018

Imitation from observation (IfO) is the problem of learning directly from state-only demonstrations without having access to the demonstrator's actions. The lack of action information both distinguishes IfO from most of the literature in imitation learning and sets it apart as a method that may enable agents to learn from a large set of previously inapplicable resources such as internet videos. In this paper, we propose both a general framework for IfO approaches and a new IfO approach based on generative adversarial networks, called generative adversarial imitation from observation (GAIfO). We demonstrate that this approach performs comparably to classical imitation learning approaches (which have access to the demonstrator's actions) and significantly outperforms existing imitation from observation methods in high-dimensional simulation environments.
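The distinguishing step in GAIfO is that the discriminator scores state transitions (s, s') instead of state-action pairs, so no demonstrator actions are needed. A minimal sketch with a logistic-regression discriminator on hypothetical hand-made transition features; the feature map, the reward form `log D(s, s')`, and the learning rate are assumptions, and the policy-optimization step is omitted.

```python
import math


def features(s, s_next):
    """Hypothetical features of a 1-D state transition: a bias and the state change."""
    return [1.0, s_next - s]


def discriminator(w, s, s_next):
    """D(s, s') = sigmoid(w . phi(s, s')): estimated probability the transition is expert-made."""
    z = sum(wi * xi for wi, xi in zip(w, features(s, s_next)))
    return 1.0 / (1.0 + math.exp(-z))


def discriminator_step(w, expert, agent, lr=0.1):
    """One gradient-ascent step on mean log D(expert) + mean log(1 - D(agent))."""
    grad = [0.0] * len(w)
    for s, s2 in expert:
        d = discriminator(w, s, s2)
        for i, x in enumerate(features(s, s2)):
            grad[i] += (1.0 - d) * x / len(expert)  # d/dz log sigmoid(z) = 1 - sigmoid(z)
    for s, s2 in agent:
        d = discriminator(w, s, s2)
        for i, x in enumerate(features(s, s2)):
            grad[i] -= d * x / len(agent)  # d/dz log(1 - sigmoid(z)) = -sigmoid(z)
    return [wi + lr * g for wi, g in zip(w, grad)]


def imitation_reward(w, s, s_next, eps=1e-8):
    """Reward for the RL policy: higher when the transition looks expert-like."""
    return math.log(discriminator(w, s, s_next) + eps)
```

In a full system the discriminator would be a neural network and the policy would be trained with an RL algorithm on `imitation_reward`, alternating with discriminator updates.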

artificial intelligence, machine learning, reinforcement learning

1807.06158

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)