Reinforcement Learning
Temporal-related Convolutional-Restricted-Boltzmann-Machine capable of learning relational order via reinforcement learning procedure?
In this article, we extend the conventional framework of the convolutional-Restricted-Boltzmann-Machine to learn highly abstract features among an arbitrary number of time-related input maps by constructing a layer of multiplicative units, which capture the relations among inputs. In many cases, more than two maps are strongly related, so it is sensible to let each multiplicative unit learn relations among more input maps, in other words, to find the optimal relational order of each unit. To enable our machine to learn the relational order, we develop a reinforcement-learning method, whose optimality is proven, to train the network.
Tesla's new AI guru will help its cars learn for themselves
Elon Musk has hired a new director of AI research at Tesla, and it may signal a plan to rethink the way its automated driving works. This week, Musk poached Andrej Karpathy, an expert on vision, deep learning, and reinforcement learning, from OpenAI, a nonprofit that Musk and others are funding that's dedicated to "discovering and enacting the path to safe artificial general intelligence." Karpathy, who will apparently report directly to Musk, is a rising star in the world of AI, having studied at Stanford with Fei-Fei Li, a leading AI expert who is now the chief scientist of Google Cloud. Li is famous in tech circles for having developed a data set of images that helped inspire a breakthrough in machine vision. Many have pointed to Karpathy's expertise in computer vision as a key asset for Tesla, and that's true.
Market Interfaces for Electric Vehicle Charging
Stein, Sebastian, Gerding, Enrico H., Nedea, Adrian, Rosenfeld, Avi, Jennings, Nicholas R.
We consider settings where owners of electric vehicles (EVs) participate in a market mechanism to charge their vehicles. Existing work on such mechanisms has typically assumed that participants are fully rational and can report their preferences accurately via some interface to the mechanism or to a software agent participating on their behalf. However, this may not be reasonable in settings with non-expert human end-users. Thus, our overarching aim in this paper is to determine experimentally if a fully expressive market interface that enables accurate preference reports is suitable for the EV charging domain, or, alternatively, if a simpler, restricted interface that reduces the space of possible options is preferable. In doing this, we measure the performance of an interface both in terms of how it helps participants maximise their utility and how it affects deliberation time. Our secondary objective is to contrast two different types of restricted interfaces that vary in how they restrict the space of preferences that can be reported. To enable this analysis, we develop a novel game that replicates key features of an abstract EV charging scenario. In two experiments with over 300 users, we show that restricting the users' preferences significantly reduces the time they spend deliberating (by up to half in some cases). An extensive usability survey confirms that this restriction is furthermore associated with a lower perceived cognitive burden on the users. More surprisingly, at the same time, using restricted interfaces leads to an increase in the users' performance compared to the fully expressive interface (by up to 70%). We also show that some restricted interfaces have the desirable effect of reducing the energy consumption of their users by up to 20% while achieving the same utility as other interfaces.
Finally, we find that a reinforcement learning agent displays similar performance trends to human users, enabling a novel methodology for evaluating market interfaces.
A Signaling Game Approach to Databases Querying and Interaction
McCamish, Ben, Termehchy, Arash, Touri, Behrouz
As most database users cannot precisely express their information needs, it is challenging for database management systems to understand them. We propose a novel formal framework for representing and understanding information needs in database querying and exploration. Our framework considers querying as a collaboration between the user and the database management system to establish a mutual language for representing information needs. We formalize this collaboration as a signaling game, where each mutual language is an equilibrium of the game. A query interface is more effective if it establishes a less ambiguous mutual language faster. We discuss some equilibria, strategies, and the convergence in this game. In particular, we propose a reinforcement learning mechanism and analyze it within our framework. We prove that this adaptation mechanism for the query interface improves the effectiveness of answering queries stochastically speaking, and converges almost surely. We extend our results to cases in which the user also modifies her strategy during the interaction.
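To make the signaling-game idea in the abstract above concrete, here is a minimal sketch under stated assumptions. It uses a generic Roth-Erev-style reinforcement rule, which is not necessarily the mechanism the paper proposes. The names `simulate_signaling`, `interpretation_probs`, and `user_strategy` are hypothetical, the user's strategy is held fixed, and the reward is simply 1 for a correct interpretation.

```python
import random


def simulate_signaling(user_strategy, n_intents, rounds=5000, seed=0):
    """Simulate a fixed user and a learning query interface.

    user_strategy[i] is the query (signal) the user sends for intent i.
    The interface keeps one propensity per (query, intent) pair and
    reinforces an interpretation whenever it turns out to be correct.
    """
    rng = random.Random(seed)
    n_queries = max(user_strategy) + 1
    # propensity[q][i]: accumulated reward for reading query q as intent i.
    propensity = [[1.0] * n_intents for _ in range(n_queries)]
    for _ in range(rounds):
        intent = rng.randrange(n_intents)      # user's hidden information need
        query = user_strategy[intent]          # signal sent to the interface
        # Interface samples an interpretation in proportion to propensities.
        guess = rng.choices(range(n_intents), weights=propensity[query])[0]
        if guess == intent:                    # assumed reward: 1 on success
            propensity[query][guess] += 1.0
    return propensity


def interpretation_probs(propensity):
    """Normalise each query's propensities into a probability distribution."""
    return [[p / sum(row) for p in row] for row in propensity]


if __name__ == "__main__":
    # Three intents, each expressed by its own query (an unambiguous user).
    learned = interpretation_probs(simulate_signaling([0, 1, 2], n_intents=3))
    for query, row in enumerate(learned):
        print(query, [round(p, 3) for p in row])
```

With an unambiguous user, only correct interpretations are ever reinforced, so the interface's probability of reading each query correctly rises toward 1 over the rounds. An ambiguous user (for example `[0, 0, 1]`) would leave the probability mass for query 0 split between two intents.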
Statistical Mechanics of Node-perturbation Learning with Noisy Baseline
Hara, Kazuyuki, Katahira, Kentaro, Okada, Masato
Node-perturbation learning is a type of statistical gradient descent algorithm that can be applied to problems where the objective function is not explicitly formulated, including reinforcement learning. It estimates the gradient of an objective function by using the change in the objective function in response to the perturbation. The value of the objective function for an unperturbed output is called a baseline. Cho et al. proposed node-perturbation learning with a noisy baseline. In this paper, we report on building the statistical mechanics of Cho's model and on deriving coupled differential equations of order parameters that depict learning dynamics. We also show how to derive the generalization error by solving the differential equations of order parameters. On the basis of the results, we show that Cho's results also apply in general cases and show some general performance characteristics of Cho's model.
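The gradient estimate described in the abstract above can be sketched for a single linear unit. This is an illustrative example only: it uses a noiseless baseline, so it does not reproduce the noisy-baseline variant of Cho et al. that the paper analyzes. The function names, the squared-error objective, and all hyperparameters are assumptions chosen for the sketch.

```python
import random


def make_linear_task(true_w, n_samples, seed=0):
    """Build (input, target) pairs from a hypothetical linear teacher."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        target = sum(w * xi for w, xi in zip(true_w, x))
        samples.append((x, target))
    return samples


def node_perturbation_train(samples, dim, lr=0.05, sigma=0.1,
                            epochs=200, seed=0):
    """Train a linear unit with node perturbation (noiseless baseline)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        for x, target in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))
            # Baseline: objective evaluated on the unperturbed output.
            baseline = 0.5 * (y - target) ** 2
            # Perturb the output node with Gaussian noise and re-evaluate.
            noise = rng.gauss(0.0, sigma)
            perturbed = 0.5 * (y + noise - target) ** 2
            # The change in the objective, correlated with the noise,
            # estimates the gradient with respect to the output.
            grad_y = (perturbed - baseline) * noise / sigma ** 2
            # Chain rule for a linear unit: d(output)/d(w_j) = x_j.
            w = [wi - lr * grad_y * xj for wi, xj in zip(w, x)]
    return w


if __name__ == "__main__":
    data = make_linear_task([2.0, -1.0], n_samples=20, seed=1)
    print(node_perturbation_train(data, dim=2))
```

In expectation over the noise, `grad_y` equals the true derivative `y - target`, so the update behaves like stochastic gradient descent even though the objective is only queried, never differentiated.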
Classifying Options for Deep Reinforcement Learning
Arulkumaran, Kai, Dilokthanakul, Nat, Shanahan, Murray, Bharath, Anil Anthony
In this paper we combine one method for hierarchical reinforcement learning - the options framework - with deep Q-networks (DQNs) through the use of different "option heads" on the policy network, and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample complexity when simultaneously learning subtasks with negative transfer, without degrading performance when learning subtasks with positive transfer.
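The architecture described in the abstract above (a shared network, one "option head" per option, and a supervisory network that picks the head) can be sketched as a forward pass. This is a rough illustration with untrained random weights, not the authors' implementation. The class name `OptionHeadQNetwork`, the layer sizes, the single hidden layer, and the choice to feed the raw observation to the supervisor are all assumptions.

```python
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, v) for v in vec]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class OptionHeadQNetwork:
    """Shared trunk, one Q-value head per option, and a supervisor."""

    def __init__(self, obs_dim, hidden_dim, n_actions, n_options, seed=0):
        rng = random.Random(seed)
        self.trunk = rand_matrix(rng, hidden_dim, obs_dim)
        # One output layer ("option head") per option.
        self.heads = [rand_matrix(rng, n_actions, hidden_dim)
                      for _ in range(n_options)]
        # Assumed: the supervisor scores options from the raw observation.
        self.supervisor = rand_matrix(rng, n_options, obs_dim)

    def act(self, obs):
        features = relu(linear(self.trunk, obs))
        option_scores = linear(self.supervisor, obs)
        option = max(range(len(option_scores)), key=option_scores.__getitem__)
        # Only the selected head produces the Q-values used for acting.
        q_values = linear(self.heads[option], features)
        action = max(range(len(q_values)), key=q_values.__getitem__)
        return option, action, q_values


if __name__ == "__main__":
    net = OptionHeadQNetwork(obs_dim=4, hidden_dim=8, n_actions=3, n_options=2)
    print(net.act([0.1, -0.3, 0.5, 0.2]))
```

Keeping the heads separate means that updates for one subtask touch only its own output layer plus the shared trunk, which is one plausible reading of how such a design could limit interference between subtasks with negative transfer.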
Reinforcement Learning in Rich-Observation MDPs using Spectral Methods
Azizzadenesheli, Kamyar, Lazaric, Alessandro, Anandkumar, Animashree
Designing effective exploration-exploitation algorithms in Markov decision processes (MDPs) with large state-action spaces is the main challenge in reinforcement learning (RL). In fact, the learning performance degrades with the number of states and actions in the MDP. However, MDPs often exhibit a low-dimensional latent structure in practice, where a small hidden state is observable through a possibly large number of observations. In this paper, we study the setting of rich-observation Markov decision processes (ROMDPs), where hidden states are mapped to observations through an injective mapping, so that an observation can be generated by only one hidden state. While this mapping is unknown a priori, we introduce a spectral decomposition method that consistently estimates how observations are clustered in the hidden states. The estimated clustering is then integrated into an optimistic algorithm for RL (UCRL), which operates on the smaller clustered space. The resulting algorithm proceeds through phases, and we show that its per-step regret (i.e., the difference in cumulative reward between the algorithm and the optimal policy) decreases as more observations are clustered together and finally matches the (ideal) performance of an RL algorithm running directly on the hidden MDP.
Dex: Incremental Learning for Complex Environments in Deep Reinforcement Learning
This paper introduces Dex, a reinforcement learning environment toolkit specialized for training and evaluation of continual learning methods as well as general reinforcement learning problems. We also present the novel continual learning method of incremental learning, where a challenging environment is solved using optimal weight initialization learned from first solving a similar, easier environment. We show that incremental learning can produce vastly superior results to standard methods by providing a strong baseline method across ten Dex environments. Finally, we develop a saliency method for qualitative analysis of reinforcement learning, which shows the impact incremental learning has on network attention.
Intelligent Bits: 16 June 2017
Facebook fighting extremism with AI -- "The problem, as usual, is determining what is extremist, and what isn't, and it goes further than just jihadists," he said. "Are they just talking about ISIS and Al Qaeda, or are they going to go further to deal with white nationalism and neo-Nazi movements?" AI is big business -- Element AI raises a whopping $102 million to bridge the gap between the haves and have-nots of AI. "Intuitive physics" -- DeepMind claims progress towards AI with a better sense of context and "intuitive physics" via relational reasoning and visual prediction, but obstacles to human-like intelligence remain. Alternative schema -- While deep reinforcement learning (DRL) is all the rage right now, some organizations like Vicarious have taken alternative approaches such as their Schema Networks, which have outperformed some DRL nets albeit with some debate and controversy.