Edmonton
Deep Reinforcement Learning with Decorrelation
Mavrin, Borislav, Yao, Hengshuai, Kong, Linglong
Learning an effective representation for high-dimensional data is a challenging problem in reinforcement learning (RL). Deep reinforcement learning (DRL) such as Deep Q networks (DQN) achieves remarkable success in computer games by learning deeply encoded representation from convolution networks. In this paper, we propose a simple yet very effective method for representation learning with DRL algorithms. Our key insight is that features learned by DRL algorithms are highly correlated, which interferes with learning. By adding a regularized loss that penalizes correlation in latent features (with only slight computation), we decorrelate features represented by deep neural networks incrementally. On 49 Atari games, with the same regularization factor, our decorrelation algorithms perform $70\%$ in terms of human-normalized scores, which is $40\%$ better than DQN. In particular, ours performs better than DQN on 39 games with 4 close ties and lost only slightly on $6$ games. Empirical results also show that the decorrelation method applies to Quantile Regression DQN (QR-DQN) and significantly boosts performance. Further experiments on the losing games show that our decorelation algorithms can win over DQN and QR-DQN with a fined tuned regularization factor.
An Exponential Efron-Stein Inequality for Lq Stable Learning Rules
Abou-Moustafa, Karim, Szepesvari, Csaba
There is accumulating evidence in the literature that stability of learning algorithms is a key characteristic that permits a learning algorithm to generalize. Despite various insightful results in this direction, there seems to be an overlooked dichotomy in the type of stability-based generalization bounds we have in the literature. On one hand, the literature seems to suggest that exponential generalization bounds for the estimated risk, which are optimal, can be only obtained through stringent, distribution independent and computationally intractable notions of stability such as uniform stability. On the other hand, it seems that weaker notions of stability such as hypothesis stability, although it is distribution dependent and more amenable to computation, can only yield polynomial generalization bounds for the estimated risk, which are suboptimal. In this paper, we address the gap between these two regimes of results. In particular, the main question we address here is whether it is possible to derive exponential generalization bounds for the estimated risk using a notion of stability that is computationally tractable and distribution dependent, but weaker than uniform stability. Using recent advances in concentration inequalities, and using a notion of stability that is weaker than uniform stability but distribution dependent and amenable to computation, we derive an exponential tail bound for the concentration of the estimated risk of a hypothesis returned by a general learning rule, where the estimated risk is expressed in terms of either the resubstitution estimate (empirical error), or the deleted (or, leave-one-out) estimate. As an illustration, we derive exponential tail bounds for ridge regression with unbounded responses -- a setting where uniform stability results of Bousquet and Elisseeff (2002) are not applicable.
DeepMind is opening a huge new London headquarters in 2020
DeepMind, the Alphabet-owned artificial intelligence company, will be moving into a new flagship building in 2020. While a precise date is yet to be determined, the company hopes that the new headquarters โ which will feature a double helix staircase descending through a library, a roof garden, lecture theatre and lobby artwork by creatives working with data and artificial intelligence โ will be operational in the first half of next year. DeepMind is currently located near to the new site in London's Kings Cross, where it has two floors in the Google building. It also has smaller, satellite offices in Paris, Edmonton, Alberta and Mountain View, California. The new, 11-storey DeepMind headquarters will cement the reputation of the Kings Cross area as London's so-called'knowledge quarter': as well as Google, Facebook has taken office space nearby, as has Samsung.
Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning
Kearney, Alex, Veeriah, Vivek, Travnik, Jaden, Pilarski, Patrick M., Sutton, Richard S.
There is a long history of using meta learning as representation learning, specifically for determining the relevance of inputs. In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting step size parameters of stochastic gradient descent---building on a variety of prior work in stochastic approximation, machine learning, and artificial neural networks. In particular, we focus on stochastic meta-descent introduced in the Incremental Delta-Bar-Delta (IDBD) algorithm for setting individual step sizes for each feature of a linear function approximator. Using IDBD, a feature with large or small step sizes will have a large or small impact on generalization from training examples. As a main contribution of this work, we extend IDBD to temporal-difference (TD) learning---a form of learning which is effective in sequential, non i.i.d. problems. We derive a variety of IDBD generalizations for TD learning, demonstrating that they are able to distinguish which features are relevant and which are not. We demonstrate that TD IDBD is effective at learning feature relevance in both an idealized gridworld and a real-world robotic prediction task.
MinAtar: An Atari-inspired Testbed for More Efficient Reinforcement Learning Experiments
The Arcade Learning Environment (ALE) is a popular platform for evaluating reinforcement learning agents. Much of the appeal comes from the fact that Atari games are varied, showcase aspects of competency we expect from an intelligent agent, and are not biased towards any particular solution approach. The challenge of the ALE includes 1) the representation learning problem of extracting pertinent information from the raw pixels, and 2) the behavioural learning problem of leveraging complex, delayed associations between actions and rewards. Often, in reinforcement learning research, we care more about the latter, but the representation learning problem adds significant computational expense. In response, we introduce MinAtar, short for miniature Atari, a new evaluation platform that captures the general mechanics of specific Atari games, while simplifying certain aspects. In particular, we reduce the representational complexity to focus more on behavioural challenges. MinAtar consists of analogues to five Atari games which play out on a 10x10 grid. MinAtar provides a 10x10xn state representation. The n channels correspond to game-specific objects, such as ball, paddle and brick in the game Breakout. While significantly simplified, these domains are still rich enough to allow for interesting behaviours. To demonstrate the challenges posed by these domains, we evaluated a smaller version of the DQN architecture. We also tried variants of DQN without experience replay, and without a target network, to assess the impact of those two prominent components in the MinAtar environments. In addition, we evaluated a simpler agent that used actor-critic with eligibility traces, online updating, and no experience replay. We hope that by introducing a set of simplified, Atari-like games we can allow researchers to more efficiently investigate the unique behavioural challenges provided by the ALE.
Efficient Reinforcement Learning with a Mind-Game for Full-Length StarCraft II
Liu, Ruo-Ze, Guo, Haifeng, Ji, Xiaozhong, Yu, Yang, Xiao, Zitai, Wu, Yuzhou, Pang, Zhen-Jia, Lu, Tong
StarCraft II provides an extremely challenging platform for reinforcement learning due to its huge state-space and game length. The previous fastest method requires days to train a full-length game policy in a single commercial machine. In this paper, we introduce the mind-game to facilitate the reinforcement learning, which is an abstract task model. With the mind-game, the policy is firstly trained in the mind-game fastly and is then mapped to the real game for the second phase training. In our experiments, the trained agent can achieve a 100% win-rate on the map Simple64 against the most difficult non-cheating built-in bot (level-7), and the training is 100 times faster than the previous ones under the same computational resource. To test the generalization performance of the agent, a Golden level of StarCraft II Ladder human player has competed with the agent. With restricted strategy, the agent wins the human player by 4 out of 5 games. The mind-game approach might shed some light for further studies of efficient reinforcement learning. The codes are publicly available (https://github.com/mindgameSC2/mind-SC2).
Intelligent Autonomous Things on the Battlefield
Numerous, artificially intelligent, networked things will populate the battlefield of the future, operating in close collaboration with human warfighters, and fighting as teams in highly adversarial environments. This chapter explores the characteristics, capabilities and intelli-gence required of such a network of intelligent things and humans - Internet of Battle Things (IOBT). The IOBT will experience unique challenges that are not yet well addressed by the current generation of AI and machine learning.
The Seven Tools of Causal Inference, with Reflections on Machine Learning
The dramatic success in machine learning has led to an explosion of artificial intelligence (AI) applications and increasing expectations for autonomous systems that exhibit human-level intelligence. These expectations have, however, met with fundamental obstacles that cut across many application areas. One such obstacle is adaptability, or robustness. Machine learning researchers have noted current systems lack the ability to recognize or react to new circumstances they have not been specifically programmed or trained for. Intensive theoretical and experimental efforts toward "transfer learning," "domain adaptation," and "lifelong learning"4 are reflective of this obstacle. Another obstacle is "explainability," or that "machine learning models remain mostly black boxes"26 unable to explain the reasons behind their predictions or recommendations, thus eroding users' trust and impeding diagnosis and repair; see Hutson8 and Marcus.11 A third obstacle concerns the lack of understanding of cause-effect connections.
A Conjoint Application of Data Mining Techniques for Analysis of Global Terrorist Attacks -- Prevention and Prediction for Combating Terrorism
Kumar, Vivek, Mazzara, Manuel, Gen., Maj., Messina, Angelo, Lee, JooYoung
Terrorism has become one of the most tedious problems to deal with and a prominent threat to mankind. To enhance counter-terrorism, several research works are developing efficient and precise systems, data mining is not an exception. Immense data is floating in our lives, though the scarce availability of authentic terrorist attack data in the public domain makes it complicated to fight terrorism. This manuscript focuses on data mining classification techniques and discusses the role of United Nations in counter-terrorism. It analyzes the performance of classifiers such as Lazy Tree, Multilayer Perceptron, Multiclass and Na\"ive Bayes classifiers for observing the trends for terrorist attacks around the world. The database for experiment purpose is created from different public and open access sources for years 1970-2015 comprising of 156,772 reported attacks causing massive losses of lives and property. This work enumerates the losses occurred, trends in attack frequency and places more prone to it, by considering the attack responsibilities taken as evaluation class.
Level-0 Models for Predicting Human Behavior in Games
Wright, James R., Leyton-Brown, Kevin
Behavioral game theory seeks to describe the way actual people (as compared to idealized, "rational" agents) act in strategic situations. Our own recent work has identified iterative models, such as quantal cognitive hierarchy, as the state of the art for predicting human play in unrepeated, simultaneous-move games. Iterative models predict that agents reason iteratively about their opponents, building up from a specification of nonstrategic behavior called level-0. A modeler is in principle free to choose any description of level-0 behavior that makes sense for a given setting. However, in practice almost all existing work specifies this behavior as a uniform distribution over actions. In most games it is not plausible that even nonstrategic agents would choose an action uniformly at random, nor that other agents would expect them to do so. A more accurate model for level-0 behavior has the potential to dramatically improve predictions of human behavior, since a substantial fraction of agents may play level-0 strategies directly, and furthermore since iterative models ground all higher-level strategies in responses to the level-0 strategy. Our work considers models of the way in which level-0 agents construct a probability distribution over actions, given an arbitrary game. We considered a large space of alternatives and, in the end, recommend a model that achieved excellent performance across the board: a linear weighting of four binary features, each of which is general in the sense that it can be computed from any normal form game. Adding real-valued variants of the same four features yielded further improvements in performance, albeit with a corresponding increase in the number of parameters needing to be estimated. We evaluated the effects of combining these new level-0 models with several iterative models and observed large improvements in predictive accuracy.