Instructional Material
HOUDINI: Lifelong Learning as Program Synthesis
Valkov, Lazar, Chaudhari, Dipak, Srivastava, Akash, Sutton, Charles, Chaudhuri, Swarat
We present a neurosymbolic framework for the lifelong learning of algorithmic tasks that mix perception and procedural reasoning. Reusing high-level concepts across domains and learning complex procedures are key challenges in lifelong learning. We show that a program synthesis approach that combines gradient descent with combinatorial search over programs can be a more effective response to these challenges than purely neural methods. Our framework, called HOUDINI, represents neural networks as strongly typed, differentiable functional programs that use symbolic higher-order combinators to compose a library of neural functions. Our learning algorithm consists of: (1) a symbolic program synthesizer that performs a type-directed search over parameterized programs, and decides on the library functions to reuse, and the architectures to combine them, while learning a sequence of tasks; and (2) a neural module that trains these programs using stochastic gradient descent.
Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks
Kalra, Agastya, Rashwan, Abdullah, Hsu, Wei-Shou, Poupart, Pascal, Doshi, Prashant, Trimponias, Georgios
Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. Those properties follow from some conditions (i.e., completeness and decomposability) that must be respected by the structure of the network. As a result, it is not easy to specify a valid sum-product network by hand and therefore structure learning techniques are typically used in practice. This paper describes a new online structure learning technique for feed-forward and recurrent SPNs. The algorithm is demonstrated on real-world datasets with continuous features for which it is not clear what network architecture might be best, including sequence datasets of varying length.
Online Improper Learning with an Approximation Oracle
Hazan, Elad, Hu, Wei, Li, Yuanzhi, Li, Zhiyuan
We study the following question: given an efficient approximation algorithm for an optimization problem, can we learn efficiently in the same setting? We give a formal affirmative answer to this question in the form of a reduction from online learning to offline approximate optimization using an efficient algorithm that guarantees near optimal regret. The algorithm is efficient in terms of the number of oracle calls to a given approximation oracle โ it makes only logarithmically many such calls per iteration. Furthermore, our result applies to the more general improper learning problems. Papers published at the Neural Information Processing Systems Conference.
Online Reinforcement Learning in Stochastic Games
Wei, Chen-Yu, Hong, Yi-Te, Lu, Chi-Jen
We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the \textsc{UCSG} algorithm that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves previous ones under the same setting. The regret bound has a dependency on the \textit{diameter}, which is an intrinsic value related to the mixing property of SGs.
Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning
Efroni, Yonathan, Dalal, Gal, Scherrer, Bruno, Mannor, Shie
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work (Efroni et al., 2018), multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty, arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step-greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care about this difficulty, we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator.
Lifelong Learning with Weighted Majority Votes
Pentina, Anastasia, Urner, Ruth
Better understanding of the potential benefits of information transfer and representation learning is an important step towards the goal of building intelligent systems that are able to persist in the world and learn over time. In this work, we consider a setting where the learner encounters a stream of tasks but is able to retain only limited information from each encountered task, such as a learned predictor. In contrast to most previous works analyzing this scenario, we do not make any distributional assumptions on the task generating process. Instead, we formulate a complexity measure that captures the diversity of the observed tasks. We provide a lifelong learning algorithm with error guarantees for every observed task (rather than on average).
Lifelong Learning with Non-i.i.d. Tasks
Pentina, Anastasia, Lampert, Christoph H.
In this work we aim at extending theoretical foundations of lifelong learning. Previous work analyzing this scenario is based on the assumption that the tasks are sampled i.i.d. Instead we study two scenarios when lifelong learning is possible, even though the observed tasks do not form an i.i.d. In the first case we prove a PAC-Bayesian theorem, which can be seen as a direct generalization of the analogous previous result for the i.i.d. For the second scenario we propose to learn an inductive bias in form of a transfer procedure.
Virtual Class Enhanced Discriminative Embedding Learning
Chen, Binghui, Deng, Weihong, Shen, Haifeng
Recently, learning discriminative features to improve the recognition performances gradually becomes the primary goal of deep learning, and numerous remarkable works have emerged. In this paper, we propose a novel yet extremely simple method Virtual Softmax to enhance the discriminative property of learned features by injecting a dynamic virtual negative class into the original softmax. Injecting virtual class aims to enlarge inter-class margin and compress intra-class distribution by strengthening the decision boundary constraint. Although it seems weird to optimize with this additional virtual class, we show that our method derives from an intuitive and clear motivation, and it indeed encourages the features to be more compact and separable. This paper empirically and experimentally demonstrates the superiority of Virtual Softmax, improving the performances on a variety of object classification and face verification tasks.
Stochastic Online Greedy Learning with Semi-bandit Feedbacks
Lin, Tian, Li, Jian, Chen, Wei
The greedy algorithm is extensively studied in the field of combinatorial optimization for decades. In this paper, we address the online learning problem when the input to the greedy algorithm is stochastic with unknown parameters that have to be learned over time. We first propose the greedy regret and $\epsilon$-quasi greedy regret as learning metrics comparing with the performance of offline greedy algorithm. We then propose two online greedy learning algorithms with semi-bandit feedbacks, which use multi-armed bandit and pure exploration bandit policies at each level of greedy learning, one for each of the regret metrics respectively. Both algorithms achieve $O(\log T)$ problem-dependent regret bound ($T$ being the time horizon) for a general class of combinatorial structures and reward functions that allow greedy solutions.
Top 5 Free Courses to learn Machine Learning and Deep Learning in 2020
If you don't know, Keras is a both powerful and easy-to-use Python library for developing and evaluating deep learning models. It wraps the efficient numerical computation libraries like Theano and TensorFlow and allows you to define and train neural network models in a few short lines of code, which is just awesome. In this course, you will learn how to build an end-to-end Python machine learning project using Keras and tune a deep learning model and neural network. The best part of this course is that n the course, we will walk through every line of code so you'll be able to understand the model and the process.