Instructional Material
Online Reinforcement Learning in Stochastic Games
Wei, Chen-Yu, Hong, Yi-Te, Lu, Chi-Jen
We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the \textsc{UCSG} algorithm that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves previous ones under the same setting. The regret bound has a dependency on the \textit{diameter}, which is an intrinsic value related to the mixing property of SGs. Slightly extended, \textsc{UCSG} finds an $\varepsilon$-maximin stationary policy with a sample complexity of $\tilde{\mathcal{O}}\left(\text{poly}(1/\varepsilon)\right)$, where $\varepsilon$ is the error parameter. To the best of our knowledge, this extended result is the first in the average-reward setting. In the analysis, we develop Markov chain's perturbation bounds for mean first passage times and techniques to deal with non-stationary opponents, which may be of interest in their own right.
Neural Expectation Maximization
Greff, Klaus, Steenkiste, Sjoerd van, Schmidhuber, Jürgen
Many real world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on the Expectation Maximization framework we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities. We evaluate our method on the (sequential) perceptual grouping task and find that it is able to accurately recover the constituent objects. We demonstrate that the learned representations are useful for next-step prediction.
Population Matching Discrepancy and Applications in Deep Learning
Chen, Jianfei, LI, Chongxuan, Ru, Yizhong, Zhu, Jun
A differentiable estimation of the distance between two distributions based on samples is important for many deep learning tasks. One such estimation is maximum mean discrepancy (MMD). However, MMD suffers from its sensitive kernel bandwidth hyper-parameter, weak gradients, and large mini-batch size when used as a training objective. In this paper, we propose population matching discrepancy (PMD) for estimating the distribution distance based on samples, as well as an algorithm to learn the parameters of the distributions using PMD as an objective. PMD is defined as the minimum weight matching of sample populations from each distribution, and we prove that PMD is a strongly consistent estimator of the first Wasserstein metric. We apply PMD to two deep learning tasks, domain adaptation and generative modeling. Empirical results demonstrate that PMD overcomes the aforementioned drawbacks of MMD, and outperforms MMD on both tasks in terms of the performance as well as the convergence speed.
Efficient Second-Order Online Kernel Learning with Adaptive Embedding
Calandriello, Daniele, Lazaric, Alessandro, Valko, Michal
Online kernel learning (OKL) is a flexible framework to approach prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces can contain an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only first order methods accumulate $\O(\sqrt{T})$ more loss than the optimal function, but the curse of kernelization results in a $\O(t)$ per step complexity. Second-order methods get closer to the optimum much faster, suffering only $\O(\log(T))$ regret, but second-order updates are even more expensive, with a $\O(t^2)$ per-step cost. Existing approximate OKL methods try to reduce this complexity either by limiting the Support Vectors (SV) introduced in the predictor, or by avoiding the kernelization process altogether using embedding. Nonetheless, as long as the size of the approximation space or the number of SV does not grow over time, an adversary can always exploit the approximation process. In this paper, we propose PROS-N-KONS, a method that combines Nystrom sketching to project the input point in a small, accurate embedded space, and performs efficient second-order updates in this space. The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$. Moreover, the second-order updated allows us to achieve the logarithmic regret. We empirically compare our algorithm on recent large-scales benchmarks and show it performs favorably.
Few-Shot Learning Through an Information Retrieval Lens
Triantafillou, Eleni, Zemel, Richard, Urtasun, Raquel
Few-shot learning refers to understanding new concepts from only a few examples. We propose an information retrieval-inspired approach for this problem that is motivated by the increased importance of maximally leveraging all the available information in this low-data regime. We define a training objective that aims to extract as much information as possible from each training batch by effectively optimizing over all relative orderings of the batch points simultaneously. In particular, we view each batch point as a `query' that ranks the remaining ones based on its predicted relevance to them and we define a model within the framework of structured prediction to optimize mean Average Precision over these rankings. Our method achieves impressive results on the standard few-shot classification benchmarks while is also capable of few-shot retrieval.
The Numerics of GANs
Mescheder, Lars, Nowozin, Sebastian, Geiger, Andreas
In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs). Using the formalism of smooth two-player games we analyze the associated gradient vector field of GAN training objectives. Our findings suggest that the convergence of current algorithms suffers due to two factors: i) presence of eigenvalues of the Jacobian of the gradient vector field with zero real-part, and ii) eigenvalues with big imaginary part. Using these findings, we design a new algorithm that overcomes some of these limitations and has better convergence properties. Experimentally, we demonstrate its superiority on training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train.
Subset Selection and Summarization in Sequential Data
Elhamifar, Ehsan, Kaluza, M. Clara De Paolis
Subset selection, which is the task of finding a small subset of representative items from a large ground set, finds numerous applications in different areas. Sequential data, including time-series and ordered data, contain important structural relationships among items, imposed by underlying dynamic models of data, that should play a vital role in the selection of representatives. However, nearly all existing subset selection techniques ignore underlying dynamics of data and treat items independently, leading to incompatible sets of representatives. In this paper, we develop a new framework for sequential subset selection that finds a set of representatives compatible with the dynamic models of data. To do so, we equip items with transition dynamic models and pose the problem as an integer binary optimization over assignments of sequential items to representatives, that leads to high encoding, diversity and transition potentials. Our formulation generalizes the well-known facility location objective to deal with sequential data, incorporating transition dynamics among facilities. As the proposed formulation is non-convex, we derive a max-sum message passing algorithm to solve the problem efficiently. Experiments on synthetic and real data, including instructional video summarization, show that our sequential subset selection framework not only achieves better encoding and diversity than the state of the art, but also successfully incorporates dynamics of data, leading to compatible representatives.
15 Minute Guide to Choose Effective Courses for Machine Learning and Data Science
Bill Gates proclaimed in a recent graduation ceremony, that artificial intelligence (AI), energy, and bio science are three most exciting and rewarding career choices today's young college graduates can choose from. I have come to believe strongly that some of the most important questions of our generation - related to sustainability, energy generation and distribution, transportation, access to basic amenities of life etc., are dependent on how intelligently we can mix the the first two branches of knowledge Mr. Gates mentions. In other words, the world of physical electronics (semiconductor industry comprises a central portion of that world), must do more to embrace fully the fruits of information technology and new developments in AI or data science. I wanted to learn, but where to start? I am a semiconductor professional with 8 years of post-PhD experience in a top technology company.
Google quietly debuts Chatbase, a chatbot analytics platform
Just ahead of Google's I/O event kicking off later today, some details have leaked out about a new service that Google is launching for developers that are building chatbots. TechCrunch has learned that Google has developed a new service called Chatbase, which provides analytics and suggestions for how to "fix" bot experiences to make them better for users. A source tells us that the service is being hatched out of Google's Area 120 internal incubator, and it was developed by Ofer Ronen and Hari Rajagopalan, both of whom came to Google by way of previous acquisitions. Chatbase appears to have been in a private beta with about a dozen companies, including the chat app Viber (which launched its own foray into chatbots late last year), eBay, JustFab and Unicef. There are no screenshots on the Chatbase website, although there are options for developers to sign up for access, which leads me to think it might be one of the items getting announced today.
Complete Guide to TensorFlow for Deep Learning with Python
Learn how to use Google's Deep Learning Framework – TensorFlow with Python! Welcome to the Complete Guide to TensorFlow for Deep Learning with Python! This course will guide you through how to use Google's TensorFlow framework to create artificial neural networks for deep learning! This course aims to give you an easy to understand guide to the complexities of Google's TensorFlow framework in a way that is easy to understand. Other courses and tutorials have tended to stay away from pure tensorflow and instead use abstractions that give the user less control.