Undirected Networks
Abstracting Probabilistic Models
Abstraction is a powerful idea widely used in science, to model, reason and explain the behavior of systems in a more tractable search space, by omitting irrelevant details. While notions of abstraction have matured for deterministic systems, the case for abstracting probabilistic models is not yet fully understood. In this paper, we provide a semantical framework for analyzing such abstractions from first principles. We develop the framework in a general way, allowing for expressive languages, including logic-based ones that admit relational and hierarchical constructs with stochastic primitives. We motivate a definition of consistency between a high-level model and its low-level counterpart, but also treat the case when the high-level model is missing critical information present in the low-level model. We prove properties of abstractions, both at the level of the parameter as well as the structure of the models. We conclude with some observations about how abstractions can be derived automatically.
Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
Dai, Falcon Z., Walter, Matthew R.
We propose a new complexity measure for Markov decision processes (MDP), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. We further establish that shaping can reduce or increase MEHC by at most a factor of two in a large class of MDPs with finite MEHC and unsaturated optimal average rewards.
Predicting e-commerce customer conversion from minimal temporal patterns on symbolized clickstream trajectories
Tagliabue, Jacopo, Lacasa, Lucas, Greco, Ciro, Pavoni, Mattia, Polonioli, Andrea
Knowing if a user is a "buyer" vs "window shopper" solely based on clickstream data is of crucial importance for e-commerce platforms seeking to implement real-time accurate NBA ("next best action") policies. However, due to the low frequency of conversion events and the noisiness of browsing data, classifying user sessions is very challenging. In this paper, we address the clickstream classification problem in the eCommerce industry and present three major contributions to the burgeoning field of A.I.-for-retail: first, we collected, normalized and prepared a novel dataset of live shopping sessions from a major European e-commerce website; second, we use the dataset to test in a controlled environment strong baselines and SOTA models from the literature; finally, we propose a new discriminative neural model that outperforms neural architectures recently proposed by [1] at Rakuten labs.
Integration of Imitation Learning using GAIL and Reinforcement Learning using Task-achievement Rewards via Probabilistic Generative Model
Kinose, Akira, Taniguchi, Tadahiro
Integration of reinforcement learning and imitation learning is an important problem that has been studied for a long time in the field of intelligent robotics. Reinforcement learning optimizes policies to maximize the cumulative reward, whereas imitation learning attempts to extract general knowledge about the trajectories demonstrated by experts, i.e., demonstrators. Because each of them has their own drawbacks, methods combining them and compensating for each set of drawbacks have been explored thus far. However, many of the methods are heuristic and do not have a solid theoretical basis. In this paper, we present a new theory for integrating reinforcement and imitation learning by extending the probabilistic generative model framework for reinforcement learning, {\it plan by inference}. We develop a new probabilistic graphical model for reinforcement learning with multiple types of rewards and a probabilistic graphical model for Markov decision processes with multiple optimality emissions (pMDP-MO). Furthermore, we demonstrate that the integrated learning method of reinforcement learning and imitation learning can be formulated as a probabilistic inference of policies on pMDP-MO by considering the output of the discriminator in generative adversarial imitation learning as an additional optimal emission observation. We adapt the generative adversarial imitation learning and task-achievement reward to our proposed framework, achieving significantly better performance than agents trained with reinforcement learning or imitation learning alone. Experiments demonstrate that our framework successfully integrates imitation and reinforcement learning even when the number of demonstrators is only a few.
Learning the Arrow of Time
Rahaman, Nasim, Wolf, Steffen, Goyal, Anirudh, Remme, Roman, Bengio, Yoshua
We humans seem to have an innate understanding of the asymmetric progression of time, which we use to efficiently and safely perceive and manipulate our environment. Drawing inspiration from that, we address the problem of learning an arrow of time in a Markov (Decision) Process. We illustrate how a learned arrow of time can capture meaningful information about the environment, which in turn can be used to measure reachability, detect side-effects and to obtain an intrinsic reward signal. We show empirical results on a selection of discrete and continuous environments, and demonstrate for a class of stochastic processes that the learned arrow of time agrees reasonably well with a known notion of an arrow of time given by the celebrated Jordan-Kinderlehrer-Otto result.
Visual analytics for team-based invasion sports with significant events and Markov reward process
Zhao, Kun, Osogami, Takayuki, Morimura, Tetsuro
In team-based invasion sports such as soccer and basketball, analytics is important for teams to understand their performance and for audiences to understand matches better. The present work focuses on performing visual analytics to evaluate the value of any kind of event occurring in a sports match with a continuous parameter space. Here, the continuous parameter space involves the time, location, score, and other parameters. Because the spatiotemporal data used in such analytics is a low-level representation and has a very large size, however, traditional analytics may need to discretize the continuous parameter space (e.g., subdivide the playing area) or use a local feature to limit the analysis to specific events (e.g., only shots). These approaches make evaluation impossible for any kind of event with a continuous parameter space. To solve this problem, we consider a whole match as a Markov chain of significant events, so that event values can be estimated with a continuous parameter space by solving the Markov chain with a machine learning model. The significant events are first extracted by considering the time-varying distribution of players to represent the whole match. Then, the extracted events are redefined as different states with the continuous parameter space and built as a Markov chain so that a Markov reward process can be applied. Finally, the Markov reward process is solved by a customized fitted-value iteration algorithm so that the event values with the continuous parameter space can be predicted by a regression model. As a result, the event values can be visually inspected over the whole playing field under arbitrary given conditions. Experimental results with real soccer data show the effectiveness of the proposed system.
LSTM-based Handwriting Recognition by Google
Handwriting is a one of the challenge in NLP task. It is because it can be various among different people. Sometimes, "O" can be written as "0" while human begin has the capability to distinguish whether it is "O" or "0" from contextualize information. For example, "0" will be used in phone number while "O" will be used as part of English word. Another skill is lexicon searching.
A Kernel Stein Test for Comparing Latent Variable Models
Kanagawa, Heishiro, Jitkrittum, Wittawat, Mackey, Lester, Fukumizu, Kenji, Gretton, Arthur
We propose a nonparametric, kernel-based test to assess the relative goodness of fit of latent variable models with intractable unnormalized densities. Our test generalises the kernel Stein discrepancy (KSD) tests of (Liu et al., 2016, Chwialkowski et al., 2016, Yang et al., 2018, Jitkrittum et al., 2018) which required exact access to unnormalized densities. Our new test relies on the simple idea of using an approximate observed-variable marginal in place of the exact, intractable one. As our main theoretical contribution, we prove that the new test, with a properly corrected threshold, has a well-controlled type-I error. In the case of models with low-dimensional latent structure and high-dimensional observations, our test significantly outperforms the relative maximum mean discrepancy test (Bounliphone et al., 2015) , which cannot exploit the latent structure.
Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
Lee, Alex X., Nagabandi, Anusha, Abbeel, Pieter, Levine, Sergey
Deep reinforcement learning (RL) algorithms can use high-capacity deep networks to learn directly from image observations. However, these kinds of observation spaces present a number of challenges in practice, since the policy must now solve two problems: a representation learning problem, and a task learning problem. In this paper, we aim to explicitly learn representations that can accelerate reinforcement learning from images. We propose the stochastic latent actor-critic (SLAC) algorithm: a sample-efficient and high-performing RL algorithm for learning policies for complex continuous control tasks directly from high-dimensional image inputs. SLAC learns a compact latent representation space using a stochastic sequential latent variable model, and then learns a critic model within this latent space. By learning a critic within a compact state space, SLAC can learn much more efficiently than standard RL methods. The proposed model improves performance substantially over alternative representations as well, such as variational autoencoders. In fact, our experimental evaluation demonstrates that the sample efficiency of our resulting method is comparable to that of model-based RL methods that directly use a similar type of model for control. Furthermore, our method outperforms both model-free and model-based alternatives in terms of final performance and sample efficiency, on a range of difficult image-based control tasks.
HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking
Goel, Rahul, Paul, Shachi, Hakkani-Tür, Dilek
Recent works on end-to-end trainable neural network based approaches have demonstrated state-of-the-art results on dialogue state tracking. The best performing approaches estimate a probability distribution over all possible slot values. However, these approaches do not scale for large value sets commonly present in real-life applications and are not ideal for tracking slot values that were not observed in the training set. To tackle these issues, candidate-generation-based approaches have been proposed. These approaches estimate a set of values that are possible at each turn based on the conversation history and/or language understanding outputs, and hence enable state tracking over unseen values and large value sets however, they fall short in terms of performance in comparison to the first group. In this work, we analyze the performance of these two alternative dialogue state tracking methods, and present a hybrid approach (HyST) which learns the appropriate method for each slot type. To demonstrate the effectiveness of HyST on a rich-set of slot types, we experiment with the recently released MultiWOZ-2.0 multi-domain, task-oriented dialogue-dataset. Our experiments show that HyST scales to multi-domain applications. Our best performing model results in a relative improvement of 24% and 10% over the previous SOTA and our best baseline respectively.