Markov Models
Exploiting locality in high-dimensional factorial hidden Markov models
Rimella, Lorenzo, Whiteley, Nick
We propose algorithms for approximate filtering and smoothing in high-dimensional factorial hidden Markov models. The approximation involves discarding, in a principled way, likelihood factors according a notion of locality in a factor graph associated with the emission distribution. This allows the exponential-in-dimension cost of exact filtering and smoothing to be avoided. We prove that the approximation accuracy, measured in a local total variation norm, is `dimension-free' in the sense that as the overall dimension of the model increases the error bounds we derive do not necessarily degrade. A key step in the analysis is to quantify the error introduced by localizing the likelihood function in a Bayes' rule update. The factorial structure of the likelihood function which we exploit arises naturally when data have known spatial or network structure. We demonstrate the new algorithms on synthetic examples and a London Underground passenger flow problem, where the factor graph is effectively given by the train network.
Causally Driven Incremental Multi Touch Attribution Using a Recurrent Neural Network
Du, Ruihuan, Zhong, Yu, Nair, Harikesh, Cui, Bo, Shou, Ruyang
This paper describes a practical system for Multi Touch Attribution (MTA) for use by a publisher of digital ads. We developed this system for JD.com, an eCommerce company, which is also a publisher of digital ads in China. The approach has two steps. The first step ('response modeling') fits a user-level model for purchase of a product as a function of the user's exposure to ads. The second ('credit allocation') uses the fitted model to allocate the incremental part of the observed purchase due to advertising, to the ads the user is exposed to over the previous T days. To implement step one, we train a Recurrent Neural Network (RNN) on user-level conversion and exposure data. The RNN has the advantage of flexibly handling the sequential dependence in the data in a semi-parametric way. The specific RNN formulation we implement captures the impact of advertising intensity, timing, competition, and user-heterogeneity, which are known to be relevant to ad-response. To implement step two, we compute Shapley Values, which have the advantage of having axiomatic foundations and satisfying fairness considerations. The specific formulation of the Shapley Value we implement respects incrementality by allocating the overall incremental improvement in conversion to the exposed ads, while handling the sequence-dependence of exposures on the observed outcomes. The system is under production at JD.com, and scales to handle the high dimensionality of the problem on the platform (attribution of the orders of about 300M users, for roughly 160K brands, across 200+ ad-types, served about 80B ad-impressions over a typical 15-day period).
Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning
Choudhury, Shushman, Kochenderfer, Mykel J.
We introduce the problem of Dynamic Real-time Multimodal Routing (DREAMR), which requires planning and executing routes under uncertainty for an autonomous agent. The agent has access to a time-varying transit vehicle network in which it can use multiple modes of transportation. For instance, a drone can either fly or ride on terrain vehicles for segments of their routes. DREAMR is a difficult problem of sequential decision making under uncertainty with both discrete and continuous variables. We design a novel hierarchical hybrid planning framework to solve the DREAMR problem that exploits its structural decomposability. Our framework consists of a global open-loop planning layer that invokes and monitors a local closed-loop execution layer. Additional abstractions allow efficient and seamless interleaving of planning and execution. We create a large-scale simulation for DREAMR problems, with each scenario having hundreds of transportation routes and thousands of connection points. Our algorithmic framework significantly outperforms a receding horizon control baseline, in terms of elapsed time to reach the destination and energy expended by the agent.
Neural Network for NILM Based on Operational State Change Classification
Energy disaggregation in a non-intrusive way estimates appliance level electricity consumption from a single meter that measures the whole house electricity demand. Recently, with the ongoing increment of energy data, there are many data-driven deep learning architectures being applied to solve the non-intrusive energy disaggregation problem. However, most proposed methods try to estimate the on-off state or the power consumption of appliance, which need not only large amount of parameters, but also hyper-parameter optimization prior to training and even preprocessing of energy data for a specified appliance. In this paper, instead of estimating on-off state or power consumption, we adapt a neural network to estimate the operational state change of appliance. Our proposed solution is more feasible across various appliances and lower complexity comparing to previous methods. The simulated experiments in the low sample rate dataset REDD show the competitive performance of the designed method, with respect to other two benchmark methods, Hidden Markov Model-based and Graph Signal processing-based approaches.
Re-examination of the Role of Latent Variables in Sequence Modeling
Dai, Zihang, Lai, Guokun, Yang, Yiming, Yoo, Shinjae
With latent variables, stochastic recurrent models have achieved state-of-the-art performance in modeling sound-wave sequence. However, opposite results are also observed in other domains, where standard recurrent networks often outperform stochastic models. To better understand this discrepancy, we re-examine the roles of latent variables in stochastic recurrent models for speech density estimation. Our analysis reveals that under the restriction of fully factorized output distribution in previous evaluations, the stochastic models were implicitly leveraging intra-step correlation but the standard recurrent baselines were prohibited to do so, resulting in an unfair comparison. To correct the unfairness, we remove such restriction in our re-examination, where all the models can explicitly leverage intra-step correlation with an auto-regressive structure. Over a diverse set of sequential data, including human speech, MIDI music, handwriting trajectory and frame-permuted speech, our results show that stochastic recurrent models fail to exhibit any practical advantage despite the claimed theoretical superiority. In contrast, standard recurrent models equipped with an auto-regressive output distribution consistently perform better, significantly advancing the state-of-the-art results on three speech datasets.
Is There an Analog of Nesterov Acceleration for MCMC?
Ma, Yi-An, Chatterji, Niladri, Cheng, Xiang, Flammarion, Nicolas, Bartlett, Peter, Jordan, Michael I.
While optimization methodology has provided much of the underlying algorithmic machinery that has driven the theory and practice of machine learning in recent years, sampling-based methodology, in particular Markov chain Monte Carlo (MCMC), remains of critical importance, given its role in linking algorithms to statistical inference and, in particular, its ability to provide notions of confidence that are lacking in optimization-based methodology. However, the classical theory of MCMC is largely asymptotic and the theory has not developed as rapidly in recent years as the theory of optimization. Recently, however, a literature has emerged that derives nonasymptotic rates for MCMC algorithms [see, e.g., 9, 12, 10, 8, 6, 14, 21, 22, 2, 5]. This work has explicitly aimed at making use of ideas from optimization; in particular, whereas the classical literature on MCMC focused on reversible Markov chains, the recent literature has focused on nonreversible stochastic processes that are built on gradients [see, e.g., 18, 20, 3, 1]. In particular, the gradient-based Langevin algorithm [33, 32, 13] has been shown to be a form of gradient descent on the space of probabilities [see, e.g., 36]. What has not yet emerged is an analog of acceleration. Recall that the notion of acceleration has played a key role in gradient-based optimization methods [26]. In particular, the Nesterov accelerated gradient descent (AGD) method, an instance of the general family of "momentum methods," provably achieves faster convergence rate than gradient descent (GD) in a variety of settings [25]. Moreover, it achieves the optimal convergence rate under an oracle model of optimization complexity in the convex setting [24].
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Garcia, Francisco M., Thomas, Philip S.
In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. We argue that previous experience with similar problems can provide an agent with information about how it should explore when facing a new but related problem. We show that the search for an optimal exploration strategy can be formulated as a reinforcement learning problem itself and demonstrate that such strategy can leverage patterns found in the structure of related problems. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed approach.
Certified Reinforcement Learning with Logic Guidance
Hasanbeig, Mohammadhosein, Abate, Alessandro, Kroening, Daniel
This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for an unknown, and possibly continuous-state, Markov Decision Process (MDP), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Buchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape an adaptive reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with learning, i.e. the RL algorithm produces a policy that is certifiably safe with respect to the property. Under the assumption that the MDP has a finite number of states, theoretical guarantees are provided on the convergence of the RL algorithm. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. Whenever the MDP has a continuous state space, we empirically show that our framework finds satisfying policies, if there exist such policies. Additionally, the proposed algorithm can handle time-varying periodic environments. The performance of the proposed architecture is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available.
When Collaborative Filtering Meets Reinforcement Learning
In this paper, we study a multi-step interactive recommendation problem, where the item recommended at current step may affect the quality of future recommendations. To address the problem, we develop a novel and effective approach, named CFRL, which seamlessly integrates the ideas of both collaborative filtering (CF) and reinforcement learning (RL). More specifically, we first model the recommender-user interactive recommendation problem as an agent-environment RL task, which is mathematically described by a Markov decision process (MDP). Further, to achieve collaborative recommendations for the entire user community, we propose a novel CF-based MDP by encoding the states of all users into a shared latent vector space. Finally, we propose an effective Q-network learning method to learn the agent's optimal policy based on the CF-based MDP. The capability of CFRL is demonstrated by comparing its performance against a variety of existing methods on real-world datasets.
Belief dynamics extraction
Kumar, Arun, Wu, Zhengwei, Pitkow, Xaq, Schrater, Paul
Animal behavior is not driven simply by its current observations, but is strongly influenced by internal states. Estimating the structure of these internal states is crucial for understanding the neural basis of behavior. In principle, internal states can be estimated by inverting behavior models, as in inverse model-based Reinforcement Learning. However, this requires careful parameterization and risks model-mismatch to the animal. Here we take a data-driven approach to infer latent states directly from observations of behavior, using a partially observable switching semi-Markov process. This process has two elements critical for capturing animal behavior: it captures non-exponential distribution of times between observations, and transitions between latent states depend on the animal's actions, features that require more complex non-markovian models to represent. To demonstrate the utility of our approach, we apply it to the observations of a simulated optimal agent performing a foraging task, and find that latent dynamics extracted by the model has correspondences with the belief dynamics of the agent. Finally, we apply our model to identify latent states in the behaviors of monkey performing a foraging task, and find clusters of latent states that identify periods of time consistent with expectant waiting. This data-driven behavioral model will be valuable for inferring latent cognitive states, and thereby for measuring neural representations of those states.