Undirected Networks
Neural Network for NILM Based on Operational State Change Classification
Non-intrusive energy disaggregation estimates appliance-level electricity consumption from a single meter that measures whole-house electricity demand. Recently, with the growing volume of energy data, many data-driven deep learning architectures have been applied to the non-intrusive energy disaggregation problem. However, most proposed methods estimate either the on-off state or the power consumption of an appliance, which requires not only a large number of parameters but also hyper-parameter optimization prior to training, and even appliance-specific preprocessing of the energy data. In this paper, instead of estimating the on-off state or power consumption, we adapt a neural network to estimate the operational state change of an appliance. Our proposed solution applies more readily across various appliances and has lower complexity than previous methods. Simulated experiments on the low-sample-rate REDD dataset show that the designed method performs competitively against two benchmark methods: a Hidden Markov Model-based approach and a Graph Signal Processing-based approach.
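To make the target of "operational state change" classification concrete, here is a minimal sketch of how state-change labels could be derived from a submetered appliance trace by thresholding first differences. This is not the authors' network or labeling procedure; the function name `state_change_labels`, the 30 W threshold, and the example trace are hypothetical.

```python
from typing import List

def state_change_labels(power: List[float], threshold: float = 30.0) -> List[int]:
    """Label each sample as +1 (switch-on edge), -1 (switch-off edge) or 0 (no change).

    `threshold` (watts) is a hypothetical cutoff; in practice it would be tuned per
    appliance or replaced by a learned classifier operating on the aggregate signal.
    """
    labels = [0]  # no change can be observed at the first sample
    for prev, curr in zip(power, power[1:]):
        delta = curr - prev
        if delta > threshold:
            labels.append(1)
        elif delta < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels

# Hypothetical fridge-like trace that switches on and then off.
trace = [5.0, 6.0, 120.0, 118.0, 121.0, 7.0, 5.0]
print(state_change_labels(trace))  # [0, 0, 1, 0, 0, -1, 0]
```

Predicting a three-way change label per step, rather than a power value, is one way to see why such a formulation can be less appliance-specific than regressing consumption directly.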
Re-examination of the Role of Latent Variables in Sequence Modeling
Dai, Zihang, Lai, Guokun, Yang, Yiming, Yoo, Shinjae
With latent variables, stochastic recurrent models have achieved state-of-the-art performance in modeling sound-wave sequences. However, opposite results are observed in other domains, where standard recurrent networks often outperform stochastic models. To better understand this discrepancy, we re-examine the roles of latent variables in stochastic recurrent models for speech density estimation. Our analysis reveals that, under the restriction of a fully factorized output distribution in previous evaluations, the stochastic models were implicitly leveraging intra-step correlation while the standard recurrent baselines were prohibited from doing so, resulting in an unfair comparison. To correct this unfairness, we remove the restriction in our re-examination, so that all models can explicitly leverage intra-step correlation through an auto-regressive structure. Over a diverse set of sequential data, including human speech, MIDI music, handwriting trajectories and frame-permuted speech, our results show that stochastic recurrent models fail to exhibit any practical advantage despite their claimed theoretical superiority. In contrast, standard recurrent models equipped with an auto-regressive output distribution consistently perform better, significantly advancing the state of the art on three speech datasets.
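The cost of a fully factorized output distribution can be illustrated with a toy example. Assuming a single time step whose two coordinates are jointly Gaussian with correlation `rho = 0.9` (a hypothetical choice, not data from the paper), the sketch below compares the average log-likelihood of a factorized model, which treats the coordinates independently, with an auto-regressive one that uses the chain rule p(x1) p(x2 | x1) inside the step.

```python
import math
import random

def log_normal(x: float, mean: float, var: float) -> float:
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def factorized_loglik(x1: float, x2: float) -> float:
    # Each coordinate modeled independently: intra-step correlation is ignored.
    return log_normal(x1, 0.0, 1.0) + log_normal(x2, 0.0, 1.0)

def autoregressive_loglik(x1: float, x2: float, rho: float) -> float:
    # Chain rule within the step: p(x1) * p(x2 | x1) for a unit-variance bivariate Gaussian.
    return log_normal(x1, 0.0, 1.0) + log_normal(x2, rho * x1, 1.0 - rho ** 2)

rng = random.Random(0)
rho = 0.9
samples = []
for _ in range(10_000):
    x1 = rng.gauss(0.0, 1.0)
    x2 = rho * x1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
    samples.append((x1, x2))

fact = sum(factorized_loglik(a, b) for a, b in samples) / len(samples)
ar = sum(autoregressive_loglik(a, b, rho) for a, b in samples) / len(samples)
print(f"factorized: {fact:.3f} nats, autoregressive: {ar:.3f} nats")
```

In expectation the gap is -0.5 log(1 - rho^2), about 0.83 nats here: this is the intra-step information that a factorized baseline cannot use but an auto-regressive output distribution can.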
Is There an Analog of Nesterov Acceleration for MCMC?
Ma, Yi-An, Chatterji, Niladri, Cheng, Xiang, Flammarion, Nicolas, Bartlett, Peter, Jordan, Michael I.
While optimization methodology has provided much of the algorithmic machinery that has driven the theory and practice of machine learning in recent years, sampling-based methodology, in particular Markov chain Monte Carlo (MCMC), remains of critical importance, given its role in linking algorithms to statistical inference and, in particular, its ability to provide notions of confidence that are lacking in optimization-based methodology. However, the classical theory of MCMC is largely asymptotic, and it has not developed as rapidly in recent years as the theory of optimization. Recently, however, a literature has emerged that derives nonasymptotic rates for MCMC algorithms [see, e.g., 9, 12, 10, 8, 6, 14, 21, 22, 2, 5]. This work has explicitly aimed at making use of ideas from optimization; whereas the classical literature on MCMC focused on reversible Markov chains, the recent literature has focused on nonreversible stochastic processes that are built on gradients [see, e.g., 18, 20, 3, 1]. Notably, the gradient-based Langevin algorithm [33, 32, 13] has been shown to be a form of gradient descent on the space of probabilities [see, e.g., 36]. What has not yet emerged is an analog of acceleration. Recall that acceleration has played a key role in gradient-based optimization methods [26]. In particular, the Nesterov accelerated gradient descent (AGD) method, an instance of the general family of "momentum methods," provably achieves a faster convergence rate than gradient descent (GD) in a variety of settings [25]. Moreover, it achieves the optimal convergence rate under an oracle model of optimization complexity in the convex setting [24].
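To show why the Langevin algorithm reads as "gradient descent plus noise", here is a minimal sketch of the unadjusted (overdamped) Langevin algorithm, x_{k+1} = x_k - h ∇U(x_k) + sqrt(2h) ξ_k, on an assumed one-dimensional standard Gaussian target U(x) = x^2/2. The step size `h = 0.1`, chain length, and burn-in are illustrative choices; this is not the accelerated method the passage asks about.

```python
import math
import random
from typing import List

def grad_U(x: float) -> float:
    # Potential U(x) = x^2 / 2, so the target density exp(-U) is N(0, 1).
    return x

def ula(n_steps: int, h: float, x0: float = 0.0, seed: int = 0) -> List[float]:
    """Unadjusted Langevin algorithm: a gradient step plus scaled Gaussian noise."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x - h * grad_U(x) + math.sqrt(2 * h) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

h = 0.1
chain = ula(200_000, h)[1_000:]  # drop a short burn-in
mean = sum(chain) / len(chain)
var = sum((x - mean) ** 2 for x in chain) / len(chain)
print(f"mean={mean:.3f}, var={var:.3f}, predicted ULA var={2 / (2 - h):.3f}")
```

Without the noise term the update is exactly gradient descent on U. With it, the iterates for this Gaussian target form an AR(1) process whose stationary variance v solves v = (1 - h)^2 v + 2h, giving 2/(2 - h) rather than 1: the discretization bias that nonasymptotic MCMC analyses must control.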
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Garcia, Francisco M., Thomas, Philip S.
In this paper, we consider how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. We argue that previous experience with similar problems can provide an agent with information about how it should explore when facing a new but related problem. We show that the search for an optimal exploration strategy can itself be formulated as a reinforcement learning problem, and demonstrate that such a strategy can leverage patterns found in the structure of related problems. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed approach.
Certified Reinforcement Learning with Logic Guidance
Hasanbeig, Mohammadhosein, Abate, Alessandro, Kroening, Daniel
This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for an unknown, and possibly continuous-state, Markov Decision Process (MDP) such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape an adaptive reward function on the fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is calculated in parallel with learning, i.e. the RL algorithm produces a policy that is certifiably safe with respect to the property. Under the assumption that the MDP has a finite number of states, theoretical guarantees are provided on the convergence of the RL algorithm. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. When the MDP has a continuous state space, we show empirically that our framework finds satisfying policies whenever such policies exist. Additionally, the proposed algorithm can handle time-varying periodic environments. The performance of the proposed architecture is evaluated on a set of numerical examples and benchmarks, where we observe a one-order-of-magnitude reduction in the number of iterations required for policy synthesis compared to existing approaches, where available.
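The idea of running RL on the product of the MDP and an automaton can be sketched on a toy problem. The example below assumes a hypothetical 1-D corridor and the reach-avoid property "not unsafe until goal" (¬unsafe U goal), which a small deterministic automaton already captures. It deliberately simplifies the paper's setting: it uses no LDBA and no adaptive, on-the-fly reward shaping, only a fixed reward of 1 on entering the accepting automaton state. All names are illustrative.

```python
from typing import Dict, Iterable, Tuple

N_CELLS = 6
# Hypothetical atomic-proposition labels for a 1-D corridor of cells 0..5.
LABELS: Dict[int, str] = {0: "unsafe", 5: "goal"}

def automaton_step(q: int, label: str) -> int:
    """Automaton for (not unsafe) U goal.

    q = 0: waiting; q = 1: accepting (absorbing); q = 2: rejecting sink (absorbing).
    """
    if q != 0:
        return q
    if label == "unsafe":
        return 2
    if label == "goal":
        return 1
    return 0

def product_step(state: Tuple[int, int], action: int) -> Tuple[Tuple[int, int], float]:
    """One transition of the product MDP over (cell, automaton state); action is -1 or +1."""
    cell, q = state
    next_cell = min(max(cell + action, 0), N_CELLS - 1)
    next_q = automaton_step(q, LABELS.get(next_cell, ""))
    # Simplified reward: 1 only on the transition into the accepting automaton state.
    reward = 1.0 if q == 0 and next_q == 1 else 0.0
    return (next_cell, next_q), reward

def rollout(start_cell: int, actions: Iterable[int]) -> Tuple[Tuple[int, int], float]:
    """Execute a fixed action sequence and return the final product state and total reward."""
    state = (start_cell, automaton_step(0, LABELS.get(start_cell, "")))
    total = 0.0
    for a in actions:
        state, r = product_step(state, a)
        total += r
    return state, total

print(rollout(2, [1, 1, 1]))                # ((5, 1), 1.0): goal reached safely
print(rollout(2, [-1, -1, 1, 1, 1, 1, 1]))  # ((5, 2), 0.0): unsafe cell visited first
```

Any standard RL algorithm run over the (cell, q) state space then maximizes the chance of reaching the accepting automaton state. Because the automaton state is part of the observation, the second trajectory earns nothing even though it physically ends at the goal.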
When Collaborative Filtering Meets Reinforcement Learning
In this paper, we study a multi-step interactive recommendation problem, where the item recommended at the current step may affect the quality of future recommendations. To address the problem, we develop a novel and effective approach, named CFRL, which seamlessly integrates the ideas of both collaborative filtering (CF) and reinforcement learning (RL). More specifically, we first model the interactive recommendation problem between recommender and user as an agent-environment RL task, which is mathematically described by a Markov decision process (MDP). Further, to achieve collaborative recommendations for the entire user community, we propose a novel CF-based MDP that encodes the states of all users into a shared latent vector space. Finally, we propose an effective Q-network learning method to learn the agent's optimal policy based on the CF-based MDP. The capability of CFRL is demonstrated by comparing its performance against a variety of existing methods on real-world datasets.
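As a rough illustration of "encoding the states of all users into a shared latent vector space" and learning a Q-function on top of it, the sketch below represents a user's state as the mean of hypothetical item embeddings and uses a linear Q-function with one weight vector per item, trained by a standard Q-learning step. This is not CFRL's encoder or Q-network. The embedding size, random embeddings, learning rate `alpha`, and discount `gamma` are all assumptions.

```python
import random
from typing import List

DIM, N_ITEMS = 4, 5
rng = random.Random(0)
# Hypothetical item embeddings living in a latent space shared by all users.
ITEM_EMB = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_ITEMS)]
# Linear Q-function: one weight vector per item (action), shared across all users.
Q_WEIGHTS = [[0.0] * DIM for _ in range(N_ITEMS)]

def user_state(history: List[int]) -> List[float]:
    """Encode any user's interaction history as the mean of the item embeddings."""
    if not history:
        return [0.0] * DIM
    return [sum(ITEM_EMB[i][k] for i in history) / len(history) for k in range(DIM)]

def q_value(state: List[float], item: int) -> float:
    return sum(w * s for w, s in zip(Q_WEIGHTS[item], state))

def td_update(history: List[int], item: int, reward: float, next_history: List[int],
              alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One Q-learning step on the transition (history, item, reward, next_history)."""
    s, s_next = user_state(history), user_state(next_history)
    target = reward + gamma * max(q_value(s_next, a) for a in range(N_ITEMS))
    error = target - q_value(s, item)
    for k in range(DIM):
        Q_WEIGHTS[item][k] += alpha * error * s[k]
    return error

def recommend(history: List[int]) -> int:
    """Greedy recommendation: the item with the highest Q-value in the user's state."""
    s = user_state(history)
    return max(range(N_ITEMS), key=lambda a: q_value(s, a))
```

Because every user is mapped into the same space and the Q-weights are shared, feedback from one user's transitions also changes the values predicted for other users with similar histories. That sharing is the collaborative-filtering flavor the abstract describes.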
Belief dynamics extraction
Kumar, Arun, Wu, Zhengwei, Pitkow, Xaq, Schrater, Paul
Animal behavior is not driven simply by current observations, but is strongly influenced by internal states. Estimating the structure of these internal states is crucial for understanding the neural basis of behavior. In principle, internal states can be estimated by inverting behavior models, as in inverse model-based Reinforcement Learning. However, this requires careful parameterization and risks a mismatch between the model and the animal. Here we take a data-driven approach to infer latent states directly from observations of behavior, using a partially observable switching semi-Markov process. This process has two elements critical for capturing animal behavior: it captures non-exponential distributions of times between observations, and transitions between latent states depend on the animal's actions, features that require more complex non-Markovian models to represent. To demonstrate the utility of our approach, we apply it to observations of a simulated optimal agent performing a foraging task, and find that the latent dynamics extracted by the model correspond to the belief dynamics of the agent. Finally, we apply our model to identify latent states in monkey behavior during a foraging task, and find clusters of latent states that identify periods of time consistent with expectant waiting. This data-driven behavioral model will be valuable for inferring latent cognitive states, and thereby for measuring neural representations of those states.
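The two features named above can be made concrete with a generative sketch: latent states whose dwell times are gamma-distributed (non-exponential whenever the shape parameter differs from 1), and next-state probabilities that depend on the action taken at the end of each segment. The states "searching" and "waiting", the actions, the policy, and all parameters are hypothetical. The sketch only simulates the process and does not perform the latent-state inference the paper is about.

```python
import random
from typing import Dict, List, Tuple

# Gamma dwell-time parameters (shape, scale) per latent state; mean = shape * scale.
DWELL: Dict[str, Tuple[float, float]] = {"searching": (4.0, 0.5), "waiting": (9.0, 1.0)}
# Hypothetical action policy in each latent state.
POLICY: Dict[str, Dict[str, float]] = {
    "searching": {"harvest": 0.5, "move": 0.5},
    "waiting": {"harvest": 0.5, "move": 0.5},
}
# Next-state distribution depends on (state, action): action-dependent switching.
TRANSITIONS: Dict[Tuple[str, str], Dict[str, float]] = {
    ("searching", "harvest"): {"waiting": 0.9, "searching": 0.1},
    ("searching", "move"): {"searching": 1.0},
    ("waiting", "harvest"): {"searching": 1.0},
    ("waiting", "move"): {"searching": 1.0},
}

def simulate(n_segments: int, start: str = "searching", seed: int = 0) -> List[Tuple[str, str, float]]:
    """Return (latent state, action, dwell duration) for each segment."""
    rng = random.Random(seed)
    state, segments = start, []
    for _ in range(n_segments):
        shape, scale = DWELL[state]
        duration = rng.gammavariate(shape, scale)
        acts = POLICY[state]
        action = rng.choices(list(acts), weights=list(acts.values()))[0]
        segments.append((state, action, duration))
        nxt = TRANSITIONS[(state, action)]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return segments

segs = simulate(5000)
waits = [d for s, _, d in segs if s == "waiting"]
print(f"{len(waits)} waiting segments, mean dwell {sum(waits) / len(waits):.2f}")
```

A plain Markov chain would force geometric or exponential dwell times. The semi-Markov structure allows peaked durations such as the gamma(9, 1) "waiting" segments, which is the kind of timing regularity an expectant-waiting state would show.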
Non-asymptotic Analysis of Biased Stochastic Approximation Scheme
Karimi, Belhal, Miasojedow, Blazej, Moulines, Eric, Wai, Hoi-To
Stochastic approximation (SA) is a key method used in statistical learning. Recently, its non-asymptotic convergence analysis has been considered in many papers. However, most prior analyses rely on restrictive assumptions, such as unbiased gradient estimates and convex objective functions, which significantly limit their applicability to sophisticated tasks such as online and reinforcement learning. In this work, these restrictions are all essentially relaxed. In particular, we analyze a general SA scheme to minimize a non-convex, smooth objective function. We consider an update procedure whose drift term depends on a state-dependent Markov chain and whose mean field is not necessarily of gradient type, covering approximate second-order methods and allowing asymptotic bias in the one-step updates. We illustrate these settings with the online EM algorithm and the policy-gradient method for average reward maximization in reinforcement learning.
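A minimal sketch of SA driven by Markovian rather than i.i.d. noise: the iterate follows θ_{n+1} = θ_n - γ_{n+1} H_θ(X_{n+1}) with drift H_θ(x) = θ - x, where X_n is a hypothetical two-state Markov chain. Each one-step update is biased, because X_{n+1} given the past is not drawn from the stationary law. The mean field h(θ) = θ - E_π[X] still has its root at the stationary mean 1/3. The chain and the step sizes γ_n = n^{-0.8} are illustrative assumptions, and this toy mean field happens to be of gradient type.

```python
import random

# P[x][1] is the probability of moving to state 1 from state x.
# Rows: P(0 -> .) = (0.9, 0.1), P(1 -> .) = (0.2, 0.8); stationary law (2/3, 1/3).
P = {0: (0.9, 0.1), 1: (0.2, 0.8)}

def markov_sa(n_steps: int, seed: int = 0) -> float:
    """SA iterate theta_n driven by a Markov chain X_n; converges to E_pi[X] = 1/3."""
    rng = random.Random(seed)
    x, theta = 0, 0.0
    for n in range(1, n_steps + 1):
        x = 1 if rng.random() < P[x][1] else 0  # next Markov state, correlated with x
        step = 1.0 / n ** 0.8                    # decreasing step size gamma_n
        theta -= step * (theta - x)              # drift H_theta(x) = theta - x
    return theta

print(f"theta = {markov_sa(200_000):.4f} (stationary mean 1/3 = {1 / 3:.4f})")
```

The correlation between successive X_n is what breaks the unbiased-gradient assumption of classical analyses. Here the chain mixes geometrically (second eigenvalue 0.7), so the bias averages out along the trajectory.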
Estimating the Mixing Time of Ergodic Markov Chains
Wolfer, Geoffrey, Kontorovich, Aryeh
We address the problem of estimating the mixing time $t_{\mathsf{mix}}$ of an arbitrary ergodic finite Markov chain from a single trajectory of length $m$. The reversible case was addressed by Hsu et al. [2017], who left the general case as an open problem. In the reversible case, the analysis is greatly facilitated by the fact that the Markov operator is self-adjoint, and Weyl's inequality allows for a dimension-free perturbation analysis of the empirical eigenvalues. As Hsu et al. point out, in the absence of reversibility (and hence the non-symmetry of the pair probabilities matrix), the existing perturbation analysis has a worst-case exponential dependence on the number of states $d$. Furthermore, even if an eigenvalue perturbation analysis with better dependence on $d$ were available, in the non-reversible case the connection between the spectral gap and the mixing time is not nearly as straightforward as in the reversible case. Our key insight is to estimate the pseudo-spectral gap instead, which allows us to overcome the loss of self-adjointness and to achieve a polynomial dependence on $d$ and the minimal stationary probability $\pi_\star$. Additionally, in the reversible case, we obtain simultaneous nearly (up to logarithmic factors) minimax rates in $t_{\mathsf{mix}}$ and precision $\varepsilon$, closing a gap in Hsu et al., who treated $\varepsilon$ as constant in the lower bounds. Finally, we construct fully empirical confidence intervals for the pseudo-spectral gap, which shrink to zero at a rate of roughly $1/\sqrt m$ and improve on the state of the art even in the reversible case.
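To make the pseudo-spectral gap concrete, here is a sketch that computes it for a known transition matrix P using the standard definition γ_ps = max_{k≥1} γ((P*)^k P^k)/k. Here P* is the time reversal with P*(i, j) = π_j P(j, i)/π_i, and γ(M) = 1 - λ_2(M) is the spectral gap of the self-adjoint operator (P*)^k P^k. Truncating the maximum at `max_k = 50` is an assumption. `empirical_transition_matrix` is a naive plug-in from a single trajectory. It is not the paper's estimator, which comes with finite-sample guarantees and confidence intervals.

```python
import numpy as np

def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def pseudo_spectral_gap(P: np.ndarray, max_k: int = 50) -> float:
    """max over 1 <= k <= max_k of gamma((P*)^k P^k) / k (truncation is an assumption)."""
    pi = stationary_distribution(P)
    # Time reversal P*(i, j) = pi_j P(j, i) / pi_i.
    P_star = np.diag(1.0 / pi) @ P.T @ np.diag(pi)
    Pk = np.eye(len(pi))
    Psk = np.eye(len(pi))
    best = 0.0
    for k in range(1, max_k + 1):
        Pk = Pk @ P
        Psk = Psk @ P_star
        # (P*)^k P^k is self-adjoint in L2(pi), so its eigenvalues are real.
        eigs = np.sort(np.real(np.linalg.eigvals(Psk @ Pk)))[::-1]
        best = max(best, (1.0 - eigs[1]) / k)
    return best

def empirical_transition_matrix(trajectory, d: int) -> np.ndarray:
    """Row-normalized transition counts; assumes every state is left at least once."""
    counts = np.zeros((d, d))
    for a, b in zip(trajectory, trajectory[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# A lazy, non-reversible walk on a 3-cycle.
C = np.roll(np.eye(3), 1, axis=1)
lazy_cycle = 0.5 * np.eye(3) + 0.5 * C
print(pseudo_spectral_gap(lazy_cycle))
```

For the lazy 3-cycle, P* = P^T, and P^T P = (2I + C + C^T)/4 has eigenvalues 1, 1/4, 1/4. The k = 1 term is therefore 3/4. Every k ≥ 2 term is at most 1/k, so γ_ps = 3/4, even though P itself is not self-adjoint.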
Minimax Testing of Identity to a Reference Ergodic Markov Chain
Wolfer, Geoffrey, Kontorovich, Aryeh
We exhibit an efficient procedure for testing, based on a single long state sequence, whether an unknown Markov chain is identical to or $\varepsilon$-far from a given reference chain. We obtain nearly matching (up to logarithmic factors) upper and lower sample complexity bounds for our notion of distance, which is based on total variation. Perhaps surprisingly, we discover that the sample complexity depends solely on the properties of the known reference chain and does not involve the unknown chain at all, which is not even assumed to be ergodic.