Markov Models
Coarse Models for Bird Migrations Using Clustering and Non-Stationary Markov Chains
Jain, Nitin (Georgia Institute of Technology) | Dilkina, Bistra (Georgia Institute of Technology)
While great strides have been made in collecting presence data and developing accurate species distribution models, much less is known about the migratory process that guides the spatio-temporal changes in distributions for migrating species, especially birds. In this work, we address a challenging inference task, where given only aggregate and noisy data of the volume of birds for each spatial pixel and time window, we predict the likely transition links with their associated probabilities. We propose a framework to build such migration networks for different bird species and present a real world example of constructing a network using our approach.
An Additive Autoregressive Hidden Markov Model for Energy Disaggregation
Early, Kirstin (Carnegie Mellon University) | Kolter, J. Zico (Carnegie Mellon University)
We motivate and develop an additive autoregressive hidden Markov model specifically designed to work on the task of energy disaggregation; that is, separating a whole-building electricity signal into its component device signals whose sum is the aggregate signal observed by a smart meter. This model assumes each device in the building operates as an individual autoregressive HMM, where hidden states represent the underlying power mode of the device and Gaussian emissions correspond to that device's power consumption. The additive property models the observed output (whole-building power signal) as the sum of the emissions of multiple hidden states (i.e., as the sum of individual consumptions of multiple devices in the building). The autoregressive property realistically models how many appliances consume energy and is a new extension to previous work using factorial HMMs for energy disaggregation. Finally, our model also includes a robust mixture component, via an L1-regularized noise term, that can absorb outliers arising in this setting from unknown or rarely-used devices. We extract the power signals and underlying state sequences of single devices in a stagewise fashion and illustrate the results of this process on the Reference Energy Disaggregation Dataset (REDD).
Adaptive Advice in Automobile Climate Control Systems
Rosenfeld, Ariel (Bar-Ilan University) | Azaria, Amos (Carnegie Mellon University) | Kraus, Sarit ( Bar-Ilan University ) | Goldman, Claudia V. (General Motors Advanced Technical Center) | Tsimhoni, Omer (General Motors Advanced Technical Center)
Reducing an automobile's energy consumption will lower its dependency on fossil fuel and extend the travel range of electric vehicles. Automobile Climate Control Systems (CCS) are known to be heavy energy consumers. To help reduce CCS energy consumption, this paper presents an adaptive automated agent, MDP Agent for Climate control Systems -- MACS, which provides drivers advice as to how to set their CCS. First, we present a model which has 78% accuracy in predicting drivers' reactions to different advice in different situations. Using the prediction model, we designed a Markov Decision Process which solution provided the advising policy for MACS. Through empirical evaluation using an electric car, with 83 human subjects, we show that MACS successfully reduced the energy consumption of the subjects by 33% compared to subjects who were not equipped with MACS. MACS also outperformed the state-of-the-art Social agent for Advice Provision (SAP).
Generative Modeling of Hidden Functional Brain Networks
Nandy, Shaurabh, Golden, Richard M.
Resting-state functional connectivity fMRI data is a derivative of the unobservable neuronal functional network structure of the human brain. This data is subject to multiple sources of noise such as thermal noise, system noise, and physiological noise. Commonly used methods to infer the latent network structure, such as thresholding methods, make the implicit assumption that weak links are not as important as strong links, and that links are conditionally independent. However, such assumptions provide an incomplete description of the biology. Additionally, despite a core set of observations about functional networks such as smallworldness, modularity, exponentially truncated degree distributions, and presence of various types of hubs, very little is known about the computational principles which can give rise to these observations. This paper presents a Hidden Markov Random Field framework for the purpose of representing, estimating, and evaluating latent neuronal functional relationships using fMRI data. The main theoretical contributions of this paper are summarized as follows.
The Complexity of Reasoning with FODD and GFODD
Hescott, Benjamin J., Khardon, Roni
Recent work introduced Generalized First Order Decision Diagrams (GFODD) as a knowledge representation that is useful in mechanizing decision theoretic planning in relational domains. GFODDs generalize function-free first order logic and include numerical values and numerical generalizations of existential and universal quantification. Previous work presented heuristic inference algorithms for GFODDs and implemented these heuristics in systems for decision theoretic planning. In this paper, we study the complexity of the computational problems addressed by such implementations. In particular, we study the evaluation problem, the satisfiability problem, and the equivalence problem for GFODDs under the assumption that the size of the intended model is given with the problem, a restriction that guarantees decidability. Our results provide a complete characterization placing these problems within the polynomial hierarchy. The same characterization applies to the corresponding restriction of problems in first order logic, giving an interesting new avenue for efficient inference when the number of objects is bounded. Our results show that for $\Sigma_k$ formulas, and for corresponding GFODDs, evaluation and satisfiability are $\Sigma_k^p$ complete, and equivalence is $\Pi_{k+1}^p$ complete. For $\Pi_k$ formulas evaluation is $\Pi_k^p$ complete, satisfiability is one level higher and is $\Sigma_{k+1}^p$ complete, and equivalence is $\Pi_{k+1}^p$ complete.
Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments
Schwarz, Andreas, Huemmer, Christian, Maas, Roland, Kellermann, Walter
ABSTRACT We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, both compared to logmelspec features extracted from noisy signals, and features enhanced by spectral subtraction. Index Terms-- Speech Recognition, Reverberation, Diffuse Noise, Deep Neural Networks 1. INTRODUCTION In automatic speech recognizers (ASR) based on Gaussian Mixture Models and Hidden Markov Models (GMM-HMM), a wide variety of transformations and feature extraction steps is currently being employed with the aim of extracting and normalizing the information contained in the time-domain input signal as efficiently as possible. Recently, with the development of effective training methods for acoustic models based on multiple-layer neural networks, which are often summarized under the term "deep neural networks" (DNN) [1], it has become possible for the acoustic model to learn relationships between features and phonemes to a higher degree than it is possible with manually implemented feature transformation steps.
Spectral Sparsification of Random-Walk Matrix Polynomials
Cheng, Dehua, Cheng, Yu, Liu, Yan, Peng, Richard, Teng, Shang-Hua
We consider a fundamental algorithmic question in spectral graph theory: Compute a spectral sparsifier of random-walk matrix-polynomial $$L_\alpha(G)=D-\sum_{r=1}^d\alpha_rD(D^{-1}A)^r$$ where $A$ is the adjacency matrix of a weighted, undirected graph, $D$ is the diagonal matrix of weighted degrees, and $\alpha=(\alpha_1...\alpha_d)$ are nonnegative coefficients with $\sum_{r=1}^d\alpha_r=1$. Recall that $D^{-1}A$ is the transition matrix of random walks on the graph. The sparsification of $L_\alpha(G)$ appears to be algorithmically challenging as the matrix power $(D^{-1}A)^r$ is defined by all paths of length $r$, whose precise calculation would be prohibitively expensive. In this paper, we develop the first nearly linear time algorithm for this sparsification problem: For any $G$ with $n$ vertices and $m$ edges, $d$ coefficients $\alpha$, and $\epsilon > 0$, our algorithm runs in time $O(d^2m\log^2n/\epsilon^{2})$ to construct a Laplacian matrix $\tilde{L}=D-\tilde{A}$ with $O(n\log n/\epsilon^{2})$ non-zeros such that $\tilde{L}\approx_{\epsilon}L_\alpha(G)$. Matrix polynomials arise in mathematical analysis of matrix functions as well as numerical solutions of matrix equations. Our work is particularly motivated by the algorithmic problems for speeding up the classic Newton's method in applications such as computing the inverse square-root of the precision matrix of a Gaussian random field, as well as computing the $q$th-root transition (for $q\geq1$) in a time-reversible Markov model. The key algorithmic step for both applications is the construction of a spectral sparsifier of a constant degree random-walk matrix-polynomials introduced by Newton's method. Our algorithm can also be used to build efficient data structures for effective resistances for multi-step time-reversible Markov models, and we anticipate that it could be useful for other tasks in network analysis.
Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering
Lacoste-Julien, Simon, Lindsten, Fredrik, Bach, Francis
Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS) with a potentially faster rate of convergence than Monte Carlo integration (and "kernel herding" was shown to be a special case of this procedure). In this paper, we propose to replace the random sampling step in a particle filter by Frank-Wolfe optimization. By optimizing the position of the particles, we can obtain better accuracy than random or quasi-Monte Carlo sampling. In applications where the evaluation of the emission probabilities is expensive (such as in robot localization), the additional computational cost to generate the particles through optimization can be justified. Experiments on standard synthetic examples as well as on a robot localization task indicate indeed an improvement of accuracy over random and quasi-Monte Carlo sampling.
Contextual Markov Decision Processes
Hallak, Assaf, Di Castro, Dotan, Mannor, Shie
We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called Contextual Markov Decision Process (CMDP), can model a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website objective is to determine customer characteristics, and to optimize the interaction between them. Our work focuses on one basic scenario--finite horizon with a small known number of possible contexts. We suggest a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs. Bounds are obtained for specific naive implementations, and extensions of the framework are discussed, laying the ground for future research.
Learning Rank Functionals: An Empirical Study
Tran, Truyen, Phung, Dinh, Venkatesh, Svetha
Ranking is a key aspect of many applications, such as information retrieval, question answering, ad placement and recommender systems. Learning to rank has the goal of estimating a ranking model automatically from training data. In practical settings, the task often reduces to estimating a rank functional of an object with respect to a query. In this paper, we investigate key issues in designing an effective learning to rank algorithm. These include data representation, the choice of rank functionals, the design of the loss function so that it is correlated with the rank metrics used in evaluation. For the loss function, we study three techniques: approximating the rank metric by a smooth function, decomposition of the loss into a weighted sum of element-wise losses and into a weighted sum of pairwise losses. We then present derivations of piecewise losses using the theory of high-order Markov chains and Markov random fields. In experiments, we evaluate these design aspects on two tasks: answer ranking in a Social Question Answering site, and Web Information Retrieval.