Learning Graphical Models
Characterizing A Database of Sequential Behaviors with Latent Dirichlet Hidden Markov Models
Song, Yin, Cao, Longbing, Fan, Xuhui, Cao, Wei, Zhang, Jian
This paper proposes a generative model, the latent Dirichlet hidden Markov models (LDHMM), for characterizing a database of sequential behaviors (sequences). LDHMMs posit that each sequence is generated by an underlying Markov chain process, which are controlled by the corresponding parameters (i.e., the initial state vector, transition matrix and the emission matrix). These sequence-level latent parameters for each sequence are modelled as latent Dirichlet random variables and parameterized by a set of deterministic database-level hyper-parameters. Through this way, we expect to model the sequence in two levels: the database level by deterministic hyper-parameters and the sequence-level by latent parameters. To learn the deterministic hyper-parameters and approximate posteriors of parameters in LDHMMs, we propose an iterative algorithm under the variational EM framework, which consists of E and M steps. We examine two different schemes, the fullyfactorized and partially-factorized forms, for the framework, based on different assumptions. We present empirical results of behavior modeling and sequence classification on three real-world data sets, and compare them to other related models. The experimental results prove that the proposed LDHMMs produce better generalization performance in terms of log-likelihood and deliver competitive results on the sequence classification problem.
Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints
Anandkumar, Animashree, Hsu, Daniel, Javanmard, Adel, Kakade, Sham M.
It is widely recognized that incorporating latent or hidden variables is a crucial aspect of modeling. Latent variables can provide a succinct representation of the observed data through dimensionality reduction; the possibly many observed variables are summarized by fewer hidden effects. Further, they are central to predicting causal relationships and interpreting the hidden effects as unobservable concepts. For instance in sociology, human behavior is affected by abstract notions such as social attitudes, beliefs, goals and plans. As another example, medical knowledge is organized into casual hierarchies of invading organisms, physical disorders, pathological states and symptoms, and only the symptoms are observed. In addition to incorporating latent variables, it is also important to model the complex dependencies among the variables. A popular class of models for incorporating such dependencies are the Bayesian networks, also known as belief networks. They incorporate a set of causal and conditional independence relationships through directed acyclic graphs (DAG) [49]. They have widespread applicability in artificial intelligence [19, 25, 41, 42], in the social sciences [13, 18, 40, 50, 51, 64], and as structural equation models in economics [12, 18, 33, 51, 60, 65].
Infinite Shift-invariant Grouped Multi-task Learning for Gaussian Processes
Wang, Yuyang, Khardon, Roni, Protopapas, Pavlos
Multi-task learning leverages shared information among data sets to improve the learning performance of individual tasks. The paper applies this framework for data where each task is a phase-shifted periodic time series. In particular, we develop a novel Bayesian nonparametric model capturing a mixture of Gaussian processes where each task is a sum of a group-specific function and a component capturing individual variation, in addition to each task being phase shifted. We develop an efficient \textsc{em} algorithm to learn the parameters of the model. As a special case we obtain the Gaussian mixture model and \textsc{em} algorithm for phased-shifted periodic time series. Furthermore, we extend the proposed model by using a Dirichlet Process prior and thereby leading to an infinite mixture model that is capable of doing automatic model selection. A Variational Bayesian approach is developed for inference in this model. Experiments in regression, classification and class discovery demonstrate the performance of the proposed models using both synthetic data and real-world time series data from astrophysics. Our methods are particularly useful when the time series are sparsely and non-synchronously sampled.
A Transfer Learning Approach for Learning Temporal Nodes Bayesian Networks
Cameras, Lindsey Jennifer Fiedler (Instituto Nacional de Astrofísica, Óptica y Electrónica) | Sucar, Luis Enrique (Instituto Nacional de Astrofísica, Óptica y Electrónica) | Morales, Eduardo F. (Instituto Nacional de Astrofísica, Óptica y Electrónica)
Situations where there is insufficient information to learn from often arise, and the process to recollect data can be expensive or in some cases take too long resulting in outdated models. Transfer learning strategies have proven to be a powerful technique to learn models from several sources when a single source does not provide enough information. In this work we present a methodology to learn a Temporal Nodes Bayesian Network by transferring knowledge from several different but related domains. Experiments based on a reference network show promising results, supporting our claim that transfer learning is a viable strategy to learn these models when scarce data is available.
An Empirical Comparison of Bayesian Network Parameter Learning Algorithms for Continuous Data Streams
Ratnapinda, Parot (University of Pittsburgh) | Druzdzel, Marek J. ( University of Pittsburgh Białystok University of Technology Białystok )
We compare three approaches to learning numerical parameters of Bayesian networks from continuous data streams: (1) the EM algorithm applied to all data, (2) the EM algorithm applied to data increments, and (3) the online EM algorithm. Our results show that learning from all data at each step, whenever feasible, leads to the highest parameter accuracy and model classification accuracy. When facing computational limitations, incremental learning approaches are a reasonable alternative. Of these, online EM is reasonably fast, and similar to the incremental EM algorithm in terms of accuracy. For small data sets, incremental EM seems to lead to better accuracy. When the data size gets large, online EM tends to be more accurate.
Assessing Motivational Strategies in Serious Games Using Hidden Markov Models
Derbali, Lotfi (University of Montreal) | Ghali, Ramla (University of Montreal) | Frasson, Claude (University of Montreal)
Recent research has extended tutor strategies to model not just interventions to offer information and activities, but also interventions to support learners’ wills and motivation. It is important to investigate new ways, intertwined with learners’ performance (successful completion of tasks) and judgement (self-report questionnaires), for evaluating tutor intervention strategies. One promising way is the use of physiological sensors. Within this paper, we study some motivational strategies that were implemented in a serious game called HeapMotiv to support learners’ performance and motivation. We build several hidden Markov models which use Keller’s ARCS model of motivation and electrophysiological data (heart rate HR, skin conductance SC and EEG) and are able to identify physiological patterns correlated with different motivational strategies.
Exploiting Key Events for Learning Interception Policies
Chang, Yuan (University of Central Florida) | Sukthankar, Gita Reese (University of Central Florida)
One scenario that commonly arises in computer games and military training simulations is predator-prey pursuit in which the goal of the non-player character agent is to successfully intercept a fleeing player. In this paper, we focus on a variant of the problem in which the agent does not have perfect information about the player’s location but has prior experience in combating the player. Effectively addressing this problem requires a combination of learning the opponent’s tactics while planning an interception strategy. Although for small maps, solving the problem with standard POMDP (Partially Observable Markov Decision Process) solvers is feasible, increasing the search area renders many standard techniques intractable due to the increase in the belief state size and required plan length. Here we introduce a new approach for solving the problem on large maps that exploits key events, high reward regions in the belief state discovered at the higher level of abstraction, to plan efficiently over the low-level map. We demonstrate that our hierarchical key-events planner can learn intercept policies from traces of previous pursuits significantly faster than a standard point-based POMDP solver, particularly as the maps scale in size.
Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families
Bartlett, Peter, Grunwald, Peter, Harremoes, Peter, Hedayati, Fares, Kotlowski, Wojciech
We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They put forward the question what families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families. The exchangeability can happen only for three classes of natural exponential family distributions, namely the Gaussian, Gamma, and the Tweedie exponential family of order 3/2. Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information1
Factored expectation propagation for input-output FHMM models in systems biology
Cseke, Botond, Sanguinetti, Guido
The advent of high throughput technologies in biology has opened novel opportunities to investigate biological processes from a comprehensive point of view. At the same time, the noisy and high dimensional nature of these data sets gives rise to formidable statistical challenges, and has led to systems biology becoming a fertile area for machine learning applications, as well as a motivation for novel modelling methodologies. In this paper, we are interested in jointly modelling mRNA measurements (transcrip-tomics) together with metabolite measurements in order to provide a platform for understanding the chemical regulation of gene expression. From the statistical perspective, this is naturally addressed using a latent variables framework: mRNA transcription is known to be controlled by the activation state of a class of proteins, transcription factors (TFs), which mediate metabolic signals through fast conformational changes (Alon, 2006). However, due to their fast dynamic and often low concentrations, TFs are particularly difficult to assay experimentally, leading to the need for statistical inference methodologies (Asif & Sanguinetti, 2011; Shi et al., 2008). Here, we adopt a model of transcriptional regulation which is based on a binary representation of transcription factor states, a Factorial Hidden Markov Model (FHMM).
A Feature Subset Selection Algorithm Automatic Recommendation Method
Wang, G., Song, Q., Sun, H., Zhang, X., Xu, B., Zhou, Y.
Many feature subset selection (FSS) algorithms have been proposed, but not all of them are appropriate for a given feature selection problem. At the same time, so far there is rarely a good way to choose appropriate FSS algorithms for the problem at hand. Thus, FSS algorithm automatic recommendation is very important and practically useful. In this paper, a meta learning based FSS algorithm automatic recommendation method is presented. The proposed method first identifies the data sets that are most similar to the one at hand by the k-nearest neighbor classification algorithm, and the distances among these data sets are calculated based on the commonly-used data set characteristics. Then, it ranks all the candidate FSS algorithms according to their performance on these similar data sets, and chooses the algorithms with best performance as the appropriate ones. The performance of the candidate FSS algorithms is evaluated by a multi-criteria metric that takes into account not only the classification accuracy over the selected features, but also the runtime of feature selection and the number of selected features. The proposed recommendation method is extensively tested on 115 real world data sets with 22 well-known and frequently-used different FSS algorithms for five representative classifiers. The results show the effectiveness of our proposed FSS algorithm recommendation method.