Learning Graphical Models
Modeling Conversational Dynamics as a Mixed-Memory Markov Process
Choudhury, Tanzeem, Basu, Sumit
There is a long history of work in the social sciences aimed at understanding the interactions between individuals and the influences they have on each others' behavior. However, existing studies of social network interactions have either been restricted to online communities, where unambiguous measurements about how people interact can be obtained, or have been forced to rely on questionnaires or diaries to get data on face-to-face interactions. Survey-based methods are error prone and impractical to scale up. Studies show that self-reports correspond poorly to communication behavior as recorded by independent observers [3]. In contrast, we have used wearable sensors and recent advances in speech processing techniques to automatically gather information about conversations: when they occurred, who was involved, and who was speaking when.
A Machine Learning Approach to Conjoint Analysis
Chapelle, Olivier, Harchaoui, Zaïd
Choice-based conjoint analysis builds models of consumer preferences over products with answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve this problem more efficiently. Thus, we propose two algorithms to quickly and accurately estimate consumer preferences.
Sub-Microwatt Analog VLSI Support Vector Machine for Pattern Classification and Sequence Estimation
Chakrabartty, Shantanu, Cauwenberghs, Gert
An analog system-on-chip for kernel-based pattern classification and sequence estimation is presented. State transition probabilities conditioned on input data are generated by an integrated support vector machine. Dot product based kernels and support vector coefficients are implemented in analog programmable floating gate translinear circuits, and probabilities are propagated and normalized using sub-threshold current-mode circuits. A 14-input, 24-state, and 720-support vector forward decoding kernel machine is integrated on a 3mm 3mm chip in 0.5µm CMOS technology. Experiments with the processor trained for speaker verification and phoneme sequence estimation demonstrate real-time recognition accuracy at par with floating-point software, at sub-microwatt power.
Markov Networks for Detecting Overalpping Elements in Sequence Data
Craven, Mark, Bockhorst, Joseph
Many sequential prediction tasks involve locating instances of patterns in sequences. Generative probabilistic language models, such as hidden Markov models (HMMs), have been successfully applied to many of these tasks. A limitation of these models however, is that they cannot naturally handle cases in which pattern instances overlap in arbitrary ways. We present an alternative approach, based on conditional Markov networks, that can naturally represent arbitrarily overlapping elements. We show how to efficiently train and perform inference with these models. Experimental results from a genomics domain show that our models are more accurate at locating instances of overlapping patterns than are baseline models based on HMMs.
Exponentiated Gradient Algorithms for Large-margin Structured Classification
Bartlett, Peter L., Collins, Michael, Taskar, Ben, McAllester, David A.
We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efficient--even in cases where the number of labels y is exponential in size--provided that certain expectations under Gibbs distributions can be calculated efficiently. The method for structured labels relies on a more general result, specifically the application of exponentiated gradient updates [7, 8] to quadratic programs.
Spike Sorting: Bayesian Clustering of Non-Stationary Data
Bar-hillel, Aharon, Spiro, Adam, Stark, Eran
Spike sorting involves clustering spike trains recorded by a microelectrode according to the source neuron. It is a complicated problem, which requires a lot of human labor, partly due to the non-stationary nature of the data. We propose an automated technique for the clustering of non-stationary Gaussian sources in a Bayesian framework. At a first search stage, data is divided into short time frames and candidate descriptions of the data as a mixture of Gaussians are computed for each frame. At a second stage transition probabilities between candidate mixtures are computed, and a globally optimal clustering is found as the MAP solution of the resulting probabilistic model. Transition probabilities are computed using local stationarity assumptions and are based on a Gaussian version of the Jensen-Shannon divergence. The method was applied to several recordings. The performance appeared almost indistinguishable from humans in a wide range of scenarios, including movement, merges, and splits of clusters.
Harmonising Chorales by Probabilistic Inference
Allan, Moray, Williams, Christopher
We describe how we used a data set of chorale harmonisations composed by Johann Sebastian Bach to train Hidden Markov Models. Using a probabilistic framework allows us to create a harmonisation system which learns from examples, and which can compose new harmonisations. We make a quantitative comparison of our system's harmonisation performance against simpler models, and provide example harmonisations.
Learning first-order Markov models for control
First-order Markov models have been successfully applied to many problems, for example in modeling sequential data using Markov chains, and modeling control problems using the Markov decision processes (MDP) formalism. If a first-order Markov model's parameters are estimated from data, the standard maximum likelihood estimator considers only the first-order (single-step) transitions. But for many problems, the firstorder conditional independence assumptions are not satisfied, and as a result the higher order transition probabilities may be poorly approximated. Motivated by the problem of learning an MDP's parameters for control, we propose an algorithm for learning a first-order Markov model that explicitly takes into account higher order interactions during training. Our algorithm uses an optimization criterion different from maximum likelihood, and allows us to learn models that capture longer range effects, but without giving up the benefits of using first-order Markov models. Our experimental results also show the new algorithm outperforming conventional maximum likelihood estimation in a number of control problems where the MDP's parameters are estimated from data.
Experts in a Markov Decision Process
Even-dar, Eyal, Kakade, Sham M., Mansour, Yishay
We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard.
Probabilistic Computation in Spiking Populations
Zemel, Richard S., Natarajan, Rama, Dayan, Peter, Huys, Quentin J.
As animals interact with their environments, they must constantly update estimates about their states. Bayesian models combine prior probabilities, adynamical model and sensory evidence to update estimates optimally. Thesemodels are consistent with the results of many diverse psychophysical studies. However, little is known about the neural representation andmanipulation of such Bayesian information, particularly in populations of spiking neurons. We consider this issue, suggesting a model based on standard neural architecture and activations. We illustrate theapproach on a simple random walk example, and apply it to a sensorimotor integration task that provides a particularly compelling example of dynamic probabilistic computation. Bayesian models have been used to explain a gamut of experimental results in tasks which require estimates to be derived from multiple sensory cues.