Learning Graphical Models
Measuring Shared Information and Coordinated Activity in Neuronal Networks
Klinkner, Kristina, Shalizi, Cosma, Camperi, Marcelo
This activity often manifests itself as dynamically coordinated sequences of action potentials. Since multiple electrode recordings are now a standard tool in neuroscience research, it is important to have a measure of such network-wide behavioral coordinationand information sharing, applicable to multiple neural spike train data. We propose a new statistic, informational coherence, which measures how much better one unit can be predicted by knowing the dynamical state of another. We argue informational coherence is a measure of association and shared information which is superior to traditional pairwisemeasures of synchronization and correlation. To find the dynamical states, we use a recently-introduced algorithm which reconstructs effectivestate spaces from stochastic time series.
Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
Kapoor, Ashish, Ahn, Hyungil, Qi, Yuan, Picard, Rosalind W.
There have been many graph-based approaches for semi-supervised classification. Oneproblem is that of hyperparameter learning: performance depends greatly on the hyperparameters of the similarity graph, transformation ofthe graph Laplacian and the noise model. We present a Bayesian framework for learning hyperparameters for graph-based semisupervised classification.Given some labeled data, which can contain inaccurate labels, we pose the semi-supervised classification as an inference problemover the unknown labels. Expectation Propagation is used for approximate inference and the mean of the posterior is used for classification. The hyperparameters are learned using EM for evidence maximization. We also show that the posterior mean can be written in terms of the kernel matrix, providing a Bayesian classifier to classify new points. Tests on synthetic and real datasets show cases where there are significant improvements in performance over the existing approaches.
Worst-Case Bounds for Gaussian Process Models
Kakade, Sham M., Seeger, Matthias W., Foster, Dean P.
Dean P. Foster University of Pennsylvania We present a competitive analysis of some nonparametric Bayesian algorithms ina worst-case online learning setting, where no probabilistic assumptions about the generation of the data are made. We consider models which use a Gaussian process prior (over the space of all functions) andprovide bounds on the regret (under the log loss) for commonly usednon-parametric Bayesian algorithms -- including Gaussian regression and logistic regression -- which show how these algorithms can perform favorably under rather general conditions.
Efficient Estimation of OOMs
Jaeger, Herbert, Zhao, Mingjie, Kolling, Andreas
A standard method to obtain stochastic models for symbolic time series is to train state-emitting hidden Markov models (SE-HMMs) with the Baum-Welch algorithm. Based on observable operator models (OOMs), in the last few months a number of novel learning algorithms for similar purposeshave been developed: (1,2) two versions of an "efficiency sharpening" (ES) algorithm, which iteratively improves the statistical efficiency ofa sequence of OOM estimators, (3) a constrained gradient descent ML estimator for transition-emitting HMMs (TE-HMMs). We give an overview on these algorithms and compare them with SE-HMM/EM learning on synthetic and real-life data.
Bayesian Surprise Attracts Human Attention
Itti, Laurent, Baldi, Pierre F.
The concept of surprise is central to sensory processing, adaptation, learning, and attention. Yet, no widely-accepted mathematical theory currently exists to quantitatively characterize surprise elicited by a stimulus orevent, for observers that range from single neurons to complex natural or engineered systems. We describe a formal Bayesian definition ofsurprise that is the only consistent formulation under minimal axiomatic assumptions.Surprise quantifies how data affects a natural or artificial observer, by measuring the difference between posterior and prior beliefs of the observer. Using this framework we measure the extent to which humans direct their gaze towards surprising items while watching television and video games. We find that subjects are strongly attracted towards surprising locations, with 72% of all human gaze shifts directed towards locations more surprising than the average, a figure which rises to 84% when considering only gaze targets simultaneously selected by all subjects. The resulting theory of surprise is applicable across different spatio-temporalscales, modalities, and levels of abstraction.
Bayesian Sets
Ghahramani, Zoubin, Heller, Katherine A.
Sets", we consider the problem of retrieving items from a concept or cluster, given a query consisting of a few items from that cluster. We formulate this as a Bayesian inference problem and describe avery simple algorithm for solving it. Our algorithm uses a modelbased concept of a cluster and ranks items using a score which evaluates the marginal probability that each item belongs to a cluster containing the query items. For exponential family models with conjugate priors this marginal probability is a simple function of sufficient statistics. We focus on sparse binary data and show that our score can be evaluated exactly usinga single sparse matrix multiplication, making it possible to apply our algorithm to very large datasets. We evaluate our algorithm on three datasets: retrieving movies from EachMovie, finding completions of author sets from the NIPS dataset, and finding completions of sets of words appearing in the Grolier encyclopedia.
Pattern Recognition from One Example by Chopping
Fleuret, Francois, Blanchard, Gilles
We investigate the learning of the appearance of an object from a single image of it. Instead of using a large number of pictures of the object to recognize, we use a labeled reference database of pictures of other objects tolearn invariance to noise and variations in pose and illumination. This acquired knowledge is then used to predict if two pictures of new objects, which do not appear on the training pictures, actually display the same object. We propose a generic scheme called chopping to address this task. It relies on hundreds of random binary splits of the training set chosen to keep together the images of any given object. Those splits are extended to the complete image space with a simple learning algorithm. Given two images, the responses of the split predictors are combined with a Bayesian rule into a posterior probability of similarity.