Expectation Consistent Free Energies for Approximate Inference
We propose a novel framework for deriving approximations for intractable probabilistic models. This framework is based on a free energy (negative log marginal likelihood) and can be seen as a generalization of adaptive TAP [1, 2, 3] and expectation propagation (EP) [4, 5]. The free energy is constructed from two approximating distributions which encode different aspects of the intractable model, such as single-node constraints and couplings, and are by construction consistent on a chosen set of moments. We test the framework on a difficult benchmark problem with binary variables on fully connected graphs and 2D grid graphs. We find good performance using sets of moments which either specify factorized nodes or a spanning tree on the nodes (structured approximation). Surprisingly, the Bethe approximation gives markedly inferior results even on grids.
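As a rough guide to the construction described above, the expectation consistent (EC) free energy can be sketched as follows; this is our own schematic notation under the assumption that the intractable model factorizes as p(x) = f_q(x) f_r(x) / Z, with f_q carrying the single-node terms, f_r the couplings, and g(x) the chosen moment vector, and it is not a verbatim restatement of the paper's equations.

\[
Z_q(\lambda_q) = \int f_q(x)\, e^{\lambda_q^\top g(x)}\, dx, \qquad
Z_r(\lambda_r) = \int f_r(x)\, e^{\lambda_r^\top g(x)}\, dx, \qquad
Z_s(\lambda_s) = \int e^{\lambda_s^\top g(x)}\, dx,
\]
\[
\ln Z_{\mathrm{EC}} = \ln Z_q(\lambda_q) + \ln Z_r(\lambda_r) - \ln Z_s(\lambda_q + \lambda_r),
\]

with the parameters \(\lambda_q, \lambda_r\) chosen so that the expectations of \(g(x)\) agree under the three corresponding distributions (the "expectation consistency" of the title).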
Beat Tracking the Graphical Model Way
Lang, Dustin, Freitas, Nando D.
Dixon describes beats as follows: "much music has as its rhythmic basis a series of pulses, spaced approximately equally in time, relative to which the timing of all musical events can be described. This phenomenon is called the beat, and the individual pulses are also called beats" [1]. Given a piece of recorded music (an MP3 file, for example), we wish to produce a set of beats that correspond to the beats perceived by human listeners. The set of beats of a song can be characterised by the trajectories through time of the tempo and phase offset. Tempo is typically measured in beats per minute (BPM), and describes the frequency of beats.
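To make the tempo/phase parameterization concrete, the small sketch below (our own illustration, not the authors' code) converts a tempo in BPM and a phase offset in seconds into the grid of predicted beat times for a stretch of audio.

# Minimal illustration: beat times implied by a fixed tempo and phase offset.
def beat_times(tempo_bpm, phase_offset, duration):
    period = 60.0 / tempo_bpm          # seconds between consecutive beats
    times = []
    t = phase_offset                    # first beat occurs at the phase offset
    while t < duration:
        times.append(t)
        t += period
    return times

# Example: 120 BPM with a 0.25 s offset over the first 2 seconds of audio.
print(beat_times(120.0, 0.25, 2.0))    # [0.25, 0.75, 1.25, 1.75]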
On-Chip Compensation of Device-Mismatch Effects in Analog VLSI Neural Networks
Figueroa, Miguel, Bridges, Seth, Diorio, Chris
Device mismatch in VLSI degrades the accuracy of analog arithmetic circuits and lowers the learning performance of large-scale neural networks implemented in this technology. We show compact, low-power on-chip calibration techniques that compensate for device mismatch. Our techniques enable large-scale analog VLSI neural networks with learning performance on the order of 10 bits. We demonstrate our techniques on a 64-synapse linear perceptron that learns with the Least-Mean-Squares (LMS) algorithm and was fabricated in a 0.35µm CMOS process.
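For readers unfamiliar with the learning rule mentioned above, here is a minimal software model (ours, not the chip or the authors' code) of a 64-synapse linear perceptron trained with the LMS rule w <- w + mu * (d - w.x) * x.

import numpy as np

rng = np.random.default_rng(0)
n_synapses = 64
w_true = rng.normal(size=n_synapses)    # target weights the perceptron should learn
w = np.zeros(n_synapses)                # perceptron weights, initially zero
mu = 0.01                               # LMS learning rate

for _ in range(5000):
    x = rng.normal(size=n_synapses)     # input vector
    d = w_true @ x                      # desired (teacher) output
    y = w @ x                           # perceptron output
    w += mu * (d - y) * x               # LMS weight update

print("mean absolute weight error:", np.mean(np.abs(w - w_true)))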
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
Zhang, Jian, Ghahramani, Zoubin, Yang, Yiming
In this paper we propose a probabilistic model for online document clustering. We use a nonparametric Dirichlet process prior to model the growing number of clusters, and use a general English language model as the base distribution to handle the generation of novel clusters. Furthermore, cluster uncertainty is modeled with a Bayesian Dirichlet-multinomial distribution. We use an empirical Bayes method to estimate hyperparameters based on a historical dataset. Our probabilistic model is applied to the novelty detection task in Topic Detection and Tracking (TDT) and compared with existing approaches in the literature.
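The sketch below illustrates the kind of online assignment rule a Dirichlet process (Chinese restaurant process) prior induces: an incoming document joins an existing cluster with score proportional to the cluster size times its likelihood, or opens a new cluster with score proportional to the concentration alpha times the likelihood under the base (general-English) model. The function names and the smoothed unigram likelihood are our own simplification, not the paper's model.

import numpy as np

def assign(doc_counts, clusters, alpha, base_probs, beta=0.5):
    """doc_counts: word-count vector of the new document (shared vocabulary);
    clusters: list of (n_docs, word_counts) for existing clusters;
    base_probs: base-distribution word probabilities (general English model)."""
    scores = []
    for n_docs, word_counts in clusters:
        # Dirichlet-multinomial-style smoothed predictive for this cluster
        probs = (word_counts + beta) / (word_counts.sum() + beta * len(word_counts))
        scores.append(n_docs * np.prod(probs ** doc_counts))
    # score for opening a novel cluster, under the base distribution
    scores.append(alpha * np.prod(base_probs ** doc_counts))
    p = np.asarray(scores) / np.sum(scores)
    return int(np.argmax(p)), p[-1]     # chosen cluster index, probability of novelty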
Similarity and Discrimination in Classical Conditioning: A Latent Variable Account
Courville, Aaron C., Daw, Nathaniel D., Touretzky, David S.
We propose a probabilistic, generative account of configural learning phenomena in classical conditioning. Configural learning experiments probe how animals discriminate and generalize between patterns of simultaneously presented stimuli (such as tones and lights) that are differentially predictive of reinforcement. Previous models of these issues have been successful more on a phenomenological than an explanatory level: they reproduce experimental findings but, lacking formal foundations, provide scant basis for understanding why animals behave as they do. We present a theory that clarifies seemingly arbitrary aspects of previous models while also capturing a broader set of data.
Log-concavity Results on Gaussian Process Methods for Supervised and Unsupervised Learning
Log-concavity is an important property in the context of optimization, Laplace approximation, and sampling; Bayesian methods based on Gaussian process priors have become quite popular recently for classification, regression, density estimation, and point process intensity estimation. Here we prove that the predictive densities corresponding to each of these applications are log-concave, given any observed data. We also prove that the likelihood is log-concave in the hyperparameters controlling the mean function of the Gaussian prior in the density and point process intensity estimation cases, and the mean, covariance, and observation noise parameters in the classification and regression cases; this result leads to a useful parameterization of these hyperparameters, indicating a suitably large class of priors for which the corresponding maximum a posteriori problem is log-concave.

Bayesian methods based on Gaussian process priors have recently become quite popular for machine learning tasks (1). These techniques have enjoyed a good deal of theoretical examination, documenting their learning-theoretic (generalization) properties (2), and developing a variety of efficient computational schemes (e.g., (3-5), and references therein).
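For reference, the property at issue can be stated as follows; the posterior example afterwards is a standard illustration of why log-concavity matters in this setting (ours, not a restatement of the paper's theorems).

\[
p \text{ is log-concave} \iff p(tx + (1-t)y) \;\ge\; p(x)^{t}\, p(y)^{1-t}
\quad \text{for all } x, y \text{ and } t \in [0,1].
\]

For instance, in GP classification with latent values \(f\), prior \(\mathcal{N}(0,K)\), and a log-concave observation model (logistic or probit), the posterior

\[
p(f \mid \mathcal{D}) \;\propto\; \mathcal{N}(f;\, 0, K)\, \prod_i p(y_i \mid f_i)
\]

is a product of log-concave factors and hence log-concave in \(f\), so the corresponding MAP problem is a concave maximization with a unique optimum.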
Fast Rates to Bayes for Kernel Machines
Steinwart, Ingo, Scovel, Clint
We establish learning rates to the Bayes risk for support vector machines (SVMs) with hinge loss. In particular, for SVMs with Gaussian RBF kernels we propose a geometric condition for distributions which can be used to determine approximation properties of these kernels. Finally, we compare our results with those of a recent paper by G. Blanchard et al.
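For context, the hinge-loss SVM whose rates are analyzed can be written in the usual regularized-risk form (standard notation, not taken verbatim from the paper):

\[
\ell_{\mathrm{hinge}}(y, f(x)) = \max\{0,\; 1 - y f(x)\}, \qquad y \in \{-1, +1\},
\]
\[
f_{\mathcal{D},\lambda} = \arg\min_{f \in \mathcal{H}} \; \lambda \|f\|_{\mathcal{H}}^2
+ \frac{1}{n} \sum_{i=1}^{n} \ell_{\mathrm{hinge}}(y_i, f(x_i)),
\]

where \(\mathcal{H}\) is the reproducing kernel Hilbert space of, e.g., a Gaussian RBF kernel \(k(x, x') = \exp(-\sigma^2 \|x - x'\|^2)\). A learning rate to the Bayes risk then bounds how quickly the risk of \(f_{\mathcal{D},\lambda}\) approaches the best achievable risk as the sample size \(n\) grows.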