Bayesian Learning
Maximum Entropy Discrimination
Jaakkola, Tommi, Meila, Marina, Jebara, Tony
We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involvedistributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under thisclass and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class-conditional models within this framework.
Learning Factored Representations for Partially Observable Markov Decision Processes
The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptionsbetween random variables are compactly represented by network parameters. The parameters are learned online, and approximations areused to perform inference and to compute the optimal value function. The relative effects of inference and value function approximations onthe quality of the final policy are investigated, by learning to solve a moderately difficult driving task. The two value function approximations, linearand quadratic, were found to perform similarly, but the quadratic model was more sensitive to initialization. Both performed below thelevel of human performance on the task.
Manifold Stochastic Dynamics for Bayesian Learning
We propose a new Markov Chain Monte Carlo algorithm which is a generalization ofthe stochastic dynamics method. The algorithm performs exploration of the state space using its intrinsic geometric structure, facilitating efficientsampling of complex distributions. Applied to Bayesian learning in neural networks, our algorithm was found to perform at least as well as the best state-of-the-art method while consuming considerably less time. 1 Introduction
The Relevance Vector Machine
The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs,the requirement to estimate a tradeoff parameter and the need to utilise'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment ofa generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, theRVM requires dramatically fewer kernel functions.
On Input Selection with Reversible Jump Markov Chain Monte Carlo Sampling
In this paper we will treat input selection for a radial basis function (RBF) like classifier within a Bayesian framework. We approximate the a-posteriori distribution over both model coefficients and input subsets by samples drawn with Gibbs updates and reversible jump moves. Using some public datasets, we compare the classification accuracy of the method with a conventional ARD scheme. These datasets are also used to infer the a-posteriori probabilities of different inputsubsets.
Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers
We present a variational Bayesian method for model selection over families of kernels classifiers like Support Vector machines or Gaussian processes.The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the possibility touse sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector machines and certain Gaussian process models. 1 Introduction Bayesian techniques have been widely and successfully used in the neural networks and statistics community and are appealing because of their conceptual simplicity, generality and consistency with which they solve learning problems. In this paper we present a new method for applying the Bayesian methodology to Support Vector machines. We will briefly review Gaussian Process and Support Vector classification in this section and clarify their relationship by pointing out the common roots. Although we focus on classification here, it is straightforward to apply the methods to regression problems as well. In section 2 we introduce our algorithm and show relations to existing methods. Finally, we present experimental results in section 3 and close with a discussion in section 4. Let X be a measure space (e.g.
Algorithms for Independent Components Analysis and Higher Order Statistics
Lee, Daniel D., Rokni, Uri, Sompolinsky, Haim
A latent variable generative model with finite noise is used to describe severaldifferent algorithms for Independent Components Analysis (lCA). In particular, the Fixed Point ICA algorithm is shown to be equivalent to the Expectation-Maximization algorithm for maximum likelihood under certain constraints, allowing the conditions for global convergence to be elucidated. The algorithms can also be explained by their generic behavior near a singular point where the size of the optimal generativebases vanishes. An expansion of the likelihood about this singular point indicates the role of higher order correlations in determining thefeatures discovered by ICA. The application and convergence of these algorithms are demonstrated on a simple illustrative example.
Learning to Parse Images
Hinton, Geoffrey E., Ghahramani, Zoubin, Teh, Yee Whye
We describe a class of probabilistic models that we call credibility networks. Using parse trees as internal representations of images, credibility networks are able to perform segmentation and recognition simultaneously,removing the need for ad hoc segmentation heuristics. Promising results in the problem of segmenting handwritten digitswere obtained.