Country
Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
Frey, Brendan J., Patrascu, Relu, Jaakkola, Tommi, Moran, Jodi
Exact inference in large, richly connected noisy-OR networks is intractable, and most approximate inference algorithms tend to concentrate on a small number of most probable configurations of the hidden variables under the posterior. We presented an "inclusive" variational method for bipartite noisy-OR networks that favors including all probable configurations, at the cost of including some improbable configurations. The method fits a tree to the posterior distribution sequentially, i.e., one observation at a time. Results on an ensemble of QMR-DT type networks show that the method performs better than local probability propagation and a variational upper bound for ranking most probable diseases.
Accumulator Networks: Suitors of Local Probability Propagation
Frey, Brendan J., Kannan, Anitha
One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce "accumulator networks" that have low local complexity (but exponential global complexity) so the sum-product algorithm can be directly applied. In an accumulator network, the probability of a child given its parents is computed by accumulating the inputs from the parents in a Markov chain or more generally a tree. After giving expressions for inference and learning in accumulator networks, we give results on the "bars problem" and on the problem of extracting translated, overlapping faces from an image. 1 Introduction Graphical probability models with hidden variables are capable of representing complex dependencies between variables, filling in missing data and making Bayesoptimal decisions using probabilistic inferences (Hinton and Sejnowski 1986; Pearl 1988; Neal 1992).
Discovering Hidden Variables: A Structure-Based Approach
Elidan, Gal, Lotner, Noam, Friedman, Nir, Koller, Daphne
A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables.
Incorporating Second-Order Functional Knowledge for Better Option Pricing
Dugas, Charles, Bengio, Yoshua, Bélisle, François, Nadeau, Claude, Garcia, René
Incorporating prior knowledge of a particular task into the architecture of a learning algorithm can greatly improve generalization performance. We study here a case where we know that the function to be learned is non-decreasing in two of its arguments and convex in one of them. For this purpose we propose a class of functions similar to multi-layer neural networks but (1) that has those properties, (2) is a universal approximator of continuous functions with these and other properties. We apply this new class of functions to the task of modeling the price of call options. Experiments show improvements on regressing the price of call options using the new types of function classes that incorporate the a priori constraints.
An Adaptive Metric Machine for Pattern Classification
Domeniconi, Carlotta, Peng, Jing, Gunopulos, Dimitrios
Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real world data. 1 Introduction
Explaining Away in Weight Space
Explaining away has mostly been considered in terms of inference of states in belief networks. We show how it can also arise in a Bayesian context in inference about the weights governing relationships such as those between stimuli and reinforcers in conditioning experiments such as bacA, 'Ward blocking. We show how explaining away in weight space can be accounted for using an extension of a Kalman filter model; provide a new approximate way of looking at the Kalman gain matrix as a whitener for the correlation matrix of the observation process; suggest a network implementation of this whitener using an architecture due to Goodall; and show that the resulting model exhibits backward blocking.
Sparse Representation for Gaussian Process Models
We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.
The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
Cohn, David A., Hofmann, Thomas
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is based on a probabilistic factor decomposition and allows identifying principal topics of the collection as well as authoritative documents within those topics. Furthermore, the relationships between topics is mapped out in order to build a predictive model of link content. Among the many applications of this approach are information retrieval and search, topic identification, query disambiguation, focused web crawling, web authoring, and bibliometric analysis.
Gaussianization
Chen, Scott Saobing, Gopinath, Ramesh A.
High dimensional data modeling is difficult mainly because the so-called "curse of dimensionality". We propose a technique called "Gaussianization" for high dimensional density estimation, which alleviates the curse of dimensionality by exploiting the independence structures in the data. Gaussianization is motivated from recent developments in the statistics literature: projection pursuit, independent component analysis and Gaussian mixture models with semi-tied covariances. We propose an iterative Gaussianization procedure which converges weakly: at each iteration, the data is first transformed to the least dependent coordinates and then each coordinate is marginally Gaussianized by univariate techniques. Gaussianization offers density estimation sharper than traditional kernel methods and radial basis function methods.
Vicinal Risk Minimization
Chapelle, Olivier, Weston, Jason, Bottou, Léon, Vapnik, Vladimir
The Vicinal Risk Minimization principle establishes a bridge between generative models and methods derived from the Structural Risk Minimization Principle such as Support Vector Machines or Statistical Regularization. We explain how VRM provides a framework which integrates a number of existing algorithms, such as Parzen windows, Support Vector Machines, Ridge Regression, Constrained Logistic Classifiers and Tangent-Prop. We then show how the approach implies new algorithms for solving problems usually associated with generative models. New algorithms are described for dealing with pattern recognition problems with very different pattern distributions and dealing with unlabeled data. Preliminary empirical results are presented.