Directed Networks
Sparse estimation via nonconcave penalized likelihood in a factor analysis model
We consider the problem of sparse estimation in a factor analysis model. A traditional estimation procedure is the following two-step approach: the model is estimated by the maximum likelihood method, and then a rotation technique is used to find sparse factor loadings. However, the maximum likelihood estimates cannot be obtained when the number of variables is much larger than the number of observations. Furthermore, even if the maximum likelihood estimates are available, the rotation technique often does not produce a sufficiently sparse solution. To handle these problems, this paper introduces a penalized likelihood procedure that imposes a nonconvex penalty on the factor loadings. We show that the penalized likelihood procedure can be viewed as a generalization of the traditional two-step approach, and that the proposed methodology can produce sparser solutions than the rotation technique. A new algorithm based on the EM algorithm along with coordinate descent is introduced to compute the entire solution path, which permits application to a wide variety of convex and nonconvex penalties. Monte Carlo simulations are conducted to investigate the performance of our modeling strategy. A real data example is also given to illustrate our procedure.
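To make the coordinate-descent step with a nonconvex penalty concrete, the sketch below shows the closed-form univariate update for the minimax concave penalty (MCP). The choice of MCP, the unit-variance scaling, and the names `soft_threshold`, `mcp_threshold`, `lam`, and `gamma` are assumptions for illustration; the abstract does not say which penalties or scalings the paper's algorithm uses.

```python
def soft_threshold(z, lam):
    """Lasso (convex) univariate solution: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z, lam, gamma):
    """Univariate MCP solution for a standardized coordinate (assumed setup).

    z     : unpenalized coordinate-wise solution
    lam   : regularization strength (lam >= 0)
    gamma : concavity parameter, must exceed 1 in this scaling
    """
    if gamma <= 1:
        raise ValueError("gamma must be greater than 1")
    if abs(z) <= gamma * lam:
        # Inside the penalized region: soft-threshold, then rescale.
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    # Beyond gamma * lam the penalty is flat, so z is left unshrunk.
    return z


if __name__ == "__main__":
    for z in (0.5, 2.0, 4.0):
        print(z, soft_threshold(z, 1.0), mcp_threshold(z, 1.0, 3.0))
```

Small inputs are set exactly to zero, as with the lasso, while large inputs are not shrunk at all. This is one reason nonconvex penalties can give sparse solutions with less bias on the large loadings.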
Variational Semi-blind Sparse Deconvolution with Orthogonal Kernel Bases and its Application to MRFM
Park, Se Un, Dobigeon, Nicolas, Hero, Alfred O.
We present a variational Bayesian method of joint image reconstruction and point spread function (PSF) estimation when the PSF of the imaging device is only partially known. To solve this semi-blind deconvolution problem, prior distributions are specified for the PSF and the 3D image. Joint image reconstruction and PSF estimation is then performed within a Bayesian framework, using a variational algorithm to estimate the posterior distribution. The image prior distribution imposes an explicit atomic measure that corresponds to image sparsity. Importantly, the proposed Bayesian deconvolution algorithm does not require hand tuning. Simulation results clearly demonstrate that the semi-blind deconvolution algorithm compares favorably with the previous Markov chain Monte Carlo (MCMC) version of myopic sparse reconstruction. It significantly outperforms mismatched non-blind algorithms that rely on the assumption of perfect knowledge of the PSF. The algorithm is illustrated on real data from magnetic resonance force microscopy (MRFM).
Herded Gibbs Sampling
Bornn, Luke, Chen, Yutian, de Freitas, Nando, Eskelin, Mareija, Fang, Jing, Welling, Max
The Gibbs sampler is one of the most popular algorithms for inference in statistical models. In this paper, we introduce a herding variant of this algorithm, called herded Gibbs, that is entirely deterministic. We prove that herded Gibbs has an $O(1/T)$ convergence rate for models with independent variables and for fully connected probabilistic graphical models. Herded Gibbs is shown to outperform Gibbs in the tasks of image denoising with MRFs and named entity recognition with CRFs. However, the convergence for herded Gibbs for sparsely connected probabilistic graphical models is still an open problem.
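As a minimal sketch of the deterministic idea, consider the simplest case named above: independent variables, where each variable can be treated on its own. The code herds a single binary variable with target probability `p`. The function name `herded_bernoulli`, the starting weight, and the `w > 1` threshold convention are assumptions for illustration, not taken from the paper.

```python
def herded_bernoulli(p, steps, w=0.0):
    """Deterministically emit `steps` binary values whose mean tracks p.

    p     : target probability of emitting 1 (0 <= p <= 1)
    steps : number of values to emit
    w     : initial herding weight (assumed to start in [0, 1])
    """
    samples = []
    for _ in range(steps):
        w += p              # accumulate the target probability
        if w > 1.0:         # enough mass has built up: emit a 1 ...
            samples.append(1)
            w -= 1.0        # ... and subtract the mass just spent
        else:
            samples.append(0)
    return samples


if __name__ == "__main__":
    xs = herded_bernoulli(0.3, 1000)
    print(sum(xs) / len(xs))  # empirical frequency, close to 0.3
```

Because the weight stays bounded, the number of ones after T steps differs from pT by at most a constant, so the empirical frequency has error of order 1/T. A random sampler has error of order 1/sqrt(T).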
The Bounded Bayesian
The ideal Bayesian agent reasons from a global probability model, but real agents are restricted to simplified models which they know to be adequate only in restricted circumstances. Very little formal theory has been developed to help fallibly rational agents manage the process of constructing and revising small world models. The goal of this paper is to present a theoretical framework for analyzing model management approaches. For a probability forecasting problem, a search process over small world models is analyzed as an approximation to a larger-world model which the agent cannot explicitly enumerate or compute. Conditions are given under which the sequence of small-world models converges to the larger-world probabilities.
EigenGP: Sparse Gaussian process models with data-dependent eigenfunctions
Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost and it is difficult to design nonstationary GP priors in practice. In this paper, we propose a sparse Gaussian process model, EigenGP, based on the Karhunen-Loève (KL) expansion of a GP prior. We use the Nyström approximation to obtain data dependent eigenfunctions and select these eigenfunctions by evidence maximization. This selection reduces the number of eigenfunctions in our model and provides a nonstationary covariance function. To handle nonlinear likelihoods, we develop an efficient expectation propagation (EP) inference algorithm, and couple it with expectation maximization for eigenfunction selection. Because the eigenfunctions of a Gaussian kernel are associated with clusters of samples - including both the labeled and unlabeled - selecting relevant eigenfunctions enables EigenGP to conduct semi-supervised learning. Our experimental results demonstrate improved predictive performance of EigenGP over alternative state-of-the-art sparse GP and semisupervised learning methods for regression, classification, and semisupervised classification.
Sidestepping the Triangulation Problem in Bayesian Net Computations
Zhang, Nevin Lianwen, Poole, David L.
This paper presents a new approach for computing posterior probabilities in Bayesian nets, which sidesteps the triangulation problem. The current state of the art is the clique tree propagation approach. When the underlying graph of a Bayesian net is triangulated, this approach arranges its cliques into a tree and computes posterior probabilities by appropriately passing around messages in that tree. The computation in each clique is simply direct marginalization. When the underlying graph is not triangulated, one has to first triangulate it by adding edges. Referred to as the triangulation problem, the problem of finding an optimal or even a "good" triangulation proves to be difficult. In this paper, we propose to first decompose a Bayesian net into smaller components by making use of Tarjan's algorithm for decomposing an undirected graph at all its minimal complete separators. Then, the components are arranged into a tree and posterior probabilities are computed by appropriately passing around messages in that tree. The computation in each component is carried out by repeating the whole procedure from the beginning. Thus the triangulation problem is sidestepped.
A Decision Calculus for Belief Functions in Valuation-Based Systems
The valuation-based system (VBS) provides a general framework for representing knowledge and drawing inferences under uncertainty. Recent studies have shown that the semantics of VBS can represent and solve Bayesian decision problems (Shenoy, 1991a). The purpose of this paper is to propose a decision calculus for Dempster-Shafer (D-S) theory in the framework of VBS. The proposed calculus uses a weighting factor whose role is similar to the probabilistic interpretation of an assumption that disambiguates decision problems represented with belief functions (Strat 1990). It will be shown that with the presented calculus, if the decision problems are represented properly in the valuation network, we can solve them using the fusion algorithm (Shenoy 1991a). It will also be shown that the presented decision calculus reduces to the calculus for Bayesian probability theory when probabilities, instead of belief functions, are given.
Exploring Localization in Bayesian Networks for Large Expert Systems
Xiang, Yang, Poole, David L., Beddoes, Michael P.
Current Bayesian net representations do not consider structure in the domain and include all variables in a homogeneous network. At any time, a human reasoner in a large domain may direct his attention to only one of a number of natural subdomains, i.e., there is 'localization' of queries and evidence. In such a case, propagating evidence through a homogeneous network is inefficient since the entire network has to be updated each time. This paper presents multiply sectioned Bayesian networks that enable a (localization preserving) representation of natural subdomains by separate Bayesian subnets. The subnets are transformed into a set of permanent junction trees such that evidential reasoning takes place at only one of them at a time. Probabilities obtained are identical to those that would be obtained from the homogeneous network. We discuss attention shift to a different junction tree and propagation of previously acquired evidence. Although the overall system can be large, computational requirements are governed by the size of only one junction tree.
Generalizing Jeffrey Conditionalization
Jeffrey's rule has been generalized by Wagner to the case in which new evidence bounds the possible revisions of a prior probability below by a Dempsterian lower probability. Classical probability kinematics arises within this generalization as the special case in which the evidentiary focal elements of the bounding lower probability are pairwise disjoint. We discuss a twofold extension of this generalization, first allowing the lower bound to be any two-monotone capacity and then allowing the prior to be a lower envelope.
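For orientation, the classical special case (ordinary Jeffrey conditionalization over a partition of pairwise disjoint cells) can be sketched as follows. This is only the baseline rule, not Wagner's generalization or the extensions discussed above. The function name `jeffrey_update` and the toy weather distribution are hypothetical.

```python
def jeffrey_update(prior, partition, new_probs):
    """Classical Jeffrey conditionalization on a finite set of worlds.

    prior     : dict world -> prior probability
    partition : dict label -> list of worlds (disjoint cells covering all worlds)
    new_probs : dict label -> revised probability of that cell (sums to 1)

    Each world's probability is rescaled so that its cell receives the new
    mass, while conditional probabilities within the cell are preserved.
    """
    posterior = {}
    for label, cell in partition.items():
        cell_mass = sum(prior[w] for w in cell)
        if cell_mass == 0:
            raise ValueError(f"cell {label!r} has zero prior probability")
        scale = new_probs[label] / cell_mass
        for w in cell:
            posterior[w] = prior[w] * scale
    return posterior


if __name__ == "__main__":
    # Hypothetical example: a glimpse in dim light shifts P(wet) from 0.4 to 0.7.
    prior = {"rain_wet": 0.3, "rain_dry": 0.1, "sun_wet": 0.1, "sun_dry": 0.5}
    partition = {"wet": ["rain_wet", "sun_wet"], "dry": ["rain_dry", "sun_dry"]}
    print(jeffrey_update(prior, partition, {"wet": 0.7, "dry": 0.3}))
```

In this toy example the probability of rain rises from 0.4 to 0.575, because the revised mass on "wet" is distributed according to the prior conditional probabilities within that cell.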
An Algorithm for Deciding if a Set of Observed Independencies Has a Causal Explanation
In a previous paper [Pearl and Verma, 1991] we presented an algorithm for extracting causal influences from independence information, where a causal influence was defined as the existence of a directed arc in all minimal causal models consistent with the data. In this paper we address the question of deciding whether there exists a causal model that explains ALL the observed dependencies and independencies. Formally, given a list M of conditional independence statements, it is required to decide whether there exists a directed acyclic graph (dag) D that is perfectly consistent with M, namely, every statement in M, and no other, is reflected via d-separation in D. We present and analyze an effective algorithm that tests for the existence of such a dag, and produces one, if it exists.
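To illustrate what it means for an independence statement to be reflected via d-separation in a dag, the sketch below checks one statement against a given graph. It uses the standard moralized-ancestral-graph criterion; it is not the paper's algorithm for finding such a dag. The function name `d_separated` and the parent-map representation are assumptions.

```python
def d_separated(parents, xs, ys, zs):
    """Return True if node sets xs and ys are d-separated given zs.

    parents : dict node -> iterable of parent nodes (defines the dag)
    xs, ys, zs : disjoint sets of nodes
    """
    # 1. Restrict to the ancestral set of xs, ys and zs.
    relevant = set(xs) | set(ys) | set(zs)
    stack = list(relevant)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in relevant:
                relevant.add(p)
                stack.append(p)

    # 2. Moralize: link each child to its parents and "marry" co-parents.
    adj = {n: set() for n in relevant}
    for child in relevant:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, a in enumerate(ps):
            for b in ps[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)

    # 3. Remove zs and test whether any x can still reach any y.
    blocked = set(zs)
    frontier = [x for x in xs if x not in blocked]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in ys:
            return False
        for nb in adj[node]:
            if nb not in blocked and nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return True


if __name__ == "__main__":
    collider = {"c": {"a", "b"}}  # a -> c <- b
    print(d_separated(collider, {"a"}, {"b"}, set()))   # marginally independent
    print(d_separated(collider, {"a"}, {"b"}, {"c"}))   # dependent given c
```

A dag D is perfectly consistent with a list M in the sense above when a check like this returns True for every statement in M and False for every statement outside it.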