Uncertainty
Supervised Topic Models
Mcauliffe, Jon D., Blei, David M.
We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models
Fienberg, Stephen E., Rinaldo, Alessandro, Zhou, Yi
There has been an explosion of interest in statistical models for analyzing network data, and considerable interest in the class of exponential random graph (ERG) models, especially in connection with difficulties in computing maximum likelihood estimates. The issues associated with these difficulties relate to the broader structure of discrete exponential families. This paper re-examines the issues in two parts. First we consider the closure of $k$-dimensional exponential families of distribution with discrete base measure and polyhedral convex support $\mathrm{P}$. We show that the normal fan of $\mathrm{P}$ is a geometric object that plays a fundamental role in deriving the statistical and geometric properties of the corresponding extended exponential families. We discuss its relevance to maximum likelihood estimation, both from a theoretical and computational standpoint. Second, we apply our results to the analysis of ERG models. In particular, by means of a detailed example, we provide some characterization of the properties of ERG models, and, in particular, of certain behaviors of ERG models known as degeneracy.
AND/OR Multi-Valued Decision Diagrams (AOMDDs) for Graphical Models
Mateescu, R., Dechter, R., Marinescu, R.
Inspired by the recently introduced framework of AND/OR search spaces for graphical models, we propose to augment Multi-Valued Decision Diagrams (MDD) with AND nodes, in order to capture function decomposition structure and to extend these compiled data structures to general weighted graphical models (e.g., probabilistic models). We present the AND/OR Multi-Valued Decision Diagram (AOMDD) which compiles a graphical model into a canonical form that supports polynomial (e.g., solution counting, belief updating) or constant time (e.g. equivalence of graphical models) queries. We provide two algorithms for compiling the AOMDD of a graphical model. The first is search-based, and works by applying reduction rules to the trace of the memory intensive AND/OR search algorithm. The second is inference-based and uses a Bucket Elimination schedule to combine the AOMDDs of the input functions via the the APPLY operator. For both algorithms, the compilation time and the size of the AOMDD are, in the worst case, exponential in the treewidth of the graphical model, rather than pathwidth as is known for ordered binary decision diagrams (OBDDs). We introduce the concept of semantic treewidth, which helps explain why the size of a decision diagram is often much smaller than the worst case bound. We provide an experimental evaluation that demonstrates the potential of AOMDDs.
Efficient Exact Inference in Planar Ising Models
Schraudolph, Nicol N., Kamenetsky, Dmitry
We give polynomial-time algorithms for the exact computation of lowest-energy (ground) states, worst margin violators, log partition functions, and marginal edge probabilities in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings (dimer coverings) in an expanded dual graph. We implement a unified framework while delegating complex but well-understood subproblems (planar embedding, maximum-weight perfect matching) to established algorithms for which efficient implementations are freely available. Unlike graph cut methods, we can perform penalized maximum-likelihood as well as maximum-margin parameter estimation in the associated conditional random fields (CRFs), and employ marginal posterior probabilities as well as maximum a posteri-ori (MAP) states for prediction. Maximum-margin CRF parameter estimation on image denoising and segmentation problems shows our approach to be efficient and effective. A C implementation is available from http://nic.schraudolph.org/isinf/ .
Probabilistic reasoning with answer sets
Baral, Chitta, Gelfond, Michael, Rushton, Nelson
To appear in Theory and Practice of Logic Programming (TPLP) This paper develops a declarative language, P-log, that combines logical and probabilistic arguments in its reasoning. Answer Set Prolog is used as the logical foundation, while causal Bayes nets serve as a probabilistic foundation. We give several nontrivial examples and illustrate the use of P-log for knowledge representation and updating of knowledge. We argue that our approach to updates is more appealing than existing approaches. We give sufficiency conditions for the coherency of P-log programs and show that Bayes nets can be easily mapped to coherent P-log programs.
High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence
Ravikumar, Pradeep, Wainwright, Martin J., Raskutti, Garvesh, Yu, Bin
Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance matrix $\Sigma^*$, and its inverse covariance or concentration matrix {$\Theta^* = (\Sigma^*)^{-1}$.} We estimate $\Theta^*$ by minimizing an $\ell_1$-penalized log-determinant Bregman divergence; in the multivariate Gaussian case, this approach corresponds to $\ell_1$-penalized maximum likelihood, and the structure of $\Theta^*$ is specified by the graph of an associated Gaussian Markov random field. We analyze the performance of this estimator under high-dimensional scaling, in which the number of nodes in the graph $p$, the number of edges $s$ and the maximum node degree $d$, are allowed to grow as a function of the sample size $n$. In addition to the parameters $(p,s,d)$, our analysis identifies other key quantities covariance matrix $\Sigma^*$; and (b) the $\ell_\infty$ operator norm of the sub-matrix $\Gamma^*_{S S}$, where $S$ indexes the graph edges, and $\Gamma^* = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$; and (c) a mutual incoherence or irrepresentability measure on the matrix $\Gamma^*$ and (d) the rate of decay $1/f(n,\delta)$ on the probabilities $ \{|\hat{\Sigma}^n_{ij}- \Sigma^*_{ij}| > \delta \}$, where $\hat{\Sigma}^n$ is the sample covariance based on $n$ samples. Our first result establishes consistency of our estimate $\hat{\Theta}$ in the elementwise maximum-norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees $d = o(\sqrt{s})$. In our second result, we show that with probability converging to one, the estimate $\hat{\Theta}$ correctly specifies the zero pattern of the concentration matrix $\Theta^*$.
Learning Partially Observable Deterministic Action Models
We present exact algorithms for identifying deterministic-actions' effects and preconditions in dynamic partially observable domains. They apply when one does not know the action model(the way actions affect the world) of a domain and must learn it from partial observations over time. Such scenarios are common in real world applications. They are challenging for AI tasks because traditional domain structures that underly tractability (e.g., conditional independence) fail there (e.g., world features become correlated). Our work departs from traditional assumptions about partial observations and action models. In particular, it focuses on problems in which actions are deterministic of simple logical structure and observation models have all features observed with some frequency. We yield tractable algorithms for the modified problem for such domains. Our algorithms take sequences of partial observations over time as input, and output deterministic action models that could have lead to those observations. The algorithms output all or one of those models (depending on our choice), and are exact in that no model is misclassified given the observations. Our algorithms take polynomial time in the number of time steps and state features for some traditional action classes examined in the AI-planning literature, e.g., STRIPS actions. In contrast, traditional approaches for HMMs and Reinforcement Learning are inexact and exponentially intractable for such domains. Our experiments verify the theoretical tractability guarantees, and show that we identify action models exactly. Several applications in planning, autonomous exploration, and adventure-game playing already use these results. They are also promising for probabilistic settings, partially observable reinforcement learning, and diagnosis.
Inference with Discriminative Posterior
Salojärvi, Jarkko, Puolamäki, Kai, Savia, Eerika, Kaski, Samuel
We study Bayesian discriminative inference given a model family $p(c,\x, \theta)$ that is assumed to contain all our prior information but still known to be incorrect. This falls in between "standard" Bayesian generative modeling and Bayesian regression, where the margin $p(\x,\theta)$ is known to be uninformative about $p(c|\x,\theta)$. We give an axiomatic proof that discriminative posterior is consistent for conditional inference; using the discriminative posterior is standard practice in classical Bayesian regression, but we show that it is theoretically justified for model families of joint densities as well. A practical benefit compared to Bayesian regression is that the standard methods of handling missing values in generative modeling can be extended into discriminative inference, which is useful if the amount of data is small. Compared to standard generative modeling, discriminative posterior results in better conditional inference if the model family is incorrect. If the model family contains also the true model, the discriminative posterior gives the same result as standard Bayesian generative modeling. Practical computation is done with Markov chain Monte Carlo.
n-ary Fuzzy Logic and Neutrosophic Logic Operators
Smarandache, Florentin, Christianto, V.
We extend Knuth's 16 Boolean binary logic operators to fuzzy logic and neutrosophic logic binary operators. Then we generalize them to n-ary fuzzy logic and neutrosophic logic operators using the smarandache codification of the Venn diagram and a defined vector neutrosophic law. In such way, new operators in neutrosophic logic/set/probability are built.