 Bayesian Learning


Compiling Stochastic Constraint Programs to And-Or Decision Diagrams

arXiv.org Artificial Intelligence

Factored stochastic constraint programming (FSCP) is a formalism to represent multi-stage decision making problems under uncertainty. FSCP models support factorized probabilistic models and involve constraints over decision and random variables. These models have many applications in real-world problems. However, solving these problems requires evaluating the best course of action for each possible outcome of the random variables and hence is computationally challenging. FSCP problems often involve repeated subproblems which ideally should be solved once. In this paper we show how identifying and exploiting these identical subproblems can simplify solving them and lead to a compact representation of the solution. We compile an And-Or search tree to a compact decision diagram. Preliminary experiments show that our proposed method significantly improves the search efficiency by reducing the size of the problem and outperforms the existing methods.
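
The idea of solving repeated subproblems only once can be illustrated with a minimal Python sketch. It is not the paper's compilation algorithm: the toy two-action problem, the function names, and all numbers are assumptions made for illustration. Decision (Or) nodes take a maximum over actions, chance (And) nodes take an expectation over outcomes, and a cache keyed on the subproblem state ensures that each distinct subproblem is evaluated once.

```python
from functools import lru_cache

# Hypothetical toy problem: at each stage pick an action, then a random
# outcome consumes some of a limited resource and yields a reward.
ACTIONS = {
    "safe": [(1.0, 1, 1.0)],                   # (probability, resource cost, reward)
    "risky": [(0.5, 1, 3.0), (0.5, 2, 0.0)],
}

calls = {"plain": 0, "cached": 0}


def solve_plain(stage, resource, horizon):
    """And-Or tree search without sharing: every path is re-explored."""
    calls["plain"] += 1
    if stage == horizon or resource <= 0:
        return 0.0
    best = float("-inf")
    for outcomes in ACTIONS.values():          # Or node: choose the best action
        value = 0.0
        for prob, cost, reward in outcomes:    # And node: average over outcomes
            value += prob * (reward + solve_plain(stage + 1, resource - cost, horizon))
        best = max(best, value)
    return best


@lru_cache(maxsize=None)
def solve_cached(stage, resource, horizon):
    """Same search, but identical (stage, resource) subproblems are solved once."""
    calls["cached"] += 1
    if stage == horizon or resource <= 0:
        return 0.0
    best = float("-inf")
    for outcomes in ACTIONS.values():
        value = sum(
            prob * (reward + solve_cached(stage + 1, resource - cost, horizon))
            for prob, cost, reward in outcomes
        )
        best = max(best, value)
    return best


plain_value = solve_plain(0, 10, 6)
cached_value = solve_cached(0, 10, 6)
```

Both functions return the same optimal expected reward, but `calls["cached"]` is much smaller than `calls["plain"]` because many action and outcome sequences lead to the same (stage, resource) pair.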


Model-Agnostic Linear Competitors -- When Interpretable Models Compete and Collaborate with Black-Box Models

arXiv.org Artificial Intelligence

Driven by an increasing need for model interpretability, interpretable models have become strong competitors for black-box models in many real applications. In this paper, we propose a novel type of model where interpretable models compete and collaborate with black-box models. We present the Model-Agnostic Linear Competitors (MALC) for partially interpretable classification. MALC is a hybrid model that uses linear models to locally substitute any black-box model, capturing subspaces that are most likely to be in a class while leaving the rest of the data to the black-box. MALC brings together the interpretability of linear models and the good predictive performance of a black-box model. We formulate the training of a MALC model as a convex optimization. The predictive accuracy and transparency (defined as the percentage of data captured by the linear models) are balanced through a carefully designed objective function, and the optimization problem is solved with the accelerated proximal gradient method. Experiments show that MALC can effectively trade prediction accuracy for transparency and provide an efficient frontier that spans the entire spectrum of transparency.
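
A rough sketch of how such a hybrid could make predictions is given below. The abstract does not spell out the decision rule, so everything here is an assumption: the use of one linear model per class, the zero threshold, and the names `malc_style_predict`, `w_pos`, `b_pos`, `w_neg`, `b_neg`, and `black_box` are all hypothetical. The sketch only illustrates the two quantities the abstract names: which points the linear models capture, and transparency as the fraction of captured points.

```python
import numpy as np


def malc_style_predict(X, w_pos, b_pos, w_neg, b_neg, black_box):
    """Hybrid prediction: linear competitors first, black box for the rest.

    All parameter names are hypothetical; the decision rule is an assumed
    simplification, not the trained model from the paper.
    """
    X = np.asarray(X, dtype=float)
    pos_captured = X @ w_pos + b_pos > 0            # claimed by the positive linear model
    neg_captured = (X @ w_neg + b_neg > 0) & ~pos_captured
    captured = pos_captured | neg_captured

    y = np.empty(len(X), dtype=int)
    y[pos_captured] = 1
    y[neg_captured] = 0
    if (~captured).any():
        y[~captured] = black_box(X[~captured])      # defer the remaining points
    transparency = captured.mean()                  # fraction handled by linear models
    return y, transparency


# Usage with made-up one-dimensional data and a stand-in black box.
X = np.array([[2.0], [0.1], [-0.2], [-3.0]])
stand_in_black_box = lambda Z: (Z[:, 0] > 0).astype(int)
y, transparency = malc_style_predict(
    X,
    w_pos=np.array([1.0]), b_pos=-1.0,
    w_neg=np.array([-1.0]), b_neg=-1.0,
    black_box=stand_in_black_box,
)
```

In this toy setup the two extreme points are captured by the linear models and the two points near zero are deferred, so half of the data is handled transparently.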


Satisficing Mentalizing: Bayesian Models of Theory of Mind Reasoning in Scenarios with Different Uncertainties

arXiv.org Artificial Intelligence

The ability to interpret the mental state of another agent based on its behavior, also called Theory of Mind (ToM), is crucial for humans in any kind of social interaction. Artificial systems, such as intelligent assistants, would also greatly benefit from such mentalizing capabilities. However, humans and systems alike are bound by limitations in their available computational resources. This raises the need for satisficing mentalizing, reconciling accuracy and efficiency in mental state inference that is good enough for a given situation. In this paper, we present different Bayesian models of ToM reasoning and evaluate them based on actual human behavior data that were generated under different kinds of uncertainties. We propose a Switching approach that combines specialized models, embodying simplifying presumptions, in order to achieve a more satisficing mentalizing compared to a Full Bayesian ToM model.


Deep Multi-Facial patches Aggregation Network for Expression Classification from Face Images

arXiv.org Artificial Intelligence

Emotional Intelligence in Human-Computer Interaction has attracted increasing attention from researchers in multidisciplinary research fields including psychology, computer vision, neuroscience, artificial intelligence, and related disciplines. Humans are prone to interact naturally with computers face-to-face. Human expressions are an important key to better linking humans and computers. Thus, designing interfaces able to understand human expressions and emotions can improve Human-Computer Interaction (HCI) for better communication. In this paper, we investigate HCI via a deep multi-facial patches aggregation network for Face Expression Recognition (FER). Deep features are extracted from facial parts and aggregated for expression classification. Several problems may affect the performance of the proposed framework, such as the small size of FER datasets and the high number of parameters to learn. To address this, two data augmentation techniques are proposed for facial expression generation to expand the labeled training data. The proposed framework is evaluated on the extended Cohn-Kanade dataset (CK+) and promising results are achieved.


The Reduced PC-Algorithm: Improved Causal Structure Learning in Large Random Networks

arXiv.org Machine Learning

Directed acyclic graphs, or DAGs, are commonly used to represent causal relationships in complex biological systems. For example, in gene regulatory networks, directed edges represent regulatory interactions among genes, which are represented as nodes of the graph. While causal effects in biological networks can be accurately inferred from perturbation experiments [33]--including single or double gene knockouts [30, 42]--these are costly to run. Estimating DAGs from observational data is thus an important exploratory task for generating causal hypotheses [10, 15], and designing more efficient experiments. Since the number of possible directed graphs grows super-exponentially in the number of nodes, estimation of DAGs is an NP-hard problem [6]. Methods of estimating DAGs from observational data can be broadly categorized into three classes. The first class, score-based methods, search over the space of all possible graphs, and attempt to maximize a goodness-of-fit score, generally using a greedy algorithm.



Differentially Private Regression and Classification with Sparse Gaussian Processes

arXiv.org Machine Learning

A continuing challenge for machine learning is providing methods to perform computation on data while ensuring the data remains private. In this paper we build on the provable privacy guarantees of differential privacy, which has been combined with Gaussian processes through the previously published cloaking method. We address several shortcomings of this method, starting with the problem of predictions in regions with low data density. We experiment with the use of inducing points to provide a sparse approximation and show that these can provide robust differential privacy in outlier areas and at higher dimensions. We then look at classification, and modify the Laplace approximation approach to provide differentially private predictions. We then combine this with the sparse approximation and demonstrate the capability to perform classification in high dimensions. We finally explore the issue of hyperparameter selection and develop a method for their private selection. This paper and associated libraries provide a robust toolkit for combining differential privacy and GPs in a practical manner.


Can A User Anticipate What Her Followers Want?

arXiv.org Machine Learning

Whenever a social media user decides to share a story, she is typically pleased to receive likes, comments, shares, or, more generally, feedback from her followers. As a result, she may feel compelled to use the feedback she receives to (re-)estimate her followers' preferences and decide which stories to share next to receive more (positive) feedback. Under which conditions can she succeed? In this work, we first look into this problem from a theoretical perspective and then provide a set of practical algorithms to identify and characterize such behavior in social media. More specifically, we address the above problem from the viewpoint of sequential decision making and utility maximization. For a wide variety of utility functions, we first show that, to succeed, a user needs to actively trade off exploitation--sharing stories which lead to more (positive) feedback--and exploration--sharing stories to learn about her followers' preferences. However, exploration is not necessary if a user utilizes the feedback her followers provide to other users in addition to the feedback she receives. Then, we develop a utility estimation framework for observation data, which relies on statistical hypothesis testing to determine whether a user utilizes the feedback she receives from each of her followers to decide what to post next. Experiments on synthetic data illustrate our theoretical findings and show that our estimation framework is able to accurately recover users' underlying utility functions. Experiments on several real datasets gathered from Twitter and Reddit reveal that up to 82% (43%) of the Twitter (Reddit) users in our datasets do use the feedback they receive to decide what to post next.


Adversarial $\alpha$-divergence Minimization for Bayesian Approximate Inference

arXiv.org Machine Learning

Neural networks are popular models for regression. They are often trained via back-propagation to find a value of the weights that correctly predicts the observed data. Although back-propagation has shown good performance in many applications, it cannot easily output an estimate of the uncertainty in the predictions made. Measuring this uncertainty in the predictions of machine learning models is a critical aspect with important applications. Uncertainty estimates can be obtained by following a Bayesian approach in which a posterior distribution of the model parameters is computed. The posterior distribution summarizes which parameter values are compatible with the data. Typically, this posterior distribution is intractable and has to be approximated. Several approaches have been considered for solving this problem. We propose here a general method for approximate Bayesian inference based on minimizing $\alpha$-divergences which allows for flexible approximate distributions. The method is evaluated in the context of Bayesian neural networks for regression on extensive experiments. The results show that it often gives better performance in terms of the test log-likelihood and sometimes in terms of the squared error.
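
For reference, one widely used parameterization of the $\alpha$-divergence between a target distribution $p$ and an approximation $q$ is Amari's form, shown below together with its two limiting cases. The abstract does not state which parameterization the paper adopts, so this choice and the symbol $\theta$ for the model parameters are assumptions.

```latex
D_{\alpha}\left[p \,\|\, q\right]
  = \frac{1}{\alpha(1-\alpha)}
    \left( 1 - \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, d\theta \right),
\qquad
\lim_{\alpha \to 1} D_{\alpha}\left[p \,\|\, q\right] = \mathrm{KL}\left(p \,\|\, q\right),
\quad
\lim_{\alpha \to 0} D_{\alpha}\left[p \,\|\, q\right] = \mathrm{KL}\left(q \,\|\, p\right).
```

Under this form, varying $\alpha$ interpolates between the two directions of the Kullback-Leibler divergence.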


Value of Information in Probabilistic Logic Programs

arXiv.org Artificial Intelligence

In medical decision making, we have to choose among several expensive diagnostic tests such that the certainty about a patient's health is maximized while remaining within the bounds of resources like time and money. The expected increase in certainty in the patient's condition due to performing a test is called the value of information (VoI) for that test. In general, VoI relates to acquiring additional information to improve decision-making based on probabilistic reasoning in an uncertain system. This paper presents a framework for acquiring information based on VoI in uncertain systems modeled as Probabilistic Logic Programs (PLPs). Optimal decision-making in uncertain systems modeled as PLPs has already been studied. However, acquiring additional information to further improve the results of making the optimal decision has remained open in this context. We model decision-making in an uncertain system with a PLP and a set of top-level queries, with a set of utility measures over the distributions of these queries. The PLP is annotated with a set of atoms labeled as "observable"; in the medical diagnosis example, the observable atoms will be results of diagnostic tests. Each observable atom has an associated cost. This setting of optimally selecting observations based on VoI is more general than that considered by any prior work. Given a limited budget, optimally choosing observable atoms based on VoI is intractable in general. We give a greedy algorithm for constructing a "conditional plan" of observations: a schedule where the selection of what atom to observe next depends on earlier observations. We show that preempting the algorithm at any time before completion provides a usable result, that the result improves over time, and that, in the absence of a well-defined budget, it converges to the optimal solution.
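
The notion of VoI for a single test can be made concrete with a small Python sketch. This is a textbook-style calculation for one binary test and one binary condition, not the paper's PLP framework or its greedy algorithm. It measures the gain in expected utility rather than the utility measures over query distributions described above, and the function names, probabilities, and utilities are all made-up assumptions. It also assumes the test can come back either way (neither result has probability zero).

```python
def expected_utility(p_disease, utilities):
    """Best expected utility over actions for a given disease probability."""
    return max(
        p_disease * u_sick + (1.0 - p_disease) * u_healthy
        for u_sick, u_healthy in utilities.values()
    )


def value_of_information(prior, sensitivity, specificity, utilities):
    """Expected gain in utility from observing one binary test result."""
    p_pos = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    p_neg = 1.0 - p_pos
    post_pos = sensitivity * prior / p_pos            # Bayes' rule, positive result
    post_neg = (1.0 - sensitivity) * prior / p_neg    # Bayes' rule, negative result
    with_test = (p_pos * expected_utility(post_pos, utilities)
                 + p_neg * expected_utility(post_neg, utilities))
    return with_test - expected_utility(prior, utilities)


# Hypothetical numbers: action -> (utility if sick, utility if healthy)
utilities = {"treat": (80.0, 60.0), "wait": (0.0, 100.0)}
voi = value_of_information(prior=0.3, sensitivity=0.9, specificity=0.8,
                           utilities=utilities)
```

With these numbers the best action without the test is to wait (expected utility 70), while acting on the test result yields an expected utility of about 86, so the test is worth roughly 16 utility units before its cost is subtracted.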