
 Uncertainty


Sequential Local Learning for Latent Graphical Models

arXiv.org Machine Learning

Sejun Park, Eunho Yang, Jinwoo Shin. November 4, 2017.

Abstract

Learning parameters of latent graphical models (GMs) is inherently much harder than learning those of non-latent ones, since the latent variables make the corresponding log-likelihood non-concave. Nevertheless, expectation-maximization schemes are popular in practice, although they typically get stuck in local optima. In recent years, the method of moments has provided a refreshing angle for resolving the non-convexity issue, but it is applicable to a quite limited class of latent GMs. In this paper, we aim to enhance its power by enlarging this class of latent GMs. To this end, we introduce two novel concepts, coined marginalization and conditioning, which can reduce the problem of learning a larger GM to that of learning a smaller one. More importantly, they lead to a sequential learning framework that repeatedly increases the learned portion of a given latent GM, and thus covers a significantly broader and more complicated class of loopy latent GMs, which includes convolutional and random regular models.

1 Introduction

Graphical models (GMs) are succinct representations of a joint distribution on a graph where each node corresponds to a random variable and each edge represents the conditional independence between random variables. GMs have been successfully applied in various fields including information theory [12, 19], physics [24] and machine learning [18, 11]. Introducing latent variables into GMs has been a popular approach for enhancing their representational power in recent deep models, e.g., convolutional/restricted/deep Boltzmann machines [20, 27]. Furthermore, latent variables are unavoidable in certain scenarios where part of the samples is missing, e.g., see [10]. However, learning parameters of latent GMs is significantly harder than learning those of non-latent ones, since the latent variables make the corresponding negative log-likelihood non-convex.


A statistical model for aggregating judgments by incorporating peer predictions

arXiv.org Machine Learning

It is a truism that the knowledge of groups of people, particularly experts, outperforms that of individuals [43], and there is an increasing call to use the dispersed judgments of the crowd in policy making [42]. There is a large literature spanning multiple disciplines on methods for aggregating beliefs (for reviews see [9, 6, 7]), and previous applications have included political and economic forecasting [3, 27], evaluating nuclear safety [10] and public policy [28], and assessing the quality of chemical probes [31]. However, previous approaches to aggregating beliefs have implicitly assumed 'kind' (as opposed to 'wicked') environments [16]. In a previous paper [35], we proposed an algorithm for aggregating beliefs using not only respondents' answers but also their predictions of the answer distribution, and proved that for an infinite number of non-noisy Bayesian respondents, it would always determine the correct answer if sufficient evidence was available in the world. Here, we build on this approach but treat the aggregation problem as one of statistical inference. We propose a model of how people formulate their own judgments and predict the distribution of the judgments of others, and use this model to infer the most probable world state giving rise to the observed data from people. The model can be applied at the level of a single question but also across multiple questions, to infer the domain expertise of respondents. The model is thus broader in scope than other machine learning models for aggregation in that it accepts unique questions, but can also be compared to their performance across multiple questions. We do not assume that the aggregation model has access to correct answers or to historical data about the performance of respondents on similar questions. By using a simple model of how people make such judgments, we are able to increase the accuracy of the group's aggregate answer in domains ranging from estimating art prices to diagnosing skin lesions.


An Empirical-Bayes Score for Discrete Bayesian Networks

arXiv.org Machine Learning

Bayesian network structure learning is often performed in a Bayesian setting, by evaluating candidate structures using their posterior probabilities for a given data set. Score-based algorithms then use those posterior probabilities as an objective function and return the maximum a posteriori network as the learned model. For discrete Bayesian networks, the canonical choice for a posterior score is the Bayesian Dirichlet equivalent uniform (BDeu) marginal likelihood with a uniform (U) graph prior (Heckerman et al., 1995). Its favourable theoretical properties descend from assuming a uniform prior both on the space of the network structures and on the space of the parameters of the network. In this paper, we revisit the limitations of these assumptions; and we introduce an alternative set of assumptions and the resulting score: the Bayesian Dirichlet sparse (BDs) empirical Bayes marginal likelihood with a marginal uniform (MU) graph prior. We evaluate its performance in an extensive simulation study, showing that MU+BDs is more accurate than U+BDeu both in learning the structure of the network and in predicting new observations, while not being computationally more complex to estimate.
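To make the BDeu marginal likelihood mentioned above more concrete, here is a minimal sketch of the standard BDeu local log-score for a single node, computed from a table of counts. The function name `bdeu_local_score`, the count-table layout (one row per parent configuration, one column per node state) and the default imaginary sample size of 1 are illustrative assumptions. The sketch does not implement the BDs score or the MU graph prior proposed in the paper.

```python
import math

def bdeu_local_score(counts, alpha=1.0):
    """Log BDeu marginal likelihood for one node (illustrative sketch).

    counts[j][k] is the number of observations with parent configuration j
    and node state k; alpha is the imaginary sample size (assumed value).
    """
    q = len(counts)        # number of parent configurations
    r = len(counts[0])     # number of states of the node
    a_j = alpha / q        # prior mass per parent configuration
    a_jk = alpha / (q * r) # prior mass per (configuration, state) cell
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for n_jk in row:
            score += math.lgamma(a_jk + n_jk) - math.lgamma(a_jk)
    return score

# A binary node without parents, observed once in its first state.
single_observation = bdeu_local_score([[1, 0]], alpha=1.0)
```

With a uniform prior over two states, a single observation has marginal probability one half, so `single_observation` equals `math.log(0.5)` up to floating-point error. The full network score would sum such local terms over all nodes.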


Analysis of Perishable Products Sales Using Bayesian Inference

@machinelearnbot

Sales forecasting is very important in supply chain management. In our previous post, we considered different approaches to time series forecasting. The most important decision is how many products should be supplied to each store. If we could predict future sales precisely, the amount of products to supply would simply equal that prediction. In real life, however, we cannot make precise predictions; rather, we can predict product consumption within some confidence interval.
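As a rough illustration of predicting consumption as an interval rather than a single number, the sketch below assumes daily sales follow a Poisson distribution with a Gamma prior on the rate. This conjugate model, the prior values, the sample data and the names `poisson_sample` and `predictive_interval` are all assumptions made for the example, not details taken from the post.

```python
import math
import random

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) value with Knuth's method (fine for small rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def predictive_interval(sales, prior_shape=1.0, prior_rate=0.1,
                        level=0.9, draws=20000, seed=0):
    """Central posterior predictive interval for next-day sales (sketch)."""
    rng = random.Random(seed)
    # Gamma-Poisson conjugacy: the posterior over the rate is Gamma(shape, rate).
    shape = prior_shape + sum(sales)
    rate = prior_rate + len(sales)
    # random.gammavariate takes a scale parameter, hence 1 / rate.
    samples = sorted(
        poisson_sample(rng, rng.gammavariate(shape, 1.0 / rate))
        for _ in range(draws)
    )
    low = samples[int((1 - level) / 2 * draws)]
    high = samples[int((1 + level) / 2 * draws) - 1]
    return low, high

# Hypothetical week of daily unit sales for one product in one store.
low, high = predictive_interval([12, 15, 9, 14, 11, 13, 10])
```

For a perishable product, the supply decision could then be tied to a quantile inside this interval rather than to the mean, depending on how costly spoilage is relative to lost sales.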


Segmentation of skin lesions based on fuzzy classification of pixels and histogram thresholding

arXiv.org Machine Learning

Automated segmentation of skin lesions in dermoscopy images is currently a challenging problem [1]. This paper proposes an innovative method, developed by the authors, to address this problem. It is structured as follows. Firstly, this introduction describes the segmentation problem and presents the evaluation criteria used (image database, ground truths and metrics). Secondly, the system design is presented. Thirdly, the results and the discussion are shown.

A. Problems with segmentation of skin lesions in dermoscopy images

Automated segmentation of a skin lesion is a complex issue, as the range of cases that can appear in the images is very diverse. The main problems that can be found in the image and that make segmentation difficult are as follows:

1. Presence of hair;
2. Other artifacts such as electronic letters, rulers, ink and color charts, etc.;
3. Dark rectangular or circular marks around it (a consequence of shadow);
4. Flashes;
5. Lighting problems: apart from the problem with dark marks and flashes already mentioned, in some cases one part of the image turns out to be darker than another (a common case is that the part of the skin beside the circular marks is darker as it is less brightly lit), and some images also turn out to be darker than others;
6. As a result of the oil used to acquire many images, there may be distortion problems and bubbles;
7. Presence of blood vessels;
8. Presence of regression areas and blue-whitish veil (in many cases these structures have greater intensity than the skin surrounding the lesion);
9. Hypopigmentation areas which are confused with skin;
10.


High SNR Consistent Compressive Sensing

arXiv.org Machine Learning

High signal-to-noise ratio (SNR) consistency of model selection criteria in linear regression models has attracted a lot of attention recently. However, most of the existing literature on high SNR consistency deals with model order selection. Further, the limited literature available on the high SNR consistency of subset selection procedures (SSPs) is applicable to linear regression with full rank measurement matrices only. Hence, the performance of SSPs used in underdetermined linear models (a.k.a. compressive sensing (CS) algorithms) at high SNR is largely unknown. This paper fills this gap by deriving necessary and sufficient conditions for the high SNR consistency of popular CS algorithms like $l_0$-minimization, basis pursuit de-noising or LASSO, orthogonal matching pursuit and Dantzig selector. Necessary conditions analytically establish the high SNR inconsistency of CS algorithms when used with the tuning parameters discussed in the literature. Novel tuning parameters with SNR adaptations are developed using the sufficient conditions, and the choice of SNR adaptations is discussed analytically using convergence rate analysis. CS algorithms with the proposed tuning parameters are numerically shown to be high SNR consistent and to outperform existing tuning parameters in the moderate to high SNR regime.


Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms

arXiv.org Machine Learning

Variational inference using the reparameterization trick has enabled large-scale approximate Bayesian inference in complex probabilistic models, leveraging stochastic optimization to sidestep intractable expectations. The reparameterization trick is applicable when we can simulate a random variable by applying a differentiable deterministic function to an auxiliary random variable whose distribution is fixed. For many distributions of interest (such as the gamma or Dirichlet), simulation of random variables relies on acceptance-rejection sampling. The discontinuity introduced by the accept-reject step means that standard reparameterization tricks are not applicable. We propose a new method that lets us leverage reparameterization gradients even when variables are outputs of an acceptance-rejection sampling algorithm. Our approach enables reparameterization on a larger class of variational distributions. In several studies of real and synthetic data, we show that the variance of the estimator of the gradient is significantly lower than that of other state-of-the-art methods. This leads to faster convergence of stochastic gradient variational inference.
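The standard reparameterization trick described above can be sketched for the simplest case, a Gaussian, where a sample is written as a differentiable function of fixed standard normal noise. The test function f(z) = z^2, the function name `reparam_gradient` and the sample size are assumptions chosen for illustration. This is the plain trick only, not the paper's extension to acceptance-rejection samplers.

```python
import random

def reparam_gradient(mu, sigma, num_samples=100_000, seed=0):
    """Monte Carlo gradient of E[z^2] for z ~ N(mu, sigma^2) (sketch).

    Writes z = mu + sigma * eps with eps ~ N(0, 1), so the randomness no
    longer depends on (mu, sigma) and the chain rule applies per sample.
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)   # auxiliary noise with a fixed distribution
        z = mu + sigma * eps        # differentiable deterministic transform
        df_dz = 2.0 * z             # derivative of f(z) = z^2
        grad_mu += df_dz * 1.0      # dz/dmu = 1
        grad_sigma += df_dz * eps   # dz/dsigma = eps
    return grad_mu / num_samples, grad_sigma / num_samples

g_mu, g_sigma = reparam_gradient(1.5, 0.8)
```

Since E[z^2] = mu^2 + sigma^2, the exact gradients are 2 * mu and 2 * sigma, which gives a simple check on the estimates. For a gamma or Dirichlet variable drawn by accept-reject, no such single smooth transform of fixed noise is available, which is the gap the abstract addresses.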


The best kept secret about linear and logistic regression

@machinelearnbot

All the regression theory developed by statisticians over the last 200 years (related to the general linear model) is useless. Regression can be performed as accurately without statistical models, including the computation of confidence intervals (for estimates, predicted values or regression parameters). The non-statistical approach is also more robust than theory described in all statistics textbooks and taught in all statistical courses. It does not require Map-Reduce when data is really big, nor any matrix inversion, maximum likelihood estimation, or mathematical optimization (Newton algorithm). It is indeed incredibly simple, robust, easy to interpret, and easy to code (no statistical libraries required).


Identifying Best Interventions through Online Importance Sampling

arXiv.org Machine Learning

Motivated by applications in computational advertising and systems biology, we consider the problem of identifying the best out of several possible soft interventions at a source node $V$ in an acyclic causal directed graph, to maximize the expected value of a target node $Y$ (located downstream of $V$). Our setting imposes a fixed total budget for sampling under various interventions, along with cost constraints on different types of interventions. We pose this as a best arm identification bandit problem with $K$ arms where each arm is a soft intervention at $V$, and leverage the information leakage among the arms to provide the first gap-dependent error and simple regret bounds for this problem. Our results are a significant improvement over the traditional best arm identification results. We empirically show that our algorithms outperform the state of the art on the Flow Cytometry dataset, and also apply our algorithm to model interpretation of the Inception-v3 deep net that classifies images.
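The idea of information leakage among arms can be illustrated with a basic importance sampling estimate: samples collected under one soft intervention at $V$ are reweighted to estimate the expected value of $Y$ under another. The binary $V$ and $Y$, the probabilities and the names `simulate` and `importance_estimate` are hypothetical, and this is a plain importance sampling sketch rather than the paper's algorithm or its bounds.

```python
import random

def simulate(p_v, mean_y_given_v, n, rng):
    """Draw n (v, y) pairs: V from the arm's distribution, then Y given V."""
    pairs = []
    for _ in range(n):
        v = 1 if rng.random() < p_v[1] else 0
        y = 1.0 if rng.random() < mean_y_given_v[v] else 0.0
        pairs.append((v, y))
    return pairs

def importance_estimate(pairs, p_sampled, p_target):
    """Estimate E[Y] under p_target from pairs drawn under p_sampled."""
    total = sum(y * p_target[v] / p_sampled[v] for v, y in pairs)
    return total / len(pairs)

rng = random.Random(0)
arm_a = {0: 0.5, 1: 0.5}    # soft intervention actually sampled (assumed)
arm_b = {0: 0.2, 1: 0.8}    # soft intervention we want to evaluate (assumed)
mean_y = {0: 0.3, 1: 0.7}   # P(Y = 1 | V), unchanged by the intervention

samples_a = simulate(arm_a, mean_y, 50_000, rng)
estimate_b = importance_estimate(samples_a, arm_a, arm_b)
true_b = sum(arm_b[v] * mean_y[v] for v in (0, 1))
```

Because every arm changes only the distribution of $V$ and leaves the mechanism from $V$ to $Y$ untouched, samples from any arm carry information about all others. The estimate degrades as the weights `p_target[v] / p_sampled[v]` grow large.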


Joint Causal Inference from Observational and Experimental Datasets

arXiv.org Artificial Intelligence

We introduce Joint Causal Inference (JCI), a powerful formulation of causal discovery from multiple datasets that allows both the causal structure and the targets of interventions to be learned jointly from statistical independences in pooled data. Compared with existing constraint-based approaches for causal discovery from multiple datasets, JCI offers several advantages: it allows for several different types of interventions in a unified fashion, it can learn intervention targets, it systematically pools data across different datasets, which improves the statistical power of independence tests, and, most importantly, it improves on the accuracy and identifiability of the predicted causal relations. A technical complication that arises in JCI is the occurrence of faithfulness violations due to deterministic relations. We propose a simple but effective strategy for dealing with this type of faithfulness violation. We implement it in ACID, a determinism-tolerant extension of Ancestral Causal Inference (ACI) (Magliacane et al., 2016), a recently proposed logic-based causal discovery method that improves reliability of the output by exploiting redundant information in the data. We illustrate the benefits of JCI with ACID with an evaluation on a simulated dataset.