instrumental distribution




Reviews: Pseudo-Extended Markov chain Monte Carlo

Neural Information Processing Systems

Update: I have read the author response and am satisfied with the commitment to elaborate on \beta and \pi and to simplify the Stan PE code with a "pseudo-extended" function. This paper presents a new MCMC sampling method called pseudo-extended MCMC that uses an instrumental distribution to project the target into a higher-dimensional space where the modes are connected, making it easier for the sampler to mix. A default instrumental distribution based on tempering is provided. The method is compared against existing baselines and shown to be effective on three benchmark datasets. The paper is well placed within the existing literature.
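To illustrate the tempering idea behind the default instrumental distribution, here is a minimal sketch (not the paper's implementation, and all settings are illustrative): raising a multimodal target to a power \beta < 1 lowers the barriers between modes, so even a plain random-walk Metropolis sampler can cross between them.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Bimodal target: equal mixture of N(-4, 1) and N(4, 1), up to a constant.
    return np.logaddexp(-0.5 * (x + 4) ** 2, -0.5 * (x - 4) ** 2)

def rw_metropolis(log_p, x0, n_steps, step=1.0):
    # Plain random-walk Metropolis; returns the full chain.
    x, lp, chain = x0, log_p(x0), []
    for _ in range(n_steps):
        prop = x + step * rng.normal()
        lp_prop = log_p(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(x)
    return np.array(chain)

# Tempered target pi(x)^beta with beta < 1: barriers shrink, mixing improves.
beta = 0.2
tempered = rw_metropolis(lambda x: beta * log_target(x), 0.0, 20000)

# Untempered chain started in the left mode typically stays there.
cold = rw_metropolis(log_target, -4.0, 20000)

print((tempered > 0).mean(), (cold > 0).mean())
```

The tempered chain spends time in both modes (a fraction of samples above 0 well away from 0 or 1), whereas the cold chain tends to remain trapped near its starting mode; pseudo-extended MCMC exploits this by coupling tempered pseudo-samples to the original target.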


Binary classification based Monte Carlo simulation

Argouarc'h, Elouan, Desbouvries, François

arXiv.org Machine Learning

Acceptance-rejection (AR), independent Metropolis-Hastings (IMH), and importance sampling (IS) Monte Carlo (MC) simulation algorithms all involve computing ratios of probability density functions (pdfs). On the other hand, classifiers discriminate labelled samples produced by a mixture of two distributions and can be used to approximate the ratio of the two corresponding pdfs. This bridge between simulation and classification enables us to propose pdf-free versions of pdf-ratio-based simulation algorithms, where the ratio is replaced by a surrogate function computed via a classifier. From a probabilistic modeling perspective, our procedure involves a structured energy-based model which can easily be trained and is compatible with the classical samplers.
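A toy sketch of the classification-to-ratio bridge (not the authors' code; the feature map, learning rate, and distributions are all assumptions): with equal class priors, the logit of a well-fit classifier trained to separate samples of p from samples of q estimates log p(x)/q(x), which can then serve directly as an importance-sampling weight.

```python
import numpy as np

rng = np.random.default_rng(1)

# Target p = N(1, 1), instrumental q = N(0, 2).  We only ever use samples,
# never the pdfs themselves.
n = 5000
xp = rng.normal(1.0, 1.0, n)   # samples from p, labelled 1
xq = rng.normal(0.0, 2.0, n)   # samples from q, labelled 0

def feats(x):
    # Quadratic features: the true log-ratio log p/q is quadratic in x.
    return np.column_stack([x, x ** 2])

X = np.vstack([feats(xp), feats(xq)])
y = np.concatenate([np.ones(n), np.zeros(n)])
mu, sd = X.mean(0), X.std(0)
X = (X - mu) / sd  # standardize for stable gradient descent

# Logistic regression fit by plain gradient descent (assumed hyperparameters).
w, b = np.zeros(2), 0.0
for _ in range(3000):
    z = X @ w + b
    prob = 1.0 / (1.0 + np.exp(-z))
    g = prob - y
    w -= 0.5 * (X.T @ g) / len(y)
    b -= 0.5 * g.mean()

def log_ratio(x):
    # Classifier logit ~ log p(x)/q(x) (equal priors; constants cancel below).
    return (feats(x) - mu) / sd @ w + b

# Self-normalized IS estimate of E_p[X] using only draws from q.
xs = rng.normal(0.0, 2.0, 20000)
lw = log_ratio(xs)
wts = np.exp(lw - lw.max())
est = np.sum(wts * xs) / np.sum(wts)
print(est)  # should be close to the true mean of p, i.e. 1.0
```

Self-normalization makes the estimate invariant to the additive constant in the logit, which is why unknown normalizing constants (the energy-based-model setting) cause no difficulty here.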


In Search of an Entity Resolution OASIS: Optimal Asymptotic Sequential Importance Sampling

Marchant, Neil G., Rubinstein, Benjamin I. P.

arXiv.org Machine Learning

Entity resolution (ER) presents unique challenges for evaluation methodology. While crowdsourcing platforms can acquire ground truth, sound approaches to sampling must drive labelling efforts. In ER, extreme class imbalance between matching and non-matching records can lead to enormous labelling requirements when seeking statistically consistent estimates for rigorous evaluation. This paper addresses this important challenge with the OASIS algorithm: a sampler and F-measure estimator for ER evaluation. OASIS draws samples from a (biased) instrumental distribution, chosen to ensure estimators with optimal asymptotic variance. As new labels are collected, OASIS updates this instrumental distribution via a Bayesian latent variable model of the annotator oracle, to quickly focus on unlabelled items providing more information. We prove that the resulting estimates of F-measure, precision, and recall converge to the true population values. Thorough comparisons of sampling methods on a variety of ER datasets demonstrate significant labelling reductions of up to 83% without loss of estimation accuracy.
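The core estimator can be sketched as follows (a simplified illustration with a fixed heuristic instrumental distribution; OASIS instead adapts it online via a Bayesian model, and all distributions and numbers here are synthetic assumptions): sample items to label with probabilities q biased toward likely matches, then correct the bias with importance weights 1/(N q_i) when estimating the confusion-matrix rates that enter the F-measure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic ER-style pool with extreme class imbalance: ~1% true matches.
N = 100_000
truth = rng.uniform(size=N) < 0.01                        # hidden oracle labels
scores = np.where(truth, rng.beta(5, 2, N), rng.beta(2, 8, N))
pred = scores > 0.5                                       # system's predicted matches

# Biased instrumental distribution: oversample high-scoring items, where
# labels are most informative under class imbalance.
q = 0.05 + scores
q /= q.sum()

budget = 2000
idx = rng.choice(N, size=budget, p=q)                     # items sent for labelling
w = 1.0 / (N * q[idx])                                    # importance weights

# Importance-weighted estimates of the population TP/FP/FN rates.
tp = np.mean(w * (pred[idx] & truth[idx]))
fp = np.mean(w * (pred[idx] & ~truth[idx]))
fn = np.mean(w * (~pred[idx] & truth[idx]))
f1_est = 2 * tp / (2 * tp + fp + fn)

# Ground-truth F1 for comparison (unavailable in practice).
TP, FP = (pred & truth).sum(), (pred & ~truth).sum()
FN = (~pred & truth).sum()
f1_true = 2 * TP / (2 * TP + FP + FN)
print(f1_est, f1_true)
```

Each weighted indicator is unbiased for its population rate (E_q[w * I] = (1/N) sum_i I_i), so the plug-in F-measure is consistent; choosing q well, as OASIS does adaptively, is what drives the variance (and hence the label budget) down.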