Goto

Collaborating Authors

 monte carlo algorithm



Mixture models for data with unknown distributions

arXiv.org Machine Learning

We describe and analyze a broad class of mixture models for real-valued multivariate data in which the probability density of observations within each component of the model is represented as an arbitrary combination of basis functions. Fits to these models give us a way to cluster data with distributions of unknown form, including strongly non-Gaussian or multimodal distributions, and return both a division of the data and an estimate of the distributions, effectively performing clustering and density estimation within each cluster at the same time. We describe two fitting methods, one using an expectation-maximization (EM) algorithm and the other a Bayesian non-parametric method using a collapsed Gibbs sampler. The former is numerically efficient, but gives only point estimates of the probability densities. The latter is more computationally demanding but returns a full Bayesian posterior and also an estimate of the number of components. We demonstrate our methods with a selection of illustrative applications and give code implementing both algorithms.


Regularization by denoising: Bayesian model and Langevin-within-split Gibbs sampling

arXiv.org Machine Learning

This paper introduces a Bayesian framework for image inversion by deriving a probabilistic counterpart to the regularization-by-denoising (RED) paradigm. It additionally implements a Monte Carlo algorithm specifically tailored for sampling from the resulting posterior distribution, based on an asymptotically exact data augmentation (AXDA). The proposed algorithm is an approximate instance of split Gibbs sampling (SGS) which embeds one Langevin Monte Carlo step. The proposed method is applied to common imaging tasks such as deblurring, inpainting and super-resolution, demonstrating its efficacy through extensive numerical experiments. These contributions advance Bayesian inference in imaging by leveraging data-driven regularization strategies within a probabilistic framework.


Solving the HP model with Nested Monte Carlo Search

arXiv.org Artificial Intelligence

In this paper we present a new Monte Carlo Search (MCS) algorithm for finding the ground state energy of proteins in the HP-model. We also compare it briefly to other MCS algorithms not usually used on the HP-model and provide an overview of the algorithms used on HP-model. The algorithm presented in this paper does not beat state of the art algorithms, see PERM (Hsu and Grassberger 2011), REMC (Thachuk, Shmygelska, and Hoos 2007) or WLRE (W\"ust and Landau 2012) for better results. Hsu, H.-P.; and Grassberger, P. 2011. A review of Monte Carlo simulations of polymers with PERM. Journal of Statistical Physics, 144 (3): 597 to 637. Thachuk, C.; Shmygelska, A.; and Hoos, H. H. 2007. A replica exchange Monte Carlo algorithm for protein folding in the HP model. BMC Bioinformatics, 8(1): 342. W\"ust, T.; and Landau, D. P. 2012. Optimized Wang-Landau sampling of lattice polymers: Ground state search and folding thermodynamics of HP model proteins. The Journal of Chemical Physics, 137(6): 064903.


Training neural networks using Metropolis Monte Carlo and an adaptive variant

arXiv.org Artificial Intelligence

We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogenous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity and numerical stability of the Monte Carlo method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by gradient descent. Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.


SwISS: A Scalable Markov chain Monte Carlo Divide-and-Conquer Strategy

arXiv.org Machine Learning

Divide-and-conquer strategies for Monte Carlo algorithms are an increasingly popular approach to making Bayesian inference scalable to large data sets. In its simplest form, the data are partitioned across multiple computing cores and a separate Markov chain Monte Carlo algorithm on each core targets the associated partial posterior distribution, which we refer to as a sub-posterior, that is the posterior given only the data from the segment of the partition associated with that core. Divide-and-conquer techniques reduce computational, memory and disk bottle-necks, but make it difficult to recombine the sub-posterior samples. We propose SwISS: Sub-posteriors with Inflation, Scaling and Shifting; a new approach for recombining the sub-posterior samples which is simple to apply, scales to high-dimensional parameter spaces and accurately approximates the original posterior distribution through affine transformations of the sub-posterior samples. We prove that our transformation is asymptotically optimal across a natural set of affine transformations and illustrate the efficacy of SwISS against competing algorithms on synthetic and real-world data sets.


A survey of Monte Carlo methods for noisy and costly densities with application to reinforcement learning

arXiv.org Machine Learning

This survey gives an overview of Monte Carlo methodologies using surrogate models, for dealing with densities which are intractable, costly, and/or noisy. This type of problem can be found in numerous real-world scenarios, including stochastic optimization and reinforcement learning, where each evaluation of a density function may incur some computationally-expensive or even physical (real-world activity) cost, likely to give different results each time. The surrogate model does not incur this cost, but there are important trade-offs and considerations involved in the choice and design of such methodologies. We classify the different methodologies into three main classes and describe specific instances of algorithms under a unified notation. A modular scheme which encompasses the considered methods is also presented. A range of application scenarios is discussed, with special attention to the likelihood-free setting and reinforcement learning. Several numerical comparisons are also provided.


Forecast Analysis of the COVID-19 Incidence in Lebanon: Prediction of Future Epidemiological Trends to Plan More Effective Control Programs

arXiv.org Artificial Intelligence

Ever since the COVID-19 pandemic started, all the governments have been trying to limit its effects on their citizens and countries. This pandemic was harsh on different levels for almost all populations worldwide and this is what drove researchers and scientists to get involved and work on several kinds of simulations to get a better insight into this virus and be able to stop it the earliest possible. In this study, we simulate the spread of COVID-19 in Lebanon using an Agent-Based Model where people are modeled as agents that have specific characteristics and behaviors determined from statistical distributions using Monte Carlo Algorithm. These agents can go into the world, interact with each other, and thus, infect each other. This is how the virus spreads. During the simulation, we can introduce different Non-Pharmaceutical Interventions - or more commonly NPIs - that aim to limit the spread of the virus (wearing a mask, closing locations, etc). Our Simulator was first validated on concepts (e.g. Flattening the Curve and Second Wave scenario), and then it was applied on the case of Lebanon. We studied the effect of opening schools and universities on the pandemic situation in the country since the Lebanese Ministry of Education is planning to do so progressively, starting from 21 April 2021. Based on the results we obtained, we conclude that it would be better to delay the school openings while the vaccination campaign is still slow in the country.


On Learning Graphs with Edge-Detecting Queries

arXiv.org Machine Learning

We consider the problem of learning a general graph $G=(V,E)$ using edge-detecting queries, where the number of vertices $|V|=n$ is given to the learner. The information theoretic lower bound gives $m\log n$ for the number of queries, where $m=|E|$ is the number of edges. In case the number of edges $m$ is also given to the learner, Angluin-Chen's Las Vegas algorithm \cite{AC08} runs in $4$ rounds and detects the edges in $O(m\log n)$ queries. In the other harder case where the number of edges $m$ is unknown, their algorithm runs in $5$ rounds and asks $O(m\log n+\sqrt{m}\log^2 n)$ queries. There have been two open problems: \emph{(i)} can the number of queries be reduced to $O(m\log n)$ in the second case, and, \emph{(ii)} can the number of rounds be reduced without substantially increasing the number of queries (in both cases). For the first open problem (when $m$ is unknown) we give two algorithms. The first is an $O(1)$-round Las Vegas algorithm that asks $m\log n+\sqrt{m}(\log^{[k]}n)\log n$ queries for any constant $k$ where $\log^{[k]}n=\log \stackrel{k}{\cdots} \log n$. The second is an $O(\log^*n)$-round Las Vegas algorithm that asks $O(m\log n)$ queries. This solves the first open problem for any practical $n$, for example, $n<2^{65536}$. We also show that no deterministic algorithm can solve this problem in a constant number of rounds. To solve the second problem we study the case when $m$ is known. We first show that any non-adaptive Monte Carlo algorithm (one-round) must ask at least $\Omega(m^2\log n)$ queries, and any two-round Las Vegas algorithm must ask at least $m^{4/3-o(1)}\log n$ queries on average. We then give two two-round Monte Carlo algorithms, the first asks $O(m^{4/3}\log n)$ queries for any $n$ and $m$, and the second asks $O(m\log n)$ queries when $n>2^m$. Finally, we give a $3$-round Monte Carlo algorithm that asks $O(m\log n)$ queries for any $n$ and $m$.


Query Complexity of Clustering with Side Information

Neural Information Processing Systems

Suppose, we are given a set of $n$ elements to be clustered into $k$ (unknown) clusters, and an oracle/expert labeler that can interactively answer pair-wise queries of the form, ``do two elements $u$ and $v$ belong to the same cluster?''. The goal is to recover the optimum clustering by asking the minimum number of queries. In this paper, we provide a rigorous theoretical study of this basic problem of query complexity of interactive clustering, and give strong information theoretic lower bounds, as well as nearly matching upper bounds. Most clustering problems come with a similarity matrix, which is used by an automated process to cluster similar points together. To improve accuracy of clustering, a fruitful approach in recent years has been to ask a domain expert or crowd to obtain labeled data interactively. Many heuristics have been proposed, and all of these use a similarity function to come up with a querying strategy. Even so, there is a lack systematic theoretical study. Our main contribution in this paper is to show the dramatic power of side information aka similarity matrix on reducing the query complexity of clustering. A similarity matrix represents noisy pair-wise relationships such as one computed by some function on attributes of the elements. A natural noisy model is where similarity values are drawn independently from some arbitrary probability distribution $f_+$ when the underlying pair of elements belong to the same cluster, and from some $f_-$ otherwise. We show that given such a similarity matrix, the query complexity reduces drastically from $\Theta(nk)$ (no similarity matrix) to $O(\frac{k^2\log{n}}{\cH^2(f_+\|f_-)})$ where $\cH^2$ denotes the squared Hellinger divergence. Moreover, this is also information-theoretic optimal within an $O(\log{n})$ factor. Our algorithms are all efficient, and parameter free, i.e., they work without any knowledge of $k, f_+$ and $f_-$, and only depend logarithmically with $n$.