Bayesian Spectral Deconvolution Based on Poisson Distribution: Bayesian Measurement and Virtual Measurement Analytics (VMA)

arXiv.org Machine Learning

In this paper, we propose a new method of Bayesian measurement for spectral deconvolution, which regresses spectral data into the sum of unimodal basis function such as Gaussian or Lorentzian functions. Bayesian measurement is a framework for considering not only the target physical model but also the measurement model as a probabilistic model, and enables us to estimate the parameter of a physical model with its confidence interval through a Bayesian posterior distribution given a measurement data set. The measurement with Poisson noise is one of the most effective system to apply our proposed method. Since the measurement time is strongly related to the signal-to-noise ratio for the Poisson noise model, Bayesian measurement with Poisson noise model enables us to clarify the relationship between the measurement time and the limit of estimation. In this study, we establish the probabilistic model with Poisson noise for spectral deconvolution. Bayesian measurement enables us to perform virtual and computer simulation for a certain measurement through the established probabilistic model. This property is called "Virtual Measurement Analytics(VMA)" in this paper. We also show that the relationship between the measurement time and the limit of estimation can be extracted by using the proposed method in a simulation of synthetic data and real data for XPS measurement of MoS$_2$.


A Random Finite Set Model for Data Clustering

arXiv.org Machine Learning

The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data.


Bayesian model selection consistency and oracle inequality with intractable marginal likelihood

arXiv.org Machine Learning

In this article, we investigate large sample properties of model selection procedures in a general Bayesian framework when a closed form expression of the marginal likelihood function is not available or a local asymptotic quadratic approximation of the log-likelihood function does not exist. Under appropriate identifiability assumptions on the true model, we provide sufficient conditions for a Bayesian model selection procedure to be consistent and obey the Occam's razor phenomenon, i.e., the probability of selecting the "smallest" model that contains the truth tends to one as the sample size goes to infinity. In order to show that a Bayesian model selection procedure selects the smallest model containing the truth, we impose a prior anti-concentration condition, requiring the prior mass assigned by large models to a neighborhood of the truth to be sufficiently small. In a more general setting where the strong model identifiability assumption may not hold, we introduce the notion of local Bayesian complexity and develop oracle inequalities for Bayesian model selection procedures. Our Bayesian oracle inequality characterizes a trade-off between the approximation error and a Bayesian characterization of the local complexity of the model, illustrating the adaptive nature of averaging-based Bayesian procedures towards achieving an optimal rate of posterior convergence. Specific applications of the model selection theory are discussed in the context of high-dimensional nonparametric regression and density regression where the regression function or the conditional density is assumed to depend on a fixed subset of predictors. As a result of independent interest, we propose a general technique for obtaining upper bounds of certain small ball probability of stationary Gaussian processes.


Locally Private Bayesian Inference for Count Models

arXiv.org Machine Learning

As more aspects of social interaction are digitally recorded, there is a growing need to develop privacy-preserving data analysis methods. Social scientists will be more likely to adopt these methods if doing so entails minimal change to their current methodology. Toward that end, we present a general and modular method for privatizing Bayesian inference for Poisson factorization, a broad class of models that contains some of the most widely used models in the social sciences. Our method satisfies local differential privacy, which ensures that no single centralized server need ever store the non-privatized data. To formulate our local-privacy guarantees, we introduce and focus on limited-precision local privacy---the local privacy analog of limited-precision differential privacy (Flood et al., 2013). We present two case studies, one involving social networks and one involving text corpora, that test our method's ability to form the posterior distribution over latent variables under different levels of noise, and demonstrate our method's utility over a na\"{i}ve approach, wherein inference proceeds as usual, treating the privatized data as if it were not privatized.